  1. Nov 2024
    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant issues regarding the experimental design and potential misinterpretations of key findings. Consequently, the manuscript contributes little to our understanding of SynGap1 loss mechanisms.

      Major issues in the second version of the manuscript:

      In the review of the first version there were major issues and contradictions with the sEPSC and mEPSC data, which were not resolved by the revision; the new control experiments rather confirmed the contradiction.

      In the original review I stated: "One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar."

      Contradictions remained after the revision of the manuscript. On one hand, the authors claimed in the revised version that "We found no difference in mEPSC amplitude between the two genotypes (Fig. 1g), indicating that the observed difference in sEPSC amplitude (Figure 1b) could arise from decreased network excitability". On the other hand, they later show "no significative difference in either amplitude or inter-event intervals between sEPSC and mEPSC, suggesting that in acute slices from adult A1, most sEPSCs may actually be AP independent." The latter means that sEPSCs and mEPSCs are the same type of events, which should have the same sensitivity to manipulations.

      We understand that the data are confusing. Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application: see Fig.1c-f, at the end of this document), but their individual responses are diluted when all cells are pooled together. To account for this variability, we are currently recording sEPSCs followed by mEPSCs from more mice of both genotypes. We will rephrase the text to reflect the updated data accordingly, in keeping with the editors’ and reviewers’ suggestions.

      Concerns about the quality of the synapse counting experiments were addressed by showing additional images in a different and explaining quantification. However, the admitted restriction of the analysis of excitatory synapses to the somatic region represents a limitation, as these synapses include only a small fraction of the total excitation - even if the slightly larger amplitudes of their EPSPs are considered.

      We agree with the reviewer that restricting the anatomical analysis of excitatory synapses to the PV cell somatic region is a limitation, which we already highlighted in the discussion of the revised manuscript. Recent studies based on serial block-face scanning electron microscopy suggest that cortical PV+ interneurons receive more robust excitatory inputs to their perisomatic region than pyramidal neurons do (see for example, Hwang et al. 2021, Cerebral Cortex, http://doi.org/10.1093/cercor/bhaa378). It is thus possible that putative glutamatergic synapses, analysed by vGlut1/PSD95 colocalisation around PV+ cell somata, may be representative of a major excitatory input population. Similar immunolabeling and quantification approaches, coupled with mEPSC analysis, have been reported in several publications by other labs (for example Bernard et al 2022, Science 378, doi: 10.1126/science.abm7466; Exposito-Alonso et al, 2020 eLife, doi: 10.7554/eLife.57000). Since analysing putative excitatory synapses onto PV+ dendrites would be difficult and would require much more time, we will rephrase the text to more clearly highlight the rationale and limitations of this approach.

      New experiments using paired-pulse stimulation provided an answer to issues 3 and 4. Note that the numbering of the Figures in the responses and the manuscript is not consistent.

      We are glad that the reviewer found that the new paired-pulse experiments answered previously raised concerns. We will correct the discrepancy in figure numbers in the manuscript.

      I agree that the low sampling rate of the APs does not change the observed large differences in AP threshold; however, the phase plots are still inconsistent in the sense that there appears to be an offset, as all values are shifted to more depolarized membrane potentials, including threshold, AP peak and AHP peak. This consistent shift may be due to non-biological differences between the two sets of recordings and, importantly, it may negate the interpretation of the I/f curve results (Fig. 5e).

      We agree with the reviewers that a higher sampling rate would allow a more accurate assessment of different parameters, such as AP height, half-width, rise time, etc., while it would not affect the large differences in AP threshold we observed between control and mutant mice. Since the phase plots do not add to our analysis of the results, we will remove them. The offset shown in Fig.5 was due to the unfortunate choice of two random neurons; this offset is not present in the different examples shown in Fig.7. We apologize for the confusion.

      Additional issues:

      The first paragraph of the Results mentioned that the recorded cells were identified by immunolabelling and axonal localization. However, neither the Results nor the Methods mention the criteria and levels of measurements of axonal arborization.

      As suggested, we will add this information in the revised manuscript.

      The other issues of the first review were adequately addressed by the Authors and the manuscript improved by these changes.

      Reviewer #3 (Public review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences between control and mutants in both interneuron populations, although they claim a predominance in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunctions observed in Syngap1 haploinsufficiency-related intellectual disability.

      The subject of the work is interesting, and most of the approach is rather direct and straightforward, which are strengths. There are also some methodological weaknesses and interpretative issues that reduce the impact of the paper.

      (1) Supplementary Figure 3: recording and data analysis. The data of Supplementary Figure 3 show no differences either in the frequency or amplitude of synaptic events recorded from the same cell in control (sEPSCs) vs TTX (mEPSCs). This suggests that, under the experimental conditions of the paper, sEPSCs are AP-independent quantal events. However, I am concerned by the high variability of the individual results included in the Figure. Indeed, several datapoints show dramatically different frequencies in control vs TTX, which may be explained by unstable recording conditions. It would be important to present these data as time course plots, so that stability can be evaluated. Also, the claim of lack of effect of TTX should be corroborated by positive control experiments verifying that TTX is working (block of action potentials, for example). Lastly, it is not clear whether the application of TTX was consistent in time and duration in all the experiments and the paper does not clarify what time window was used for quantification.

      We understand the reviewer’s concern about high variability. To account for this variability, we are currently recording sEPSC followed by mEPSC from more mice of both genotypes.

      Indeed, we confirmed several times throughout this study that TTX was working, using different aliquots prepared from the same TTX vial used for all experiments. The results of the last test we performed, showing that TTX application blocks action potentials (2 recordings, one from an SST+ and one from a PV+ interneuron), are shown in Fig.1a,b at the end of this document. TTX was applied using the same protocol for all recorded neurons. In particular, sEPSCs were first sampled over a 2 min period. TTX (1μM; Alomone Labs) was then perfused into the recording chamber at a flow rate of 2 mL/min. We then waited for 5 min before sampling mEPSCs over a 2 min period. We will add this information to the methods of the revised manuscript. Finally, Fig.1g-j shows series resistance (Rs) over time for 4 different PV+ interneurons, indicating recording stability. These results are representative of the entire population of recorded neurons, which we have meticulously analysed one by one.
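      The TTX protocol described above amounts to a fixed set of time windows. The sketch below is a minimal illustration, assuming that TTX perfusion starts immediately after the 2 min sEPSC window and that the 5 min wait is counted from the start of perfusion; the constant and function names are hypothetical and are not taken from the authors' analysis code.

```python
# Protocol constants as stated in the response; window placement is an assumption.
SEPSC_WINDOW_S = 120.0   # sEPSCs sampled over 2 min
TTX_WASH_IN_S = 300.0    # 5 min wait, assumed to start when TTX perfusion starts
MEPSC_WINDOW_S = 120.0   # mEPSCs sampled over 2 min
FLOW_ML_PER_MIN = 2.0    # perfusion flow rate


def protocol_windows(t0=0.0):
    """Return (start, end) times in seconds of the sEPSC and mEPSC windows."""
    sepsc = (t0, t0 + SEPSC_WINDOW_S)
    ttx_on = sepsc[1]  # assumed: TTX perfusion begins right after sEPSC sampling
    mepsc_start = ttx_on + TTX_WASH_IN_S
    mepsc = (mepsc_start, mepsc_start + MEPSC_WINDOW_S)
    return sepsc, mepsc


def washin_volume_ml():
    """Volume of TTX-containing solution perfused before mEPSC sampling begins."""
    return FLOW_ML_PER_MIN * TTX_WASH_IN_S / 60.0


def event_frequency_hz(event_times_s, window):
    """Events per second among detected event times falling inside [start, end)."""
    start, end = window
    n = sum(1 for t in event_times_s if start <= t < end)
    return n / (end - start)


print(protocol_windows())   # ((0.0, 120.0), (420.0, 540.0))
print(washin_volume_ml())   # 10.0
```

      Under these assumptions, mEPSCs are sampled between 420 and 540 s after the start of the recording, and about 10 mL of TTX solution has passed through the chamber before mEPSC sampling begins.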

      (2) Figure 1 and Supplementary Figure 3: apparent inconsistency. If, as the authors claim, TTX does not affect sEPSCs (either in the control or mutant genotype, Supplementary Figure 3 and point 1 above), then comparing sEPSC and mEPSC in control vs mutants should yield identical results. In contrast, Figure 1 reports a _selective_ reduction of sEPSCs amplitude (not in mEPSCs) in mutants, which is difficult to understand. The proposed explanation relying on different pools of synaptic vesicles mediating sEPSCs and mEPSCs does not clarify things. If this was the case, wouldn't it also imply a decrease of event frequency following TTX addition? However, this is not observed in Supplementary Figure 3. My understanding is that, according to this explanation, recordings in control solution would reflect the impact of two separate pools of vesicles, whereas, in the presence of TTX, only one pool would be available for release. Therefore, TTX should cause a decrease in the frequency of the recorded events, which is not what is observed in Supplementary Figure 3.

      Our results suggest a diverse population of PV+ cells, with varying reliance on action potential-dependent and -independent release. Several PV+ cells indeed show TTX sensitivity (reduced EPSC event amplitudes following TTX application: See Fig.1c-f, at the end of this document), but their individual responses are diluted when all cells are pooled together. As mentioned above, we are currently recording sEPSCs followed by mEPSCs from more mice of both genotypes, to account for the large variability. We will rephrase the text in the revised manuscript according to the updated data and reviewers’ suggestions.

      (3) Figure 1: statistical analysis. Although I do appreciate the efforts of the authors to illustrate both cumulative distributions and plunger plots with individual data, I am confused by how the cumulative distributions of Figure 1b (sEPSC amplitude) may support statistically significant differences between genotypes, while this is not the case for the cumulative distributions of Figure 1g (inter-mEPSC interval), where the curves appear even more separated. A difference in mEPSC frequency would also be consistent with the data of Supplementary Fig 2b, which are otherwise difficult to reconcile. I would encourage the authors to use the Kolmogorov-Smirnov test rather than a t-test for the comparison of cumulative distributions.

      We thank the reviewer for this suggestion. We used both cumulative distributions and plunger plots with individual data because they convey two different kinds of information. Cumulative distributions highlight where the differences lie (the deltas between the groups), while plunger plots with individual data show the variability between data points. In histogram 1g, the variability is greater than in 1b (owing to the smaller sample size in 1g), which leads to larger error bars and directly affects the statistical outcome. So, while the delta is larger in 1g, the variability is also greater. In contrast, the delta in 1b is smaller, as is the variability, which in turn affects the statistical outcome. To address this issue, we are currently increasing the number of recordings.
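      The point about sample size and error bars can be made concrete with the standard error of the mean, SEM = SD / sqrt(n). The sketch below uses hypothetical spreads and sample sizes (not values from Figure 1) to show that, for the same spread, reducing n widens the error bar, so a larger difference between genotypes can still fail to reach significance.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def sem_from_sd(sd, n):
    """SEM for a given spread (sd) and sample size (n)."""
    return sd / math.sqrt(n)


# Same hypothetical spread, different sample sizes: quadrupling n halves the SEM.
print(sem_from_sd(4.0, 16))  # 1.0
print(sem_from_sd(4.0, 4))   # 2.0
```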

      We will include Kolmogorov-Smirnov analysis in the revision, as suggested; nevertheless, we will base our conclusions on statistical results generated by the linear mixed model (LMM), modelling animal as a random effect and genotype as the fixed effect. We used this statistical analysis since we considered the mice as independent replicates and the cells recorded in each mouse as repeated/correlated measures. We decided to use LMM for our statistical analyses because of the growing concern over reproducibility in biomedical research and the ongoing discussion on how data are analysed (see for example, Yu et al. (2022), Neuron 110:21-35, https://doi.org/10.1016/j.neuron.2021.10.030; Aarts et al. (2014), Nat Neurosci 17, 491–496, https://doi.org/10.1038/nn.3648). We acknowledge that patch-clamp data have historically been analysed using t-tests and analysis of variance (ANOVA), or equivalent non-parametric tests. However, these tests assume that individual observations (recorded neurons in this case) are independent of each other. Whether neurons from the same mouse are independent or correlated variables is an unresolved question, but independence does not appear likely from a biological point of view. Statisticians have developed effective methods to analyze correlated data, including LMM. In parallel, we also tested the data using standard parametric and non-parametric analyses and reported these results as well (Tables 1-9, and S1-S2).
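      For reference, the two-sample Kolmogorov-Smirnov statistic suggested by the reviewer is the largest vertical distance between two empirical cumulative distributions. The sketch below computes only the statistic D in plain Python; a p-value would normally come from a library routine such as `scipy.stats.ks_2samp`. Note that pooling events from many cells and mice into one distribution treats every event as independent, which is the same concern that motivates the LMM above.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF: x -> fraction of sample values <= x."""
    data = sorted(sample)
    n = len(data)
    return lambda x: bisect_right(data, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max over x of |F_a(x) - F_b(x)|.

    Both ECDFs are step functions that only change at observed values,
    so evaluating at the pooled sample points is enough to find the maximum.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```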

      (4) Methods. I still maintain that a threshold at around -20/-15 mV for the first action potential of a train seems too depolarized (see some datapoints of Fig 5c and Fig 7c) for a healthy spike. This suggests that some cells were either in precarious condition or that the capacitance of the electrode was not compensated properly.

      As suggested by the reviewer, we will exclude the neurons with threshold at -20/-15 mV. In addition, we performed statistical analysis with and without these cells (data reported below) and found that whether these cells are included or excluded, the statistical significance of the results does not change.

      Fig.5c, including the 2 outliers from the cHet group (values of -16.5 and -20.6 mV): -42.6±1.01 mV in control, n=33 cells from 15 mice vs -35.3±1.2 mV in cHet, n=40 cells from 17 mice, ***p<0.001, LMM. Excluding the 2 outliers from the cHet group: -42.6±1.01 mV in control, n=33 cells from 15 mice vs -36.2±1.1 mV in cHet, n=38 cells from 17 mice, ***p<0.001, LMM.

      Fig.7c, including the 2 outliers from the cHet group (values of -16.5 and -20.6 mV): -43.4±1.6 mV in control, n=12 cells from 9 mice vs -33.9±1.8 mV in cHet, n=24 cells from 13 mice, **p=0.002, LMM. Excluding the 2 outliers from the cHet group: -43.4±1.6 mV in control, n=12 cells from 9 mice vs -35.4±1.7 mV in cHet, n=22 cells from 13 mice, *p=0.037, LMM.
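      The with/without comparison can be cross-checked from the reported cHet means alone, because the sum of the remaining values equals mean × n minus the excluded values. The sketch below assumes that the two excluded thresholds are -16.5 and -20.6 mV; a value of +20.6 mV would not reproduce the reported means. The small residual for Fig. 7c is within the rounding of the reported means. The function name is illustrative only.

```python
def mean_after_exclusion(mean_all, n_all, excluded):
    """Mean of the remaining values, recovered from a reported mean, n and the excluded values."""
    remaining_sum = mean_all * n_all - sum(excluded)
    return remaining_sum / (n_all - len(excluded))


# Reported cHet values (mV); the second outlier is taken as -20.6 mV (assumption).
outliers = [-16.5, -20.6]
fig5c = mean_after_exclusion(-35.3, 40, outliers)  # reported after exclusion: -36.2 (n = 38)
fig7c = mean_after_exclusion(-33.9, 24, outliers)  # reported after exclusion: -35.4 (n = 22)
print(round(fig5c, 1), round(fig7c, 1))  # -36.2 -35.3

# With a positive +20.6 mV value the recomputed Fig. 5c mean would not match.
print(round(mean_after_exclusion(-35.3, 40, [-16.5, 20.6]), 1))  # -37.3
```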

      (5) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties (Figure 8d,e); however, their evoked firing properties were affected with fewer AP generated in response to the same depolarizing current injection".

      This sentence is intrinsically contradictory. Action potentials triggered by current injections depend on the integration of passive and active properties. If the curves of Figure 8f are different between genotypes, then some passive and/or active property MUST have changed. It is an inescapable conclusion. The general _blanket_ statement of the authors that there are no significant changes in active and passive properties is in direct contradiction with the current/#AP plot.

      We shall rephrase the text according to the reviewer’s suggestion to better represent the data. As discussed in the first revision, it's possible that other intrinsic factors, not assessed in this study, may have contributed to the effect shown in the current/#AP plot.

      (6) The phase plots of Figs 5c, 7c, and 7h suggest that the frequency of acquisition/filtering of current-clamp signals was not appropriate for fast waveforms such as spikes. The first two papers indicated by the authors in their rebuttal (Golomb et al., 2007; Stevens et al., 2021) did not perform a phase plot analysis (like those included in the manuscript). The last work quoted in the rebuttal (Zhang et al., 2023) did perform phase plot analysis, but data were digitized at 20 kHz (not 10 kHz, as incorrectly indicated by the authors) and filtered at 10 kHz (not 2-3 kHz, as used by the authors in the manuscript). To me, this remains a concern.

      We agree with the reviewer that a higher sampling rate would allow a more accurate assessment of different AP parameters, such as AP height, half-width, rise time, etc. The papers were cited in the context of determining AP threshold, not of performing phase plot analysis. We apologize for the confusion and the error. Further, as mentioned above, we will remove the phase plots since they do not add relevant information.

      (7) The general logical flow of the manuscript could be improved. For example, Fig 4 seems to indicate no morphological differences in the dendritic trees of control vs mutant PV cells, but this conclusion is then rejected by Fig 6. Maybe Fig 4 is not necessary. Regarding Fig 6, did the authors check the integrity of the entire dendritic structure of the cells analyzed (i.e. no dendrites were cut in the slice)? This is critical as the dendritic geometry may affect the firing properties of neurons (Mainen and Sejnowski, Nature, 1996).

      As suggested by the reviewer, we will remove Fig.4. All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites.

      Author response image 1.

      (a, b) Representative voltage responses of an SST+ cell (a) and a PV+ cell (b) in the absence (left) and presence (right) of TTX in response to depolarizing current injections corresponding to threshold current and 2x threshold current. (c-f) Cumulative histograms of sEPSC/mEPSC amplitude (bin width 0.5 pA) and frequency (bin width 10 ms) recorded from four PV+ cells. sEPSCs were recorded for 2 minutes, then TTX (1μM; Alomone Labs) was perfused into the recording chamber. After 5 minutes, mEPSCs were recorded for 2 minutes. (g, h, i, j) Time course plots of series resistance (Rs) of the four representative PV+ cells shown in c-f before (sEPSC) and during the application of TTX (mEPSC).
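      The binned cumulative histograms in panels c-f can be sketched as below, assuming bins of fixed width that start at 0 and that the "frequency" histograms are built from inter-event intervals (an interpretation, not stated explicitly in the caption). The amplitudes are hypothetical and the function names are illustrative.

```python
import math


def inter_event_intervals_ms(event_times_s):
    """Intervals (ms) between consecutive detected events."""
    times = sorted(event_times_s)
    return [(b - a) * 1000.0 for a, b in zip(times, times[1:])]


def cumulative_histogram(values, bin_width):
    """Cumulative fraction of non-negative values in bins [i*w, (i+1)*w) starting at 0.

    Returns (upper_bin_edges, cumulative_fractions).
    """
    n_bins = math.floor(max(values) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor(v / bin_width)] += 1
    edges, cumulative, running = [], [], 0
    for i, c in enumerate(counts):
        running += c
        edges.append((i + 1) * bin_width)
        cumulative.append(running / len(values))
    return edges, cumulative


# Hypothetical amplitudes (pA) binned at 0.5 pA; intervals would use bin_width=10.0 (ms).
edges, cum = cumulative_histogram([3.2, 3.4, 4.1, 5.0], bin_width=0.5)
```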


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is designed to assess the role of Syngap1 in regulating the physiology of the MGE-derived PV+ and SST+ interneurons. Syngap1 is associated with some mental health disorders, and PV+ and SST+ cells are the focus of many previous and likely future reports from studies of interneuron biology, highlighting the translational and basic neuroscience relevance of the authors' work.

      Strengths of the study are the use of well-established electrophysiology methods and the highly controlled conditions of ex vivo brain slice experiments, combined with a novel intersectional mouse line, to assess the role of Syngap1 in regulating PV+ and SST+ cell properties. The findings revealed that in the mature auditory cortex, Syngap1 haploinsufficiency decreases both the intrinsic excitability and the excitatory synaptic drive onto PV+ neurons from Layer 4. In contrast, SST+ interneurons were mostly unaffected by Syngap1 haploinsufficiency. Pharmacologically manipulating the activity of voltage-gated potassium channels of the Kv1 family suggested that these channels contributed to the decreased PV+ neuron excitability caused by Syngap1 insufficiency. These results therefore suggest that normal Syngap1 expression levels are necessary to produce normal PV+ cell intrinsic properties and excitatory synaptic drive, albeit, perhaps surprisingly, inhibitory synaptic transmission was not affected by Syngap1 haploinsufficiency.

      Since the electrophysiology experiments were performed in the adult auditory cortex, while Syngap1 expression was potentially affected since embryonic stages in the MGE, future studies should address two important points that were not tackled in the present study. First, what is the developmental time window in which Syngap1 insufficiency disrupted PV+ neuron properties? Albeit the embryonic Syngap1 deletion most likely affected PV+ neuron maturation, the properties of Syngap-insufficient PV+ neurons do not resemble those of immature PV+ neurons. Second, whereas the observation that Syngap1 haploinsufficiency affected PV+ neurons in auditory cortex layer 4 suggests auditory processing alterations, MGE-derived PV+ neurons populate every cortical area. Therefore, without information on whether Syngap1 expression levels are cortical area-specific, the data in this study would predict that by regulating PV+ neuron electrophysiology, Syngap1 normally controls circuit function in a wide range of cortical areas, and therefore a range of sensory, motor and cognitive functions. These are relatively minor weaknesses regarding interpretation of the data in the present study that the authors could discuss.

      We agree with the reviewer on the proposed open questions, which we now discuss in the revised manuscript. We do have experimental evidence suggesting that Syngap1 mRNA is expressed by PV+ and SST+ neurons in different cortical areas, during early postnatal development and in adulthood (Jadhav et al., 2024); therefore, we agree that it will be important, in future experiments, to tackle the question of when the observed phenotypes arise.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors investigated how partial loss of SynGap1 affects inhibitory neurons derived from the MGE in the auditory cortex, focusing on their synaptic inputs and excitability. While haploinsufficiency of SynGap1 is known to lead to intellectual disabilities, the underlying mechanisms remain unclear.

      Strengths:

      The questions are novel.

      Weaknesses:

      Despite the interesting and novel questions, there are significant concerns regarding the experimental design and data quality, as well as potential misinterpretations of key findings. Consequently, the current manuscript fails to contribute substantially to our understanding of SynGap1 loss mechanisms and may even provoke unnecessary controversies.

      Major issues:

      (1) One major concern is the inconsistency and confusion in the intermediate conclusions drawn from the results. For instance, while the sEPSC data indicates decreased amplitude in PV+ and SOM+ cells in cHet animals, the frequency of events remains unchanged. In contrast, the mEPSC data shows no change in amplitudes in PV+ cells, but a significant decrease in event frequency. The authors conclude that the former observation implies decreased excitability. However, traditionally, such observations on mEPSC parameters are considered indicative of presynaptic mechanisms rather than changes of network activity. The subsequent synapse counting experiments align more closely with the traditional conclusions. This issue can be resolved by rephrasing the text. However, it would remain unexplained why the sEPSC frequency shows no significant difference. If the majority of sEPSC events were indeed mediated by spiking (which is blocked by TTX), the average amplitudes and frequency of mEPSCs should be substantially lower than those of sEPSCs. Yet, they fall within a very similar range, suggesting that most sEPSCs may actually be independent of action potentials. But if that was indeed the case, the changes of purported sEPSC and mEPSC results should have been similar.

      We understand the reviewer’s perspective; indeed, we asked ourselves the very same question regarding why the sEPSC and mEPSC frequencies fall within a similar range when we analysed neuron means (bar graphs). We thus recorded sEPSCs followed by mEPSCs from several PV neurons (control and cHet) and added these data to the revised version of the manuscript (new Supplementary Figure 3). We found that the average amplitudes and frequency of mEPSCs, together with their respective cumulative probability curves, were not significantly different from those of sEPSCs. We rephrased the manuscript to present potential interpretations of the data.

      We hope that we have correctly interpreted the reviewer's concern. If the question is why we do not observe a significant difference in the average frequency when comparing sEPSC and mEPSC in control mice, this could be explained by the fact that increased mean amplitude of sEPSCs was primarily driven by alterations in large sEPSCs (>9-10pA, as shown in cumulative probability in Fig. 1b right), with smaller ones being relatively unaffected. Consequently, a reduction in sEPSC amplitude may not necessarily result in a significant decrease in frequency since their values likely remain above the detection threshold of 3 pA. 
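      The argument that a change confined to large events can lower mean amplitude without lowering event frequency can be illustrated with threshold-based detection. In the sketch below, the 3 pA threshold and the ~9 pA cutoff come from the text, while the amplitudes and the 0.7 scaling factor are hypothetical and chosen only for illustration.

```python
from statistics import mean

DETECTION_THRESHOLD_PA = 3.0  # detection threshold stated in the response


def detected(amplitudes_pa, threshold_pa=DETECTION_THRESHOLD_PA):
    """Events that would still be picked up by amplitude-threshold detection."""
    return [a for a in amplitudes_pa if a >= threshold_pa]


def shrink_large_events(amplitudes_pa, cutoff_pa=9.0, factor=0.7):
    """Hypothetical change: scale down only events larger than cutoff_pa."""
    return [a * factor if a > cutoff_pa else a for a in amplitudes_pa]


control = [4.0, 5.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0]  # hypothetical amplitudes (pA)
mutant = shrink_large_events(control)

print(len(detected(control)), len(detected(mutant)))      # 8 8
print(round(mean(control), 1), round(mean(mutant), 1))    # 10.0 7.9
```

      Every event stays above the detection threshold, so the count per unit time (frequency) is unchanged while the mean amplitude drops.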

      If the question is whether we should see the same parameters affected by the genetic manipulation in both sEPSCs and mEPSCs, then another critical consideration is the involvement of the releasable pool in mEPSCs versus sEPSCs. Current knowledge suggests that activity-dependent and -independent release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites. This concept has been extensively explored (Sara et al., 2005; Sara et al., 2011; reviewed in Ramirez and Kavalali, 2011; Kavalali, 2015). Consequently, while we may have traditionally interpreted activity-dependent and -independent data assuming they utilize the same pool, this assumption is no longer accurate. The current discussion in the field revolves around understanding the mechanisms underlying such phenomena. Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations.

      (2) Another significant concern is the quality of synapse counting experiments. The authors attempted to colocalize pre- and postsynaptic markers Vglut1 and PSD95 with PV labelling. However, several issues arise. Firstly, the PV labelling seems confined to soma regions, with no visible dendrites. Given that the perisomatic region only receives a minor fraction of excitatory synapses, this labeling might not accurately represent the input coverage of PV cells. Secondly, the resolution of the images is insufficient to support clear colocalization of the synaptic markers. Thirdly, the staining patterns are peculiar, with PSD95 puncta appearing within regions clearly identified as somas by Vglut1, hinting at possible intracellular signals. Furthermore, PSD95 seems to delineate potential apical dendrites of pyramidal cells passing through the region, yet Vglut1+ partners are absent in these segments, which are expected to be the marker of these synapses here. Additionally, the cumulative density of Vglut2 and Vglut1 puncta exceeds expectations, and it's surprising that subcortical fibers labeled by Vglut2 are comparable in number to intracortical Vglut1+ axon terminals. Ideally, N(Vglut1)+N(Vglut2) should be equal to or less than N(PSD95), but this is not the case here. Consequently, these results cannot be considered reliable due to these issues.

      We apologize, as it appears that the images we provided in the first submission have caused confusion. The selected images represent a single focal plane of a confocal stack, which was visually centered on the PV cell somata. We chose just one confocal plane because we thought it showed more clearly the apposition of presynaptic and postsynaptic immunolabeling around the somata. In the revised version of the manuscript, we now provide higher magnification images, which clearly show how we identified and selected the region of interest for the quantification of colocalized synaptic markers (Supplemental Figure 2). In our confocal stacks, we can also identify PV immunolabeled dendrites and colocalized vGlut1/PSD95 or vGlut2/PSD95 puncta on them; but these do not appear in the selected images because, as explained, only one focal plane, centered on the PV cell somata, was shown.

      We acknowledge the reviewer's point that in PV+ cells the majority of excitatory inputs are formed onto dendrites; however, we focused on the somatic excitatory inputs to PV cells because, despite their lower number, they produce much stronger depolarization in PV neurons than dendritic excitatory inputs (Hu et al., 2010; Norenberg et al., 2010). Further, quantification of perisomatic putative excitatory synapses is more reliable since, by using PV immunostaining, we can visualize the soma and larger primary dendrites, but smaller, higher-order dendrites are not always detectable. Of note, PV-positive somata receive more excitatory synapses than SST-positive and pyramidal neuron somata, as found by electron microscopy studies in the visual cortex (Hwang et al., 2021; Elabbady et al., 2024).

      Regarding the comment on the density of vGlut1 and vGlut2 puncta, the numbers appear high and similar between the two markers because we present normalized data (cHet normalized to their control values for each set of immunolabelling) to clearly represent the differences between genotypes. We now provide a more detailed explanation of our methods in the revised manuscript. Briefly, immunostained sections were imaged using a Leica SP8-STED confocal microscope with a 63x oil-immersion objective (NA 1.4) at 1024 × 1024 pixels, z-step = 0.3 μm, stack size of ~15 μm. Images were acquired from the auditory cortex from at least 3 coronal sections per animal. All confocal parameters were kept constant throughout the acquisition of an experiment. All images shown in the figures are from a single confocal plane. To quantify the number of vGlut1/PSD95 or vGlut2/PSD95 putative synapses, images were exported as TIFF files and analyzed using Fiji (ImageJ) software. We first manually outlined the profile of each PV cell soma (identified by PV immunolabeling). At least 4 innervated somata were selected in each confocal stack. We then used a series of custom-made macros in Fiji, as previously described (Chehrazi et al., 2023). After applying background subtraction (rolling value = 10) and Gaussian blur (σ = 2) filters, the stacks were binarized and vGlut1/PSD95 or vGlut2/PSD95 puncta were independently identified around the perimeter of a targeted soma in the focal plane with the largest soma circumference. Puncta were quantified after filtering particles for size (0-2 μm²) and circularity (0-1). Data quantification was done by investigators blind to the genotype, and data are presented as normalized to control values for each experiment.
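      The last two steps described above, particle filtering and normalization to control, can be sketched as follows. This is not the authors' Fiji macro (Chehrazi et al., 2023); background subtraction, blurring, binarization and colocalization are not reproduced. It assumes the ImageJ definition of circularity, 4π × area / perimeter², and uses hypothetical puncta counts per soma; function names are illustrative.

```python
import math
from statistics import mean


def circularity(area_um2, perimeter_um):
    """ImageJ/Fiji-style circularity: 4*pi*area / perimeter**2 (1.0 for a perfect circle)."""
    return 4.0 * math.pi * area_um2 / perimeter_um ** 2


def keep_particle(area_um2, perimeter_um, max_area_um2=2.0, circ_min=0.0, circ_max=1.0):
    """Size (0-2 um^2) and circularity (0-1) filters described in the methods."""
    c = circularity(area_um2, perimeter_um)
    return 0.0 < area_um2 <= max_area_um2 and circ_min <= c <= circ_max


def normalize_to_control(control_counts, chet_counts):
    """Divide each count by the control mean of the same immunolabelling experiment."""
    reference = mean(control_counts)
    return [c / reference for c in control_counts], [c / reference for c in chet_counts]


# Hypothetical colocalized puncta counts per PV+ soma in one experiment.
ctrl_norm, chet_norm = normalize_to_control([10, 12, 14], [9, 6])
print(chet_norm)  # [0.75, 0.5]
```

      Because each experiment is scaled by its own control mean, normalized control values average 1.0 by construction, which is why absolute vGlut1 and vGlut2 densities cannot be compared from normalized plots.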

      (3) One observation from the minimal stimulation experiment was concluded by an unsupported statement. Namely, the change in the onset delay cannot be attributed to a deficit in the recruitment of PV+ cells, but it may suggest a change in the excitability of TC axons.

      We agree with the reviewer; please see our answer to the next point.

      (4) The conclusions drawn from the stimulation experiments are also disconnected from the actual data. To make conclusions about TC release, the authors should have tested release probability using established methods, such as paired-pulse changes. Instead, the only observation here is a change in the AMPA components, which remained unexplained.

      As suggested, we performed additional paired-pulse ratio experiments at different intervals. We found that, in contrast with control mice, evoked excitatory inputs to layer IV PV+ cells showed paired-pulse facilitation in cHet mice (Figure 3g, h), suggesting that thalamocortical presynaptic sites likely have a lower release probability in mutant than in control mice. We rephrased the text in light of this new experiment.
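      For readers less familiar with the measure, a minimal Python sketch of how a paired-pulse ratio (PPR) can be computed and read is shown below. It assumes that EPSC amplitudes are peak inward currents (negative values, in pA) and that the ratio is taken on trial-averaged amplitudes at each inter-stimulus interval. The 5% tolerance band and the example amplitudes are hypothetical and are not taken from Figure 3g, h.

```python
from statistics import mean


def paired_pulse_ratio(first_pa, second_pa):
    """PPR = mean |EPSC2| / mean |EPSC1| over trials at one inter-stimulus interval."""
    a1 = mean(abs(x) for x in first_pa)
    a2 = mean(abs(x) for x in second_pa)
    if a1 == 0:
        raise ValueError("first EPSC amplitude must be non-zero")
    return a2 / a1


def classify_ppr(ppr, tolerance=0.05):
    """Label a PPR as facilitation (>1), depression (<1) or no change, within a tolerance band."""
    if ppr > 1.0 + tolerance:
        return "facilitation"
    if ppr < 1.0 - tolerance:
        return "depression"
    return "no change"


# Hypothetical trial amplitudes (pA) keyed by inter-stimulus interval (ms).
trials = {
    50: ([-100.0, -110.0, -90.0], [-130.0, -140.0, -120.0]),
    100: ([-100.0, -100.0], [-80.0, -60.0]),
}
for isi, (first, second) in sorted(trials.items()):
    ppr = paired_pulse_ratio(first, second)
    print(isi, round(ppr, 2), classify_ppr(ppr))
```

      A PPR above 1 (facilitation) is conventionally read as a sign of lower initial release probability, which is the interpretation used in the response above.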

      (5) The sampling rate of CC recordings is insufficient to resolve the temporal properties of the APs. Therefore, the phase-plots cannot be interpreted (e.g. axonal and somatic AP components are not clearly separated), raising questions about how AP threshold and peak were measured. The low sampling rate also masks the real derivative of the AP signals, making them apparently faster.

      We acknowledge that a higher sampling rate would provide a more detailed and smoother phase plot. However, for the action potential parameters analyzed here, sampling rates of 10 to 20 kHz are considered acceptable (Golomb et al., 2007; Stevens et al., 2021; Zhang et al., 2023) and are adequate for the present study, which aims to evaluate relative differences in electrophysiological phenotype between groups following a specific genetic manipulation. A sampling rate of 10 kHz is commonly employed in similar studies, including those conducted by our collaborator and co-author S. Kourrich (e.g., Kourrich and Thomas, 2009; Kourrich et al., 2013), as well as by others (Russo et al., 2013; Ünal et al., 2020; Chamberland et al., 2023). Although our data were acquired at a lower sampling rate than the reviewer might prefer, they clearly demonstrate significant differences between the experimental groups, especially for parameters that are little or not at all affected by the sampling rate used here (e.g., #spikes/input, RMP, Rin, Cm, Tm, AP amplitude, AP latency, AP rheobase).

      Regarding the phase-plots, a higher sampling rate would indeed have resulted in smoother curves. However, the differences were sufficiently pronounced to discern the relative variations in action potential waveforms between the experimental groups.
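      To make the dependence on sampling rate explicit, a minimal Python sketch of how dV/dt, phase-plot points and an AP threshold can be derived from a sampled trace is shown below. The forward-difference derivative and the 20 V/s threshold criterion are assumptions chosen for illustration (the criterion used in the study is not restated here), and the example trace is hypothetical. At 10 kHz, each dV/dt value is an average over 0.1 ms, so the peak dV/dt can be underestimated and the phase plot looks coarser than at higher sampling rates.

```python
def dvdt(trace_mv, fs_hz):
    """Forward-difference derivative; mV/ms is numerically equal to V/s."""
    dt_ms = 1000.0 / fs_hz  # 0.1 ms at 10 kHz
    return [(v1 - v0) / dt_ms for v0, v1 in zip(trace_mv, trace_mv[1:])]


def phase_plot_points(trace_mv, fs_hz):
    """(V, dV/dt) pairs; each derivative value is assigned to the earlier sample."""
    return list(zip(trace_mv, dvdt(trace_mv, fs_hz)))


def ap_threshold(trace_mv, fs_hz, criterion_v_per_s=20.0):
    """Membrane potential at the first sample whose dV/dt reaches the criterion."""
    for v, rate in phase_plot_points(trace_mv, fs_hz):
        if rate >= criterion_v_per_s:
            return v
    return None  # no AP detected in this trace


# Hypothetical rising phase of an AP sampled at 10 kHz (mV).
trace = [-60.0, -59.9, -59.7, -59.0, -55.0, -40.0, 10.0]
print(ap_threshold(trace, 10_000))
print(max(dvdt(trace, 10_000)))
```

      Parameters such as RMP, Rin or AP latency do not rely on this derivative, which is why they are less sensitive to the sampling rate than the phase-plot features.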

      A related issue is that the Methods section lacks essential details about the recording conditions, such as bridge balance and capacitance neutralization.

      We indeed performed bridge balance and neutralized the capacitance before starting every recording. We added the information in the methods.

      (6) Interpretation issue: One of the most fundamental measures of cellular excitability, the rheobase, was differentially affected by cHet in BCshort and BCbroad. Yet, the authors concluded that the cHet-induced changes in the two subpopulations are common.

      We are uncertain if we have correctly interpreted the reviewer's comment. While we observed distinct impacts on the rheobase (Fig. 7d and 7i), there seems to be a common effect on the AP threshold (Fig. 7c and 7h), as interpreted and indicated in the final sentence of the results section for Figure 7. If our response does not address the reviewer's comment adequately, we would greatly appreciate it if the reviewer could rephrase their feedback.

      (7) Design issue:

      The Kv1 blockade experiments are disconnected from the main manuscript. There is no experiment that shows the causal relationship between changes in DTX and cHet cells. It is only an interesting observation on AP halfwidth and threshold. However, how they affect rheobase, EPSCs, and other topics of the manuscript are not addressed in DTX experiments.

      Furthermore, Kv1 currents were never measured in this work, nor was the channel density tested. Thus, the DTX effects are not necessarily related to changes in PV cells, which can potentially generate controversies.

      While we acknowledge the reviewer's point that Kv1 currents and density were not specifically tested, an important insight provided by Fig. 5 is the prolonged action potential latency. This delay is significantly influenced by slowly inactivating subthreshold potassium currents, namely the D-type K+ current. It is worth noting that the D-type current is primarily mediated by members of the Kv1 family. The literature supports a role for Kv1.1-containing channels in modulating responses to near-threshold stimuli in PV cells (Wang et al., 1994; Goldberg et al., 2008; Zurita et al., 2018). However, we recognize that, besides the Kv1 family, other families may also contribute to the observed changes.

      To address this concern, we revised the manuscript to use the more accurate term "D-type K+ current", and rephrased the discussion to clarify the limits of our approach. It is not our intention to open an unnecessary controversy, but rather to present the data we obtained. We believe that this revised terminology and discussion will avoid unnecessary controversy and instead foster fruitful discussion.

      (8) Writing issues:

      Abstract:

      The auditory system is not mentioned in the abstract.

      One statement in the abstract is unclear. What is meant by "targeting Kv1 family of voltage-gated potassium channels was sufficient..."? "Targeting" could refer to altered subcellular targeting of the channels, simple overexpression/deletion in the target cell population, targeted mutation of the channel, etc. Only the final part of the Results revealed that it meant none of the above, but rather the selective blockade of these channels.

      We agree with the reviewer and we will rephrase the abstract accordingly.

      Introduction:

      There is a contradiction in the introduction. The second paragraph describes in detail the distinct contribution of PV and SST neurons to auditory processing. But at the end, the authors state that "relatively few reports on PV+ and SST+ cell-intrinsic and synaptic properties in adult auditory cortex". Please be more specific about the unknown properties.

      We agree with the reviewer and we will rephrase more specifically.

      (9) The introduction emphasizes the heterogeneity of PV neurons, which certainly influences the interpretation of the results of the current manuscript. However, the initial experiments did not consider this and handled all PV cell data as a pooled population.

      In the initial experiments, we handled all PV cell data together because we wanted to be rigorous and not make assumptions on the different PV cells, which in later experiments we distinguished based on the intrinsic properties alone. Nevertheless, based on this and other reviewers’ comments, we completely rewrote the introduction in the revised manuscript to increase both focus and clarity.

      (10) The interpretation of the results strongly depends on unpublished work, which potentially provide the physiological and behavioral contexts about the role of GABAergic neurons in SynGap-haploinsufficiency. The authors cite their own unpublished work, without explaining the specific findings and relation to this manuscript.

      We agree with the reviewer and provided more information and updated references in the revised version of this manuscript. Our work is now in press in Journal of Neuroscience.

      (11) The introduction of Scholl analysis experiments mentions SOM staining, however, there is no such data about this cell type in the manuscript.

      We thank the reviewer for noticing the error; we changed SOM with SST (SOM and SST are two commonly used acronyms for Somatostatin expressing interneurons).

      Reviewer #3 (Public Review):

      This paper compares the synaptic and membrane properties of two main subtypes of interneurons (PV+, SST+) in the auditory cortex of control mice vs mutants with Syngap1 haploinsufficiency. The authors find differences at both levels, although predominantly in PV+ cells. These results suggest that altered PV-interneuron functions in the auditory cortex may contribute to the network dysfunction observed in Syngap1 haploinsufficiency-related intellectual disability. The subject of the work is interesting, and most of the approach is direct and quantitative, which are major strengths. There are also some weaknesses that reduce its impact for a broader field.

      (1) The choice of mice with conditional (rather than global) haploinsufficiency makes the link between the findings and Syngap1 relatively easy to interpret, which is a strength. However, it also remains unclear whether an entire network with the same mutation at a global level (affecting also excitatory neurons) would react similarly.

      We agree with the reviewer and now discuss this important caveat in the revised manuscript.

      (2) There are some (apparent?) inconsistencies between the text and the figures. Although the authors appear to have used a sophisticated statistical analysis, some datasets in the illustrations do not seem to match the statistical results. For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences. 

      We respectfully disagree; we do not think the text and figures are inconsistent. In the cited examples, a large apparent difference in mean values does not reach significance because of the large variability in the data; further, we did not exclude any data points, because we wanted to be rigorous. In particular, for Fig. 1g, statistical analysis shows a significant increase in the inter-mEPSC interval (*p=0.027, LMM) when all events are considered (cumulative probability plots), while there is no significant difference in the inter-mEPSC interval in the inter-cell mean comparison (inset, p=0.354, LMM). The inter-cell mean comparison does not show a difference with the Mann-Whitney test either (p=0.101; the data are not normally distributed, hence the choice of the Mann-Whitney test). For Fig. 3f (eNMDA), the higher mean value for cHet versus control is driven by two particularly high data points, while the other data points overlap with the control values. The Mann-Whitney test also shows no statistical difference (p=0.174).

      In the manuscript, discussion of the data is based on the results of the LMM analysis, which takes in account both the number of cells and the numbers of mice from which these cells are recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from same mouse are independent variables. In the supplemental tables, we provided the results of the statistical analysis done with both LMM and the most commonly used Mann Whitney (for not normally distributed) or t-test (for normally distributed), for each data set.
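      Why the all-event (cumulative) comparison and the inter-cell mean comparison in Fig. 1g can disagree is illustrated by the minimal Python sketch below: pooling events weights each cell by its number of events, whereas per-cell means weight every cell equally. The data are hypothetical, and the sketch does not fit the linear mixed model (LMM) itself, which additionally nests cells within mice and would require a dedicated statistics package.

```python
from statistics import mean


def pooled_mean(intervals_by_cell):
    """Mean over all events, as in a cumulative-probability comparison."""
    return mean(x for events in intervals_by_cell.values() for x in events)


def mean_of_cell_means(intervals_by_cell):
    """Mean of per-cell means, as in an inter-cell (inset) comparison."""
    return mean(mean(events) for events in intervals_by_cell.values())


def ecdf(values):
    """Empirical cumulative probability as sorted (value, fraction <= value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


# Hypothetical inter-event intervals (ms): cell A has many events, cell B few.
cells = {"A": [100.0, 100.0, 100.0, 100.0], "B": [500.0, 500.0]}
print(pooled_mean(cells))
print(mean_of_cell_means(cells))
```

      Here cell A contributes four of the six pooled events, so the pooled mean (about 233 ms) sits closer to A than the mean of cell means (300 ms) does.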

      Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not seem to show that.

      We apologize for our lack of clarity. In the legend of Figure 9, we reported the statistical comparisons between 1) vehicle-treated cHet vs control PV+ cells and 2) a-DTX-treated cHet vs control PV+ cells. We rephrased the figure legend to avoid confusion.

      (3) The authors mention that the lack of differences in synaptic current kinetics is evidence against a change in subunit composition. However, in some Figures, for example, 3a, the kinetics of the recorded currents appear dramatically different. It would be important to know and compare the values of the series resistance between control and mutant animals.

      We agree with the reviewer that there appears to be a qualitative difference in eNMDA decay between conditions, although the quantified eNMDA decay itself is similar between groups. We used a cutoff of 15% for the series resistance (Rs), which is considerably more stringent than the cutoffs typically used in electrophysiology, most of which lie between 20 and 30%. To address this concern, we re-examined Rs and compared it between groups, finding no difference in Rs for eAMPA (control mice: 13.2±0.5, n=16 cells from 7 mice vs cHet mice: 13.7±0.3, n=14 cells from 7 mice; LMM, p=0.432) or eNMDA (control mice: 12.7±0.7, n=6 cells from 3 mice vs cHet mice: 13.8±0.7, n=6 cells from 5 mice; LMM, p=0.231). Thus, the apparent qualitative difference between the example traces (Fig. 3a) and the quantified data (Fig. 3f, right) stems from inter-cell variability rather than inter-group differences; in particular, the higher but non-significant eNMDA decay rate in cHet cells is driven by a couple of very high values (Fig. 3f, right). In the revised manuscript, we now show traces that better represent our findings.

      (4) A significant unexplained variability is present in several datasets. For example, the AP threshold for PV+ cells includes points between -50 and -40 mV, but also values of around -20 to -15 mV, which seem too depolarized to generate healthy APs (Fig 5c, Fig 7c).

      We acknowledge the variability in AP threshold data, with some APs appearing too depolarized to generate healthy spikes. However, we meticulously examined each AP that spiked at these depolarized thresholds and found that other intrinsic properties (such as Rin, Vrest, AP overshoot, etc.) all indicate that these cells are healthy. Therefore, to maintain objectivity and provide unbiased data to the community, we opted to include them in our analysis. It's worth noting that similar variability has been observed in other studies (Bengtsson Gonzales et al., 2020; Bertero et al., 2020).

      Further, we repeated the significance test on AP threshold after excluding these potentially unhealthy cells and found that the significant differences persist. After removing two outliers from the cHet group (-16.5 and -20.6 mV), we obtain: -42.6±1.01 mV in control (n=33 cells from 15 mice) vs -36.2±1.1 mV in cHet (n=38 cells from 17 mice; LMM, ***p<0.001). Thus, whether these cells are included or excluded, our interpretations and conclusions remain unchanged.

      We would like to clarify that these data have not been corrected for the liquid junction potential, as now stated in the revised version.

      (5) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2.

      We apologize for our lack of clarity. Although the analysis was done at high resolution, the figures were focused on showing multiple PV somata receiving excitatory inputs. We added higher magnification figures and more detailed information in the methods of the revised version. Please also see our response to reviewer #2.

      (6) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties, shouldn't the current/#AP plot remain unchanged?

      While we acknowledge the theoretical expectation that changes in intrinsic parameters should correlate with alterations in neuronal firing, the absence of differences in the parameters analyzed in this study is not incompatible with the clear and significant decrease in firing rate observed in cHet SST+ cells. It's indeed possible that other intrinsic factors, not assessed in this study, may have contributed to this effect. However, exploring these mechanisms is beyond the scope of our current investigation. We rephrased the discussion and added this limitation of our study in the revised version.

      (7) The plots used for the determination of AP threshold (Figs 5c, 7c, and 7h) suggest that the frequency of acquisition of current-clamp signals may not have been sufficient, this value is not included in the Methods section.

      This study utilized a sampling rate of 10 kHz, which is a standard rate for action potential analysis in the present context. While we acknowledge that a higher sampling rate could have enhanced the clarity of the phase plot, our recording conditions, as detailed in our response to Rev#2/comment#5, were suitable for the objectives of this study.

      Reference list

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells Scientific Reports 10: 15680 https://doi.org/10.1038/s41598-020-72588-1

      Bertero A, Zurita H, Normandin M, Apicella AJ (2020) Auditory long-range parvalbumin cortico-striatal neurons. Frontiers in Neural Circuits 14:45 http://doi.org/10.3389/fncir.2020.00045

      Chamberland S, Nebet ER, Valero M, Hanani M, Egger R, Larsen SB, Eyring KW, Buzsáki G, Tsien RW (2023) Brief synaptic inhibition persistently interrupts firing of fast-spiking interneurons. Neuron 111:1264–1281 http://doi.org/10.1016/j.neuron.2023.01.017

      Chehrazi P, Lee KKY, Lavertu-Jolin M, Abbasnejad Z, Carreño-Muñoz MI, Chattopadhyaya B, Di Cristo G (2023). The p75 neurotrophin receptor in preadolescent prefrontal parvalbumin interneurons promotes cognitive flexibility in adult mice Biological Psychiatry 94:310-321 doi: https://doi.org/10.1016/j.biopsych.2023.04.019

      Elabbady L, Seshamani S, Mu S, Mahalingam G, Schneider-Mizell C, Bodor AL, Bae JA, Brittain D, Buchanan J, Bumbarger DJ, Castro MA, Dorkenwald S, Halageri A, Jia Z, Jordan C, Kapner D, Kemnitz N, Kinn S, Lee K, Li K, Lu R, Macrina T, Mitchell E, Mondal SS, Popovych S, Silversmith W, Takeno M, Torres R, Turner NL, Wong W, Wu J, Yin W, Yu SC, The MICrONS Consortium, Seung S, Reid C, Da Costa NM, Collman F (2024) Perisomatic features enable efficient and dataset wide cell-type classifications across large-scale electron microscopy volumes. bioRxiv https://doi.org/10.1101/2022.07.20.499976

      Goldberg EM, Clark BD, Zagha E, Nahmani M, Erisir A, Rudy B (2008) K+ channels at the axon initial segment dampen near-threshold excitability of neocortical fast-spiking GABAergic interneurons. Neuron 58:387–400 https://doi.org/10.1016/j.neuron.2008.03.003

      Golomb D, Donner K, Shacham L, Shlosberg D, Amitai Y, Hansel D (2007) Mechanisms of firing patterns in fast-spiking cortical interneurons. PLoS Computational Biology 3:e156 http://doi.org/10.1371/journal.pcbi.0030156

      Hu H, Martina M, Jonas P (2010). Dendritic mechanisms underlying rapid synaptic activation of fast-spiking hippocampal interneurons. Science 327:52–58. http://doi.org/10.1126/science.1177876

      Hwang YS, Maclachlan C, Blanc J, Dubois A, Petersen CH, Knott G, Lee SH (2021) 3D ultrastructure of synaptic inputs to distinct GABAergic neurons in the mouse primary visual cortex. Cerebral Cortex 31:2610–2624 http://doi.org/10.1093/cercor/bhaa378

      Jadhav V, Carreno-Munoz MI, Chehrazi P, Michaud JL, Chattopadhyaya B, Di Cristo G (2024) Developmental Syngap1 haploinsufficiency in medial ganglionic eminence-derived interneurons impairs auditory cortex activity, social behavior and extinction of fear memory. The Journal of Neuroscience, in press.

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release Nature Reviews Neuroscience 16:5–16. https://doi.org/10.1038/nrn3875

      Kourrich S, Thomas MJ (2009) Similar neurons, opposite adaptations: psychostimulant experience differentially alters firing properties in accumbens core versus shell. Journal of Neuroscience 29:12275-12283 http://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Kourrich S, Hayashi T, Chuang JY, Tsai SY, Su TP, Bonci A (2013) Dynamic interaction between sigma-1 receptor and Kv1.2 shapes neuronal and behavioral responses to cocaine Cell 152:236–247. http://doi.org/10.1016/j.cell.2012.12.004 

      Norenberg A, Hu H, Vida I, Bartos M, Jonas P (2010) Distinct nonuniform cable properties optimize rapid and efficient activation of fast-spiking GABAergic interneurons Proceedings of the National Academy of Sciences 107:894–9. http://doi.org/10.1073/pnas.0910716107

      Ramirez DM, Kavalali ET (2011) Differential regulation of spontaneous and evoked neurotransmitter release at central synapses. Current Opinion in Neurobiology 21:275-282 https://doi.org/10.1016/j.conb.2011.01.007

      Russo G, Nieus TR, Maggi S, Taverna S (2013) Dynamics of action potential firing in electrically connected striatal fast-spiking interneurons Frontiers in Cellular Neuroscience 7:209 https://doi.org/10.3389/fncel.2013.00209

      Sara Y, Virmani T, Deák F, Liu X, Kavalali ET (2005) An isolated pool of vesicles recycles at rest and drives spontaneous neurotransmission Neuron 45:563-573 https://doi.org/10.1016/j.neuron.2004.12.056

      Sara Y, Bal M, Adachi M, Monteggia LM, Kavalali ET (2011) Use-dependent AMPA receptor block reveals segregation of spontaneous and evoked glutamatergic neurotransmission. Journal of Neuroscience 31:5378-5382 https://doi.org/10.1523/JNEUROSCI.5234-10.2011

      Stevens SR, Longley CM, Ogawa Y, Teliska LH, Arumanayagam AS, Nair S, Oses-Prieto JA, Burlingame AL, Cykowski MD, Xue M, Rasband MN (2021) Ankyrin-R regulates fast-spiking interneuron excitability through perineuronal nets and Kv3.1b K+ channels eLife 10:e66491 http://doi.org/10.7554/eLife.66491  

      Ünal CT, Ünal B, Bolton MM (2020) Low-threshold spiking interneurons perform feedback inhibition in the lateral amygdala Brain Structure and Function 225:909–923. http://doi.org/10.1007/s00429-020-02051-4

      Wang H, Kunkel DD, Schwartzkroin PA, Tempel BL (1994) Localization of Kv1.1 and Kv1.2, two K channel proteins, to synaptic terminals, somata, and dendrites in the mouse brain. The Journal of Neuroscience 14:4588-4599. https://doi.org/10.1523/JNEUROSCI.14-08-04588.1994

      Zhang YZ, Sapantzi S, Lin A, Doelfel SR, Connors BW, Theyel BB (2023) Activity-dependent ectopic action potentials in regular-spiking neurons of the neocortex. Frontiers in Cellular Neuroscience 17 https://doi.org/10.3389/fncel.2023.1267687

      Zurita H, Feyen PLC, Apicella AJ (2018) Layer 5 callosal parvalbumin-expressing neurons: a distinct functional group of GABAergic neurons. Frontiers in Cellular Neuroscience 12:53 https://doi.org/10.3389/fncel.2018.00053

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      (1) The introduction nicely summarizes multiple aspects of cortical auditory physiology and auditory stimulus processing, but the experiments in this study are performed ex vivo in acute slices. I wonder if it would be beneficial to shorten the initial parts of the introduction and consider a more focused approach highlighting, for example, to what extent Syngap1 expression levels change during development and/or vary across cortical areas. What cortical cell types express Syngap1 in addition to PV+ and SST+ cells? If multiple cell types normally express Syngap1, the introduction could clarify that the present study investigated Syngap1 insufficiency by isolating its effects in PV+ and SST+ neurons, a condition that may not reflect the situation in mental health disorders, but that would allow a better understanding of the global effects of Syngap1 deficiency.

      We thank the reviewer for this very helpful suggestion. We have changed the introduction as suggested.

      (2) Because mEPSCs are not affected in Syngap+/- interneurons, the authors conclude that the lower sEPSC amplitude is due to decreased network activity. However, it is likely that the absence of a significant difference (Fig 1g) is due to a lack of statistical power (control: 18 cells from 7 mice, cHet: 8 cells from 4 mice). By contrast, the number of experiments recording sIPSCs and mIPSCs (Fig 2) is much larger. Hence, it seems that adding mEPSC data would allow the authors to more convincingly support their conclusions. To more directly test whether Syngap insufficiency affects excitatory inputs by reducing network activity, ideally the authors would want to record sEPSCs followed by mEPSCs from each PV+ neuron (control or cHet). Spontaneous event frequency and amplitude should be higher for sEPSCs than mEPSCs, and Syngap1 deficiency should affect only sEPSCs, since network activity is abolished following tetrodotoxin application for mEPSC recordings.

      We agreed with the reviewer's suggestion and recorded sEPSCs followed by mEPSCs from PV+ neurons in control and cHet mice (Figure supplement 3). In both genotypes, we found no significant difference in either amplitude or inter-event intervals between sEPSCs and mEPSCs, suggesting that in acute slices from adult A1, most sEPSCs may actually be action potential-independent. Although perhaps surprising at first glance, this result can be explained by recently published work suggesting that action potential-dependent (sEPSC) and -independent (mEPSC) release may not necessarily engage the same pool of vesicles or target the same postsynaptic sites (Sara et al., 2005; Sara et al., 2011; reviewed in Ramirez and Kavalali, 2011; Kavalali, 2015). Consequently, although activity-dependent and -independent data have traditionally been interpreted on the assumption that they use the same vesicle pool, this assumption is no longer considered accurate; indeed, the current discussion in the field revolves around understanding the mechanisms underlying these phenomena.

      Therefore, comparisons between sEPSCs and mEPSCs may not yield conclusive data but rather speculative interpretations. We have added this caveat in the result section.

      (3) The interpretation of the data of experiments studying thalamic inputs and single synapses should be clarified and/or rewritten. First, it is not clear why the authors assume they are selectively activating thalamic fibers with electrical stimulation. Presumably the authors applied electrical stimulation to the white matter, but the methods are not clearly explained. Furthermore, the authors could clarify how stimulation of a single axon was verified and how they could distinguish release failures from stimulation failures, since the latter are inherent to using minimal stimulation conditions. Interpretations of changes in potency, quantal content, failure rate, etc., depend on the ability to distinguish release failures from stimulation failures. In addition, can the authors provide information on how many synapses a thalamic axon establishes with each postsynaptic PV+ cell from control or Syngap-deficient mice? Even if stimulating a single thalamic axon were possible, if the connections from single thalamic axons onto single PV+ or SST+ cells are multisynaptic, this would make the interpretation of minimal stimulation experiments in terms of single synapses very difficult or unfeasible. In the end, changes in EPSCs evoked by electrical stimulation may support the idea that Syngap1 insufficiency decreases action potential-evoked release, which in part mediates sEPSCs, but without indicating the anatomical identity of the stimulated inputs (thalamic, other subcortical or cortico-cortical).

      We agree with the reviewer: our protocol does not allow the stimulation of single synapses/axons, but rather bulk stimulation of multiple axons. We thank the reviewer for bringing up this important point. In our experiment, we reduced the stimulus intensity until no EPSC was observed, then increased it until we reached the minimum intensity at which an EPSC could be observed. We now explain this approach more clearly in the methods and have removed any reference to "minimal" stimulation from the results section.

      Electrical stimulation of the thalamic radiation could indeed activate not only monosynaptic thalamocortical fibers but also a polysynaptic (corticothalamic and/or corticocortical) EPSC component. To identify monosynaptic thalamocortical connections, we used as criteria the EPSC onset latency and its jitter, obtained as the standard deviation of onset latencies, as in previously published studies (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). Onset latency was defined as the time interval between the beginning of the stimulation artifact and the onset of the EPSC. Monosynaptic connections are characterized by short onset latencies and low jitter (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). In our experiments, EPSCs evoked by white matter stimulation had short onset latencies (mean onset latency, 4.27 ± 0.11 ms, N=16 neurons in controls, and 5.07 ± 0.07 ms, N=14 neurons in cHet mice) and low onset latency jitter (0.24 ± 0.03 ms in controls vs 0.31 ± 0.03 ms in cHet mice), suggestive of the activation of monosynaptic thalamocortical connections (Richardson et al., 2009; Blundon et al., 2011; Chun et al., 2013). Of note, a previous study in adult mice (Krause et al., 2014) showed that local field potentials evoked by electrical stimulation of the medial geniculate nucleus or of the thalamic radiation were comparable. This information is now included in the methods section of the revised manuscript.
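      The two criteria described above can be written down as a minimal Python sketch: onset latency is the interval between the start of the stimulation artifact and the EPSC onset on each trial, and jitter is the standard deviation of those latencies across trials. The per-trial times are hypothetical, and the sketch deliberately applies no cutoff for calling a connection monosynaptic, since none is stated here.

```python
from statistics import mean, stdev


def onset_latencies(trials):
    """Latency (ms) = EPSC onset time - stimulation artifact start time, per trial."""
    return [onset - artifact for artifact, onset in trials]


def latency_and_jitter(trials):
    """Return (mean onset latency, jitter), where jitter is the SD of latencies across trials."""
    lat = onset_latencies(trials)
    if len(lat) < 2:
        raise ValueError("jitter needs at least two trials")
    return mean(lat), stdev(lat)


# Hypothetical (artifact start, EPSC onset) times in ms for one PV+ cell.
trials = [(10.0, 14.2), (10.0, 14.4), (10.0, 14.3), (10.0, 14.1)]
lat_mean, jitter = latency_and_jitter(trials)
print(f"latency {lat_mean:.2f} ms, jitter {jitter:.2f} ms")
```

      The per-cell values obtained this way would then be averaged across cells within each genotype, as in the means ± SEM reported above.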

      (4) The data presentation in Fig 6 is a bit confusing and could be clarified. First, in cluster analysis (Fig 6a), the authors may want to clarify why a correlation between Fmax and half width is indicative of the presence of subgroups. Second, performing cluster analysis based on two variables alone (Fmax and half-width) might not be very informative, but perhaps the authors could better explain why they chose two variables and particularly these two variables? For reference, see the study by Helm et al. 2013 (cited by the authors) using multivariate cluster analysis. Additionally, the authors may want to clarify, for non-expert readers, whether or not finding correlations between variables (heatmap in the left panel of Fig 6b) is a necessary condition to perform PCA (Fig 6b right panel).

      We apologize for the confusion and thank the reviewer for the comment. The choice of Fmax and half-width to cluster PV+ subtypes was based on past observations of atypical PV+ cells characterized by a slower AP half-width and a lower maximal AP firing frequency (Nassar et al., 2015; Bengtsson Gonzales et al., 2018; Ekins et al., 2020; Helm et al., 2013). Based on these previous studies, we performed hierarchical clustering of AP half-width and Fmax-initial values based on Euclidean distance. However, in our case some control PV+ cells showed no correlation between these parameters (Fig. 6a, left and right, and Fig. 6b, left), which required the use of 11 additional parameters to perform Principal Component Analysis (PCA). PCA takes a large data set with many variables per observation and reduces it to a smaller set of summary indices (Murtagh and Heck, 1987). We chose 13 parameters in total that are largely unrelated, while excluding others that are highly correlated and represent similar features of membrane properties (e.g., AP rise time and AP half-width). PCA linearly transforms the data into a new set of uncorrelated variables, the principal components (PCs), each of which can describe more than one original parameter (Helm et al., 2013). We added this information to the methods section, as suggested.
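      A minimal Python sketch of the first step, hierarchical clustering of AP half-width and Fmax on Euclidean distance, is shown below. Z-scoring the two features (so that milliseconds and hertz contribute comparably) and average linkage are assumptions, since the linkage method is not stated here; the cell values are hypothetical, and the subsequent PCA step is not reproduced.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize one feature to zero mean and unit (population) SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def average_linkage(points, n_clusters=2):
    """Agglomerative clustering with Euclidean distance and average linkage."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        return mean(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical PV+ cells: (AP half-width in ms, Fmax in Hz).
cells = [(0.25, 300.0), (0.27, 280.0), (0.26, 310.0), (0.55, 120.0), (0.60, 100.0)]
half_width = zscore([hw for hw, _ in cells])
fmax = zscore([f for _, f in cells])
print(average_linkage(list(zip(half_width, fmax))))
```

      With these example values, the three narrow, high-Fmax cells and the two broader, low-Fmax cells fall into separate clusters; when the two features are not correlated, as for some control cells in Fig. 6, such a two-feature split becomes ambiguous, which motivates adding parameters for PCA.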

      Minor points:

      (1) In Fig 3a, the traces illustrating the effects of syngap haplo-insufficiency on AMPA and NMDA EPSCs do not seem to be the best examples? For instance, the EPSCs in syngap-deficient neurons show quite different kinetics compared with control EPSCs, however Fig 3f suggests similar kinetics.

      We changed the traces as suggested.

      (2) In the first paragraph of results, it would be helpful to clarify that the experiments are performed in acute brain slices and state the age of animals.

      Done as suggested.

      (3) The following two sentences are partly redundant and could be synthesized or merged to shorten the text: "Recorded MGE-derived interneurons, identified by GFP expression, were filled with biocytin, followed by posthoc immunolabeling with anti-PV and anti-SST antibodies. PV+ and SST+ interneuron identity was confirmed using neurochemical marker (PV or SST) expression and anatomical properties (axonal arborisation location, presence of dendritic spines)."

      We rewrote the paragraph to avoid redundancy, as suggested.

      (4) In the following sentence, the mention of dendritic spines is not sufficiently clear, does it mean that spine density or spine morphology differ between PV and SST neurons?: "PV+ and SST+ interneuron identity was confirmed using neurochemical marker (PV or SST) expression and anatomical properties (axonal arborisation location, presence of dendritic spines)."

      We meant absence or presence of spines. PV+ cells typically do not have spines, while SST+ interneurons do. We corrected the sentence to improve clarity.

      (5) The first sentence of the discussion might be a bit of an overinterpretation of the data? Dissecting the circuit mechanisms of abnormal auditory function with Syngap insufficiency requires experiments very different from those reported in this paper. Moreover, that PV+ neurons from auditory cortex are particularly vulnerable to Syngap deficiency is possible, but this question is not addressed directly in this study because the effects on auditory cortex PV+ neurons were not thoroughly compared with those on PV+ cells from other cortical areas.

      We agreed with the reviewer and changed this sentence accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Minor issues:

      "glutamatergic synaptic inputs to Nkx2.1+ interneurons from adult layer IV (LIV) auditory cortex" it would be more correct if this sentence used "in adult layer IV" instead of "from".

      We made the suggested changes.

      It would be useful information to provide whether the slice quality and cellular health was affected in the cHet animals.

      We did not observe any difference between control and cHet mice in terms of slice quality, success rate of recordings, or cellular health. We added this sentence to the Methods.

      Were BCshort and BCbroad observed within the same slice or the same animals? This information is important to exclude the possibility of an experimental origin of the distinct AP width.

      We have indeed found both types of BCs in the same animal, and often in the same slice.

      Reviewer #3 (Recommendations For The Authors):

      (1) The introduction is rather diffuse but should be more focused on Syngap1, cellular mechanisms and interneurons. For example, the authors do not even define what Syngap1 is.

      We thank the reviewer for this very helpful suggestion. We have changed the introduction as suggested.

      (2) Some of the figures appear very busy with small fonts that are difficult to read. Also, it is very hard to appreciate the individual datapoints in the blue bars. Could a lighter color please be used?

      We thank the reviewer for this helpful suggestion. We made the suggested changes.

      (3) The strengths and limitations of using a conditional knockout should be discussed.

      Done as suggested, in the revised Discussion.

      (4) Statistical Methods should be described more in depth and probably some references should be added. Also, do (apparent?) inconsistencies between the text and the figures depend on the analysis used? For example, neither Fig 1g nor Fig 3f (eNMDA) reach significance despite large differences in the illustration. Maybe the authors could acknowledge this trend and discuss potential reasons for not reaching significance. Also, the legend to Fig 9 indicates the presence of "a significant decrease in AP half-width from cHet in absence or presence of a-DTX", but the bar graph does not show that.

      The interpretation of the data is based on the results of the LMM analysis, which takes into account both the number of cells and the number of mice from which these cells were recorded. We chose this statistical approach because it does not rely on the assumption that cells recorded from the same mouse are independent observations. We further provided detailed information about the statistical analyses in the tables associated with each figure, where, for each data set, we show both the LMM results and the more commonly used Mann-Whitney test (for non-normally distributed data) or t-test (for normally distributed data). As suggested, we added references for LMM to the Methods section.
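      To illustrate why treating every cell as an independent observation can overstate the sample size when several cells come from the same mouse, the sketch below computes a one-way intraclass correlation, ICC(1), and the Kish design effect. This is a simplified stand-in, not the LMM used in the manuscript: it only shows how clustering of cells within mice shrinks the effective number of independent observations. The function names and example values are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """One-way ANOVA intraclass correlation ICC(1) for cells (values)
    nested within mice (keys), with the n0 correction for unequal numbers
    of cells per mouse. Negative estimates are truncated to 0.
    Assumes >= 2 mice and more cells than mice."""
    values = [x for g in groups.values() for x in g]
    n_total, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)
    denom = ms_between + (n0 - 1) * ms_within
    if denom == 0:
        return 0.0  # no variance at all: nothing to attribute to mice
    return max(0.0, (ms_between - ms_within) / denom)


def effective_n(groups):
    """Kish design effect DEFF = 1 + (m - 1) * ICC, with m the mean number
    of cells per mouse; effective number of independent observations = N / DEFF."""
    n_total = sum(len(g) for g in groups.values())
    m = n_total / len(groups)
    return n_total / (1 + (m - 1) * icc1(groups))


# Hypothetical example: 3 cells from each of 2 mice, identical within a mouse.
print(effective_n({"mouse_1": [1, 1, 1], "mouse_2": [5, 5, 5]}))  # 2.0
```

When cells from the same mouse are nearly identical (ICC close to 1), six cells carry roughly the information of two mice; when ICC is 0, every cell counts fully. An LMM with a random intercept for each mouse addresses the same dependence inside the model itself rather than through a single correction factor.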

      (5) Were overall control and mutant mice of the same average postnatal age? Is there a reason for the use of very young animals? Was any measured parameter correlated with age?

      Control and mutant mice were of the same postnatal age. In particular, the age range was 75.5 ± 1.8 postnatal days for the control group and 72.1 ± 1.7 postnatal days for the cHet group (mean ± S.E.M.). We did not use any young mice. We have added this information to the Methods.

      (6) Figure 6. First, was the dendritic arborization of all cells fully intact? Second, if Figure 7 uses the same data of Figure 5 after a reclassification of PV+ cells into the two defined subpopulations, then Figure 5 should probably be eliminated as redundant. Also, if the observed changes impact predominantly one PV+ subpopulation, maybe one could argue that the synaptic changes could be (at least partially) explained by the more limited dendritic surface of BC-short (higher proportion in mutant animals) rather than only cellular mechanisms.

      All the reconstructions used for dendritic analysis contained intact cells with no evidently cut dendrites. We added this information in the methods section.

      Regarding Figure 5 we recognize the reviewer’s point of view; however, we think both figures are informative. In particular, Figure 5 shows the full data set, avoiding assumptions on the different PV cells subtype classification, and can be more readily compared with several previously published studies.

      We apologize for our lack of clarity, which may have led to a misunderstanding. In Figure 6i our data show that BC-short from cHet mice have a larger dendritic surface and a higher number of branching points compared to BC-short from control mice. 

      (7) I am rather surprised by the AP threshold of ~-20/-15 mV observed in the datapoints of some figures. Did the authors use capacitance neutralization for their current-clamp recordings? What was the sampling rate used? Some of the phase plots (Vm vs dV/dt) suggest that it may have been too low.

      See responses to public review.

      (8) Please add the values of the series resistance of the recordings and a comparison between control and mutant animals.

      As suggested, we re-examined the series resistance (Rs) values and compared them between groups. We found no difference in Rs for eAMPA recordings (control mice: 13.2 ± 0.5 MΩ, n = 16 cells from 7 mice; cHet mice: 13.7 ± 0.3 MΩ, n = 14 cells from 7 mice; LMM, p = 0.432) or eNMDA recordings (control mice: 12.7 ± 0.7 MΩ, n = 6 cells from 3 mice; cHet mice: 13.8 ± 0.7 MΩ, n = 6 cells from 5 mice; LMM, p = 0.231).

      (9) I am unclear as to how the authors quantified colocalization between VGluts and PSD95 at the low magnification shown in Supplementary Figure 2. Could they please show images at higher magnification?

      Quantification was done on high-resolution images. Immunostained sections were imaged using a Leica SP8-STED confocal microscope with a 63× oil-immersion objective (NA 1.4) at 1024 × 1024 pixels, zoom = 1, z-step = 0.3 μm, and a stack size of ~15 μm. As suggested by the reviewer, we changed the figure to include images at higher magnification.

      (10) The authors claim that "cHet SST+ cells showed no significant changes in active and passive membrane properties", but this claim would seem to be directly refuted by the data of Fig 8f. In the absence of changes in either active or passive membrane properties, shouldn't the current/#AP plot remain unchanged?

      The reduction in intrinsic excitability observed in SST+ cells from cHet mice could be due to intrinsic factors not assessed in this study. However, exploring these mechanisms is beyond the scope of our current investigation. We rephrased the discussion and added this limitation of our study in the revised version.

      (11) Please check references as some are missing from the list.

      Thank you for noticing this issue, which is now corrected.

      References  

      Bengtsson Gonzales C, Hunt S, Munoz-Manchado AB, McBain CJ, Hjerling-Leffler J (2020) Intrinsic electrophysiological properties predict variability in morphology and connectivity among striatal Parvalbumin-expressing Pthlh-cells Scientific Reports 10:15680 https://doi.org/10.1038/s41598-020-72588-1

      Blundon JA, Bayazitov IT, Zakharenko SS (2011) Presynaptic gating of postsynaptically expressed plasticity at mature thalamocortical synapses The Journal of Neuroscience 31:16012-25 https://doi.org/10.1523/JNEUROSCI.3281-11.2011

      Chun S, Bayazitov IT, Blundon JA, Zakharenko SS (2013) Thalamocortical long-term potentiation becomes gated after the early critical period in the auditory cortex The Journal of Neuroscience 33:7345-57 https://doi.org/10.1523/JNEUROSCI.4500-12.2013

      Ekins TG, Mahadevan V, Zhang Y, D’Amour JA, Akgül G, Petros TJ, McBain CJ (2020) Emergence of non-canonical parvalbumin-containing interneurons in hippocampus of a murine model of type I lissencephaly eLife 9:e62373 https://doi.org/10.7554/eLife.62373

      Helm J, Akgul G, Wollmuth LP (2013) Subgroups of parvalbumin-expressing interneurons in layers 2/3 of the visual cortex Journal of Neurophysiology 109:1600–1613 https://doi.org/10.1152/jn.00782.2012

      Kavalali E (2015) The mechanisms and functions of spontaneous neurotransmitter release Nature Reviews Neuroscience 16:5–16 https://doi.org/10.1038/nrn3875

      Krause BM, Raz A, Uhlrich DJ, Smith PH, Banks MI (2014) Spiking in auditory cortex following thalamic stimulation is dominated by cortical network activity Frontiers in Systems Neuroscience 8:170. https://doi.org/10.3389/fnsys.2014.00170

      Murtagh F, Heck A (1987) Multivariate Data Analysis. Dordrecht, The Netherlands: Kluwer Academic.

      Nassar M, Simonnet J, Lofredi R, Cohen I, Savary E, Yanagawa Y, Miles R, Fricker D (2015) Diversity and overlap of Parvalbumin and Somatostatin expressing interneurons in mouse presubiculum Frontiers in Neural Circuits 9:20. https://doi.org/10.3389/fncir.2015.00020

      Ramirez DM, Kavalali ET (2011) Differential regulation of spontaneous and evoked neurotransmitter release at central synapses Current Opinion in Neurobiology 21:275-282 https://doi.org/10.1016/j.conb.2011.01.007

      Richardson RJ, Blundon JA, Bayazitov IT, Zakharenko SS (2009) Connectivity patterns revealed by mapping of active inputs on dendrites of thalamorecipient neurons in the auditory cortex The Journal of Neuroscience 29:6406-17 https://doi.org/10.1523/JNEUROSCI.3028-09.2009

      Sara Y, Virmani T, Deák F, Liu X, Kavalali ET (2005) An isolated pool of vesicles recycles at rest and drives spontaneous neurotransmission Neuron 45:563-573 https://doi.org/10.1016/j.neuron.2004.12.056

      Sara Y, Bal M, Adachi M, Monteggia LM, Kavalali ET (2011) Use-dependent AMPA receptor block reveals segregation of spontaneous and evoked glutamatergic neurotransmission Journal of Neuroscience 31:5378-5382 https://doi.org/10.1523/JNEUROSCI.5234-10.2011

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #2:

      No further questions, but please do add a sentence or two about the lack of these additional points in the discussion as a limitation to the study.

      We have included additional “limitations of the study” in the Discussion Section.

      Reviewer #3:

      The authors have added to the discussion some critical remarks about the limitations of the study, which will help in the assessment of the conclusions.

      In sum, the manuscript has significantly improved during the revision.

      Some minor points should be changed, though

      Page 18 marked: "What causes an age-dependent decrease in mitochondrial OXPHOS genes across tissues, however, is largely unknown." I assume the authors do not mean that the abundance of the genes themselves is reduced, which would imply elimination of DNA? Please be more precise here.

      We thank the reviewer for pointing this out. We have clarified this to mean "OXPHOS gene expression" and made a couple of changes accordingly.

      Page 18 marked: a paragraph was added addressing the increase in mitochondrial respiration in the heart; this should be discussed in light of the literature, as was done for skeletal muscle in the following paragraph.

      We have included additional paragraphs in the Discussion Section to talk about increased mitochondrial respiration in the aging heart in the context of published literature.

      Figure 2: error bars were requested for the OCR measurements. Response: "We have added the error bars and statistical significance to revised Figure 2"; however, is it correct that there are no significant differences?

      Figure 2 ranks tissues based on the OCR values within a single group of mice (male or female, young or old) and is not a comparison between male vs female, or young vs old. For this reason, no statistics were included as they are not needed here. The goal of this figure is to highlight the OCR distribution across tissues within a single sex and age group.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have responded to these criticisms below and have revised the main text and figures. Here, we outline the major points of our responses:

      (1) The reviewers asked for more clarification regarding cell type annotation in the lung mesenchyme as shown in Figure 3C. We have included a new supplementary figure (Supplementary Figure 2) which shows differentially expressed genes amongst these mesenchymal cell subsets using a variety of visualization tools including a heatmap, UMAP plots, and the dotplot which was originally shown in Supplementary Figure 1D. The other supplemental figures have been re-numbered.

      (2) We acknowledge the lack of consensus in the field regarding the nomenclature of fibroblast subsets in the developing mouse lung. We are not attempting to define new subsets; rather, we adopted annotations based on previously published work. Specifically, we used Seurat to define mesenchymal cell clusters and then compared the gene expression patterns of these clusters to published work by Hurskainen et al. (Bernard Thebaud's group) and Narvaez Del Pilar et al. (Jichao Chen's group). We acknowledge that these annotations might conflict with other published data, but any approach to choosing a cell label would be subject to scrutiny. For example, Col13a1 fibroblasts share markers with cells that others have defined as lipofibroblasts or alveolar fibroblasts. Similarly, Col14a1 fibroblasts appear to share markers with matrix fibroblasts. Further work is clearly needed to address these discrepancies, and we hope that making our data publicly available will help that effort.

      (3) The reviewers asked us to interrogate changes in canonical markers of fibroblast subsets (i.e. lipofibroblasts, matrix fibroblasts) to address whether the apparent loss of myofibroblasts could be explained by a change in myofibroblast specification/differentiation. We have included these data in the responses, but because we are unable to draw any clear conclusions from these results, we do not feel these data warrant inclusion in the manuscript/figures.

      (4) As highlighted in the eLife assessment, our study does not include tissue validation (i.e. immunohistochemistry) of myofibroblast markers to distinguish whether the loss of myofibroblasts is attributable to lack of proliferation and/or changes in differentiation/specification. We spent considerable time over the past few months attempting to address these questions, however we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Given that the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution given our results here show that Pdgfra-haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      In summary, we have addressed several concerns raised by the reviewers and have attempted to perform some of the additional experiments suggested.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model and cell-type-specific genetic inactivation of Tgfbr2 to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Weaknesses:

      The critiques are mostly minor and can be addressed without extensive experimentation.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors systematically explore the mechanism(s) of impaired postnatal lung development relevant to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a clinical entity that is frequently encountered in newborns, carries high morbidity and mortality, and is still poorly understood, and given the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung-structure-generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generating postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial cells and myofibroblasts, they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA and other key pathways, such as SHH and WNT. Based on these results, they go on to show in a series of experiments that, surprisingly, increased TGFb signaling appears to be reactive to postnatal lung injury and instead protective/homeostatic in nature, and they establish the requirement for alpha V integrins, but not for the alphaVbeta6 subtype, a known activator of TGFb signaling implicated in adult lung fibrosis.

      The authors then go beyond the evaluation of the TGFb axis to show that merely inhibiting proliferation, by conditional knockout of Ect2 in the Pdgfra lineage, results in alveolar simplification, pointing to the pivotal role of PDGFRa-expressing myofibroblasts in normal postnatal lung development.

      Strengths:

      (1) The approach, which includes both pharmacologic and mechanistically relevant transgenic interventions that produced consistent results, makes the findings presented here robust.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lungs.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Figure 3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is supporting the differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway-related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins, and Fibulins. An additional analysis in the aforementioned regard has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      We attempted to conduct additional analyses on our sequencing data to evaluate the impact of lung injury on the differentiation of Pdgfra-expressing cells towards other fibroblast lineages. To specifically address the impact of hyperoxia on fibroblast differentiation, we subsetted wildtype cells collected at the P7 timepoint (while pups were still undergoing hyperoxia treatment) from the larger data set. Shown below are several Violin Plots comparing gene expression between RA and O2 conditions across the mesenchymal populations.

      Although there are some interesting observations in this analysis, we could not identify a consistent theme from these data that could clearly answer the reviewers' questions. We see a clear reduction of Pdgfra and Eln in both myofibroblast subsets with hyperoxia, which supports our finding of reductions in the myofibroblast subsets. Acta2 and Tagln appear slightly lower in alveolar myofibroblasts, but both are higher in ductal myofibroblasts. Interestingly, both Acta2 and Tagln are higher in Col14a1 fibroblasts with hyperoxia. The functional relevance of these data is unclear because there appears to be higher per-cell expression of Acta2 in ductal myofibroblasts while the relative contribution of these cells is reduced (Figure 3D-E). Col14a1 fibroblasts show increased Acta2 and Tagln expression and are slightly increased in proportion at P7 with hyperoxia treatment (Figure 3D), albeit to a much lesser degree than Col13a1 fibroblasts.

      Author response image 1.

      Markers of ductal myofibroblasts, including Hhip, Cdh4, and Aspn, all appear lower with hyperoxia. Interestingly, Plin2 expression is only slightly increased in Col13a1 fibroblasts with hyperoxia treatment, and there is also increased expression in alveolar myofibroblasts. Tcf21 is another marker commonly used to identify lipofibroblasts, and its expression is similarly increased in myofibroblasts during hyperoxia, although its expression is conversely lower in Col13a1 and Col14a1 fibroblasts in our data. Overall, these data appear consistent with recently published data by Ricetti et al., in which the authors observed an increase in lipofibroblast gene signatures and reduced myofibroblast gene signatures with hyperoxia treatment.

      Author response image 2.

      Author response image 3.

      The ability of our data to clearly identify changes in cell fate differentiation is limited by our use of Seurat to define cell clusters because these methods are likely to mask subtle gene expression changes in a small number of cells nested within a parent cluster. In the example above with Plin2, the change in Plin2 expression within myofibroblasts is not significant enough for Seurat to pull these cells out from their parent clusters to define a different lineage, nor are these cells similar enough in their current moment in time to be considered Col13a1 fibroblasts or lipofibroblasts. Increasing the dimensions used to define Seurat clusters might be sufficient to identify this subset of cells as a distinct cluster, however this approach would come at the expense of creating several more cell subsets with increasingly small populations which would be difficult to further analyze.

      One alternative approach to address these questions regarding differentiation might include using pseudo-time analysis of our sequencing data to predict cell lineage. Unfortunately, these analyses are beyond the scope of our current study, but we hope that our public data set can be used by investigators hoping to utilize this approach. Another method to address these questions could utilize a pulse-chase lineage experiment where one could label Pdgfra-expressing cells at the onset of injury and compare the differentiation of these labeled cells following injury. Li et al. conducted a similar experiment with hyperoxia in which Pdgfra-expressing cells were labeled during embryonic development and then postnatally following hyperoxia exposure. The authors noted a decrease in both lineaged myofibroblasts and lineaged lipofibroblasts and concluded that Pdgfra-lineaged cells were lost with hyperoxia treatment rather than undergoing aberrant differentiation. While these experiments likely have their own caveats related to the timing and efficiency of labeling, they represent a more conclusive approach to addressing differences in cell specification as compared to our sequencing- and flow cytometry-based approaches.

      Author response image 4.

      Author response image 5.

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but also it is needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about the absence of inflammation in their BPD model, it will be important to show the evidence.

      We acknowledge that vascular changes significantly contribute to BPD pathogenesis, however our study was not designed to adequately characterize changes in vascular/endothelial cells. We were motivated to focus on the lung mesenchyme after observing a dramatic loss of PDGFRa+ cells with our initial characterization of the hyperoxia injury model (Figure 2). At the onset of our study, the existing publicly available data did not contain enough mesenchymal cells for in-depth analysis. To generate new observations and hypotheses within the lung mesenchyme we enriched our single cell prep for mesenchymal cells at the time of FACS-sorting to ensure we would have sufficient cell numbers for downstream analysis.

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and the potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease-relevant manner.

      We have edited text in the discussion to address this point.

      Reviewer #3 (Public Review):

      Summary:

      This paper seeks to understand the role of alveolar myofibroblasts in abnormal lung development after saccular stage injury.

      Strengths:

      Multiple models of neonatal injury are used, including hyperoxia and transgenic models that target alveolar myofibroblasts.

      Weaknesses:

      There are several weaknesses that leave the conclusions significantly undersupported by the data as presented:

      (1) There is no validation of the decreased number of myofibroblasts suggested by flow cytometry/scRNAseq at the level of the tissue. Given that multiple groups have reported increased myofibroblasts (aSMA+ fibroblasts) in humans with BPD and in mouse models, demonstrating a departure from prior findings with tissue validation in the mouse models is essential. There are many reasons for decreased numbers of a subpopulation by flow cytometry, most notably that injured cells may be less likely to survive the cell sorting process.

      Unfortunately, we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (aSMA/ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Given that the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution given our results here show that Pdgfra-haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      Our single-cell data show increased expression of Acta2 and Tagln in the plots, which might be consistent with the increased aSMA staining that others have observed in these settings. Interestingly, transcripts of both genes are reduced in alveolar myofibroblasts while increased in ductal myofibroblasts, Col13a1 fibroblasts, Col14a1 fibroblasts, and vascular smooth muscle. We did not include aSMA antibody staining in our flow cytometry experiments, but this would certainly add value to future attempts to characterize the phenotypic changes occurring in these injury models.

      (2) The hallmark genes used to define the subpopulations are not given for the single-cell data. As the definition of fibroblast subtypes remains an area of unsettled discussion in the field, it is possible that the decreased number reflects classification rather than a true difference. Tissue validation and more transparency in the methods used for single-cell sequencing would be critical here.

      See response above and new Supplemental Figure 2.

      (3) There is an oversimplification of neonatal hyperoxia as a "BPD model" here, without reference to detailed prior work demonstrating that the degree and duration of hyperoxia dramatically change the phenotype. For example, Morty et al. have shown that hyperoxia of 85% or more for 14 days is required to produce the septal thickening observed in severe human BPD. Other than one metric of lung morphometry (MLI, whose y-axis is missing units) and flexiVent data, the authors have not fully characterized this model. Prior work comparing 75% O2 exposure for 5, 8, or 14 days shows that in the 8-day exposed group (similar to the model used here), much of the injury was reversible. What evidence do the authors have that hyperoxia alone is an accurate model of the permanent structural injury seen in human BPD?

      At the onset of our studies, we noted that several groups were using widely variable protocols ranging from 60-100% O2 exposure. Morty et al. have indeed conducted thorough experiments to characterize various different hyperoxia exposure protocols. In their 2017 study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312005/) they showed that 85% O2 from P1-P7 was sufficient to produce increased septal thickness compared to control mice, and this change was comparable to P1-P14 exposure with 85% O2. Interestingly, they also noted that some therapeutic interventions could rescue disease caused by 60% O2 but not 85% O2 exposure. Our criteria in choosing a treatment protocol were: (1) nursing dams and pups survived hyperoxia exposure, (2) injury was reproducible across cohorts, and (3) injury was not reversible simply by recovering in room air. We found that recent work utilizing 75% O2 exposure was sufficient to cause the alveolar simplification phenotype which we sought to investigate. In our hands, we did not observe mortality of nursing dams or pups except for litters lost to cannibalism/failure of cross-fostering.

      We are confident that the injury caused by our hyperoxia protocol is not reversible simply by recovering mice in room air. Several groups have phenotyped mice at P4, P10, or P14, immediately following the conclusion of hyperoxia treatment. To ensure that we were studying a lasting, irreversible phenotype, we conducted our endpoint studies (morphometry and lung physiology) at P40. Because mice continue to undergo alveolarization until ~P36-P39, we reasoned that this additional recovery time following cessation of hyperoxia would allow for spontaneous recovery if this injury were transient. Additionally, shown below are unpublished flexiVent data from mice treated for 10 days with 75% O2 and recovered until analysis at 10 weeks of age. These results are entirely consistent with the flexiVent data included in the manuscript, and the persistence of lung physiologic changes in adult mice suggests the presence of permanent underlying structural changes. We did not conduct morphometry/MLI studies at later timepoints, but we have no reason to suspect a different outcome given the clear results from lung physiology.

      Author response image 6.

      (4) Thebaud et al. published a single-cell analysis of neonatal hyperoxia in 2021, with seemingly contrasting findings. How does this dataset compare in context?

      Our data are complementary to the single-cell analysis published by Thebaud et al. We included a re-analysis of their mesenchymal data in Supplementary Figure 2. It shows that they also observed a relative decrease in myofibroblast clusters at P7 and P14 following hyperoxia treatment. Figure 4 of their paper highlights the top differentially expressed genes between RA and O2 in Col13a1 FB and myofibroblasts, and we observe nearly identical findings within each of these clusters in our data set. Below we have created dot plots of P7 wildtype samples for the same selected genes shown in Figure 4G of the Thebaud et al. paper. It is important to note that their clustering pooled all myofibroblasts into one cluster, while our data are divided into alveolar myofibroblasts and ductal myofibroblasts. The other difference is that their data set pools all timepoints (P3, P7, and P14) for display, while for simplicity the plot we selected here shows only P7 cells. These data show that the general trends are identical to those observed by Thebaud et al. Differences in genes such as Acta2 can be accounted for by opposing changes in the different myofibroblast clusters, as shown in the violin plots above: Acta2 is reduced in hyperoxia in alveolar myofibroblasts but increased in ductal myofibroblasts.

      Author response image 7.

      Alveolar myoFB

      Author response image 8.

      Ductal myoFB

      One difference between our two datasets is the relative contribution of myofibroblasts and Col13a1 fibroblasts to the entire mesenchymal population. Myofibroblasts make up over 50% of all mesenchymal cells in our preps, while most of their mesenchymal cells are Col13a1 fibroblasts. These differences likely reflect differences in tissue digestion and cell preparation protocols. Despite these differences, however, their data show the same trends of decreased myofibroblasts and a relative expansion of Col13a1 fibroblasts.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, for the hyperoxia model, it is informative to have the analysis done at P40, while most of the previous studies using this model focus on outcomes shortly after the end of the hyperoxia regimen. The authors state "we did not see evidence of fibrosis, scarring, or inflammation." It will be helpful to include data supporting this conclusion, especially ACTA2, CTHRC1, and CD45 staining.

      We did not conduct trichrome staining or hydroxyproline assays to quantify the absence of fibrotic changes because H&E staining showed no gross histologic changes consistent with scarring or fibrosis. We have amended the text to say “we did not see evidence of fibrosis or scarring”, since we did not present any data characterizing the immune cell compartment.

      (2) Figure 3, single cell analysis, naming of the clusters is confusing. Is "alveolar myofibroblasts" the same as "secondary crest myofibroblasts"? Is "Col13a1 FB" the same as "alveolar fibroblasts" and "Col14a1 FB" the same as "adventitial fibroblasts"? The loss of myofibroblasts is intriguing because, by staining, there is an increase of ACTA2+ cells. Are ACTA2+ cells not myofibroblasts in scRNAseq data?

      As mentioned in responses above, we used Jichao Chen’s nomenclature of “alveolar myofibroblasts” and “ductal myofibroblasts”, but we agree that the former cluster is most consistent with “secondary crest myofibroblasts”. To distinguish the two remaining clusters of fibroblasts, we used the same nomenclature as Thebaud et al.’s single-cell data set: “Col13a1 FB” and “Col14a1 FB”. The Col13a1 FB cluster is most consistent with “alveolar fibroblasts” and shows high expression of several genes used to define “lipofibroblasts”, though it is unclear whether the latter represent a subcluster within the Col13a1 FB cluster.

      As shown above, Acta2 is expressed broadly within the lung mesenchyme with highest levels found in myofibroblasts and smooth muscle cells.

      (3) Phosphorylated SMAD2/3 staining (e.g. Cell Signaling antibody) in the two models will be informative to show where TGFb signaling activity is altered.

      We have not been successful in using SMAD2/3 staining to infer changes in TGFb signaling at the resolution needed to address this question. Other groups have shown qPCR and western blot data for SMAD2/3 signaling from whole-lung extracts, but these approaches lack cell-type specificity and do not address spatial changes. We attempted to incorporate pSMAD2/3 staining into our flow cytometry experiments, but the staining protocol did not work in our hands.

      (4) Is cell death increased in the multiple models that showed simplification?

      While our EdU experiments address proliferation, we were unable to perform PDGFRa and TUNEL/caspase co-staining by histology to address apoptosis/cell death in our different models. Shown here are data from P7 wildtype mice in which Cdkn1a (which promotes cell-cycle arrest) and the pro-apoptotic genes Bax, Bak1, and Fas are all upregulated in hyperoxia in several mesenchymal cell populations, including myofibroblasts.

      Author response image 9.

      (5) Wording: "These data suggest that avb6 does not play a role in TGFb activation during normal development or neonatal hyperoxia, while av-integrins in the lung mesenchyme are required for normal development and play a protective role in response to hyperoxia." The first half of the sentence is missing a reference to the epithelium.

      Text now reads “These data suggest that epithelial avb6 does not play a role…”

      Reviewer #2 (Recommendations For The Authors):

      The reviewer greatly appreciates the work presented here, especially the hard task of addressing combined signaling pathway input into key mesenchymal cell types during an essential expansion of alveolar surface area in postnatal lung and its effect upon disturbance.

      The issues of concern are mentioned in the public review and are expanded upon below:

      (1) Expanded characterization of PDGFRa+ expressing cells in the scRNA dataset is needed (see public review). Also included should be some of the key myofibroblast genes (elastin, Acta2, etc.) and their changes in the relevant cell populations. It would be important to show (at least at the transcriptional level) that myofibroblast differentiation is impaired if the author claims that the alveolarization defect is due to functional myofibroblast impairment. Furthermore, Ect2 expression and changes with treatments should be shown for the different cell populations (relevant to Figure 9).

      See responses above

      (2) The authors stated that they did not find evidence of fibrosis, scarring, and inflammation, but did not provide data to support this statement. Given the importance of at least the inflammation component in BPD, the absence of inflammation needs to be shown. This applies especially to the model using the TGFBR2-cKO mouse, where their data show at least a trend toward increased CD45 cell numbers (Figure 2) and upregulated inflammatory upstream regulators (IL10, IFNa, IKBKB, CEBPB) in the IPA (Figure 3). BAL and/or tissue analysis by flow cytometry or IHC has been used to assess different immune cell populations. In terms of evaluating vascular impairment, the single-cell data set contains endothelial cells, vascular smooth muscle, and pericytes. This allows interrogation following the two different types of injury (hyperoxia and TGFbR2 cKO) used for the scRNA-seq experiments.

      A full characterization of the immune cell or vascular/endothelial cell compartment within our models is beyond the scope of this current study as we were focusing on the shared changes observed within the lung mesenchyme. None of these compartments exist in isolation, so of course there are likely to be correlative and/or causative changes observed in each of the different models which we studied. We did consider further phenotypic analysis of the immune cells by flow cytometry within our different models, but deferred these experiments for future studies. As mentioned earlier we have omitted the reference to “no inflammation”.

      (3) The authors should report the number of litters per experiment and experimental group and the mortality in each group, and, if present, visualize mortality using e.g. Kaplan-Meier curves. Given the switching of dams during treatment and the early postnatal injections and treatments, variability in outcome measures between litters has to be anticipated. Therefore at least 2, but preferably 3, litters per experiment should be examined to show reproducibility.

      All experiments were conducted with at least 2-3 contemporaneous litters in each treatment group as this was necessary to have enough animals per treatment condition/group to achieve statistical significance. This was essential as all experiments were conducted on the C57BL/6 background where litter sizes are typically 6-8 pups in our colony. We did not encounter any maternal mortality related to hyperoxia exposure while rotating between hyperoxia and normoxia every 48 hrs. Loss of pups in our experiments was mostly due to cannibalism either immediately after birth or from neglect due to failure of cross-fostering.

      (4) The reviewer is concerned about using PBS as a control for experiments involving antibody treatment, in this case 1D11. An isotype IgG would be the most appropriate and convincing control. In this case, an isotype-matched murine IgG1 control (13C4) has already been generated and is commercially available. While the reviewer does not suggest repeating all experiments, at least one small experiment showing that control IgG does not alter the lung phenotype with hyperoxia when compared with 1D11 would be important.

      We appreciate the reviewer’s suggestion and will consider an isotype antibody comparison in future studies. While not directly comparing 1D11 to isotype, we can share data in which we compared PBS to a different antibody. In this experiment, we attempted to use antibody blockade during the first 10 days of life while mice were undergoing hyperoxia treatment to target a specific component of the TGFb pathway. We observed no difference in outcomes either in RA or O2 when comparing PBS to xxx antibody. We cannot share the antibody identity due to intellectual property reasons, however additional studies confirmed that this antibody likely had no impact due to poor in vivo blocking activity.

      Author response image 10.

      (5) While inhibited proliferation is one possible explanation for the decrease in PDGFRa expression in the injured mice, increased and/or premature apoptosis (before the physiologically observed wave at P14-P20) should be considered as another reason. Also, do the authors propose that only proliferation results in alveolarization impairment, and that differentiation plays no significant role here? If that is the case, there would be some fully differentiated myofibroblasts in the alveolar septa, but not enough to create the multitude of alveolar septal walls. Have the authors evaluated the decrease in secondary alveolar septa formed per alveolar airspace? This measure would give some sense of whether septum initiation was prevented, or whether septa were formed but are structurally abnormal. Such abnormality could arise from altered ECM (a suspected decrease in elastin and SMA expression, if myofibroblast differentiation was impaired) or from altered cell content (a suspected decrease in myofibroblasts and an increase of other cell types, such as lipofibroblasts).

      Apoptosis/cell death are likely to play a role in addition to inhibited proliferation; see the violin plots shown above, in which cell-cycle arrest and pro-apoptotic genes are upregulated within the mesenchyme. We were unable to optimize tissue sections/staining with the samples collected at the early time points of our experiments (i.e. P4, P7, P10, P14), so we cannot co-stain for markers of apoptosis to answer this question directly. Future experiments will focus on additional characterization of these early changes, with particular attention to altered fibroblast phenotypes within the alveolar septa.

      (6) An illustration depicting key cells and the pathways involved in cartoon format would be a useful addition and visualize the important conclusions of this paper for the reader.

      We appreciate this suggestion but think the results are sufficiently straightforward that a summary cartoon would not add much.

      Figure 4A: the legend appears to be switched. The gray square seems to align with the epithelial ligands, while the blue square aligns with receptors.

      Thank you for identifying this mistake – fixed.

      Names of transgenic lines used through manuscript:

      Please use the correct name, as per JAX would be either Gli1tm3(cre/ERT2)Alj/J or Gli1-CreERT2.

      Please use the correct name, as per JAX would be either Pdgfratm1.1(cre/ERT2)Blh/J or Pdgfrα-CreERT2.

      PDGFRa-CRE would be JAX# 013148.

      The transgenic lines have been noted in the methods, and we have edited the text of the manuscript to reflect the correct names of these lines. Supplementary Figure 4 compares Gli1-CreERT2 to Pdgfrα-CreERT2, and there we left our prior nomenclature intact. That nomenclature better reflects that each of these lines is haploinsufficient at its targeted locus and that the controls are cre-negative littermates.

      We did not use the PDGFRa-CRE line (JAX# 013148).

      Reviewer #3 (Recommendations For The Authors):

      - More transparency about the single-cell analysis is required: 1) how are cell types and clusters defined? 2) what strategy was used for ambient RNA? 3) how do the controls compare with recently published mouse developmental datasets? 4) how does this model compare with the single-cell dataset published by Thebaud et al. in 2021 (neonatal hyperoxia for 14 days, with multiple time points)?

      See responses above.

      - Tissue level validation of these findings is essential by RNA ISH or IF. While validation that the same process is at play in human tissue would be ideal, if this is not available, the conclusions must be tempered in the discussion.

      See responses above.

      - Is this milder neonatal injury reversible in mice? As noted above, further characterization of this model, and placing it in the context of other more widely published models, would be helpful.

      See responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment:

      This important study reveals that the malaria parasite protein PfHO, though lacking typical heme oxygenase activity, is vital for the survival of Plasmodium falciparum. Structural and localization analyses showed that PfHO is essential for apicoplast maintenance, particularly in gene expression and biogenesis, indicating a novel adaptive role for this protein in parasite biology. While the results supporting the claims of the authors are convincing, the lack of data defining a molecular understanding or mechanism of action of the protein in question limits the impact of the study. 

      We appreciate the positive assessment. We agree that further mechanistic understanding of PfHO function remains a key future challenge. Indeed, we made extensive efforts to unravel the molecular interactions and mechanisms that underpin the critical function of PfHO. We elucidated key interactions between PfHO and the apicoplast genome, reliance of these interactions on the electropositive N-terminus, association of PfHO with DNA-binding proteins, and a specific defect in apicoplast mRNA levels upon PfHO knockdown. The major limitation we faced in further defining PfHO function is the general lack of understanding of apicoplast transcription and broader gene expression in this organelle. That limitation and the challenges to overcome it go well beyond our study and will require concerted efforts across several manuscripts (likely by multiple groups) to define the mechanistic features of apicoplast gene expression. We look forward to contributing further molecular understanding of PfHO function as broader understanding of apicoplast transcription emerges.

      Public Reviews:

      Reviewer #1 (Public Review):

      Malaria parasites detoxify free heme molecules released from digested host hemoglobin by biomineralizing them into inert hemozoin. Thus, why malaria parasites retain PfHO, a dead enzyme that has lost the capacity to catabolize heme, is an outstanding question that has puzzled researchers for more than a decade. In the current manuscript, the authors addressed this question by first solving the crystal structure of PfHO and aligning it with structures of other heme oxygenase (HO) proteins. They found that the N-terminal 95 residues of PfHO, which failed to crystallize due to their disordered nature, may serve as signal and transit peptides for PfHO subcellular localization. This was confirmed by subsequent microscopic analysis with episomally expressed PfHO-GFP and a GFP reporter fused to the first 83 residues of PfHO (PfHO N-term-GFP). To investigate the functional importance of PfHO, the authors generated an anhydrotetracycline (aTC)-controlled PfHO knockdown strain. Strikingly, the parasites lacking PfHO failed to grow and lost their apicoplast. Finally, by chromatin immunoprecipitation (ChIP), quantitative PCR/RT-PCR, and growth assays, the authors showed that both the cognate N-terminus and HO-like domain were required for PfHO function as an apicoplast DNA-interacting protein.

      The authors systematically applied multidisciplinary approaches to address this difficult question: what is the function of this enzymatically dead PfHO? I enjoyed reading this manuscript and its thoughtful discussion. This study is not only of clinical importance for antimalarial treatments but also deepens our understanding of protein function evolution. While I understand these experiments are challenging to conduct in malaria parasites, the data quality of some of the experiments could be improved. For example, most of the Western blots and Southern blots are not of high quality.

      We thank the reviewer for the positive comments but are a bit puzzled by the final statement about western and Southern blot quality. We agree that the two anti-PfHO western blots probed with custom antibody (Fig. 3- source data 2 and 8) have substantial background signal in the higher molecular mass region >75 kDa. However, we note that the critical region <50 kDa is clear in both cases and readily enables target band visualization. All other western blots probing GFP or HA epitopes are of high quality with minimal off-target background. We present two Southern blot images. We agree that the signal is somewhat faint for the Southern blot demonstrating on-target integration of the aptamer/TetR-DOZI plasmid (Fig. 3- fig. supplement 4), although we note that the correct band pattern for integration is visible. We also note that the accompanying genomic PCR data are unambiguous. The Southern blot for GFP-DHFRDD incorporation into the PfHO locus (Fig. 3- fig. supplement 1) has clear signal and strongly supports on-target integration. The minor background signal in the lower left region of the image does not extend into the critical lanes or affect interpretation of correct clonal integration.

      As noted below, we have obtained a second western blot image to evaluate the decrease in PfHO protein expression in -aTC conditions. This revised image, which we now include in Fig. 3, shows clean detection of the PfHO signal in the critical molecular mass region below 40 kDa in +aTC conditions and substantial loss of this signal in -aTC conditions (relative to HSP60 loading control).

      Reviewer #2 (Public Review):

      Summary: 

      Blackwell et al. investigated the structure, localization, and physiological function of Plasmodium falciparum (Pf) heme oxygenase (HO). Pf and other malaria parasites scavenge and digest large amounts of hemoglobin from red cells for sustenance. To counter the potentially cytotoxic effects of heme, it is biomineralized into hemozoin and stored in the food vacuole. Another mechanism to counteract heme toxicity is through its enzymatic degradation via heme oxygenases. However, it was previously found by the authors that PfHO lacks the ability to catalyze heme degradation, raising the intriguing question of what the physiological function of PfHO is. In the current contribution, the authors determine that PfHO localizes to the apicoplast, determine its targeting sequence, establish the essentiality of PfHO for parasite viability, and determine that PfHO is required for proper maintenance of apicoplasts and apicoplast gene expression. In sum, the authors establish an essential physiological function for PfHO, thereby providing new insights into the role of PfHO in plasmodium metabolism. 

      Strengths: 

      The studies are rigorously conducted and the results of the experiments unambiguously support a role for PfHO as being an apicoplast-targeted protein required for parasite viability and maintenance of apicoplasts. 

      Weaknesses: 

      While the studies conducted are rigorous and support the primary conclusions, the lack of experiments probing the molecular function of PfHO limits the impact of the work. Nevertheless, the knowledge that PfHO is required for parasite viability and plays a role in the maintenance of apicoplasts is still an important advance.

      We appreciate the positive assessment. We agree that further mechanistic understanding of PfHO function remains a key future challenge. Indeed, we made extensive efforts to unravel the molecular interactions and mechanisms that underpin the critical function of PfHO. We elucidated key interactions between PfHO and the apicoplast genome, reliance of these interactions on the electropositive N-terminus, association of PfHO with DNA-binding proteins, and a specific defect in apicoplast mRNA levels upon PfHO knockdown. The major limitation we faced in further defining PfHO function is the general lack of understanding of apicoplast transcription and broader gene expression. That limitation and the challenges to overcome it go well beyond our study and will require concerted efforts across several manuscripts (likely by multiple groups) to define the mechanistic features of apicoplast gene expression. We look forward to contributing further molecular understanding of PfHO function as broader understanding of apicoplast transcription emerges.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      Specifically, I would like to see the expression of PfHO in the 3D7 strain and PfHO-aptamer/TetR-DOZI parasites detected by PfHO antibody on the same blot. The reason is that while most of the western blots show that PfHO appears in both pro- and processed forms, Figure 3-S5B shows only the processed form of PfHO in all life stages of 3D7. It would be interesting to find out if the processing of PfHO1 is strain/stage-specific, and whether it is regulated by heme levels. It may also be interesting to find out if the pro-form of PfHO is also functional (i.e. mutate the cleavage site).

      We agree with the reviewer that Fig. 3- figure supplement 5B shows predominant detection of a single band for PfHO in untagged 3D7 parasites. In our experience, the detection of the unprocessed, pro form of PfHO can vary idiosyncratically with different experiments and cultures. In support of this variable detection of unprocessed PfHO in 3D7, we note in Fig. 3A that we detected both the unprocessed and processed forms of PfHO in a western blot of endogenously tagged PfHO-GFP-DHFRDD in 3D7 parasites with an intact apicoplast. We agree with the reviewer that future studies of stage-dependent processing of PfHO may give insights into conditions that favor or disfavor detection of the unprocessed protein. 

      Given prior evidence for vestigial heme binding by PfHO (Sigala et al. JBC 2012), we considered whether such heme binding might modulate PfHO expression, stability, and/or function. It is unknown if heme is present inside the apicoplast, and we currently lack evidence for heme-dependent function or expression by PfHO. Future studies can test this possible dependence.

      Regarding processing and possible function of the cleaved peptide, we note that the N-terminal 18 amino acids are expected to constitute the signal peptide that is cleaved co-translationally with import into the ER. Our data indicate that PfHO undergoes further processing upon import into the apicoplast to remove a further 15 residues. We currently have no evidence or expectation that these additional residues contribute to PfHO function beyond targeting to the apicoplast.

      I am also confused as to why the authors used rabbit anti-PfHO and rabbit anti-Ef1α on the same blot for Figure 3C, which makes it difficult to appreciate the expression changes of PfHO. Given the high non-specific background of PfHO antibody shown by other Western blots (Figure 3 - Source data 2), I would like to see a blot stained with only PfHO antibody to show that expression of PfHO has been efficiently reduced in the absence of aTC. 

      Bands for Ef1α (50 kDa) and untagged PfHO (~32 kDa) are readily distinguished by western blot analysis based on their distinct molecular masses and electrophoretic mobilities. We agree that staining with the anti-PfHO antibody resulted in background bands in other regions of the gel image, especially in the higher molecular mass region >75 kDa. We note that additional strong evidence for down-regulation of PfHO expression is provided in Fig. 3- figure supplement 6, which shows specific loss of PfHO mRNA transcript levels in -aTC conditions by RT-qPCR. 

      Nevertheless, we have followed the reviewer’s suggestion and provided a new WB image of PfHO expression ±aTC (probed only with rabbit anti-PfHO antibody) that shows strong down-regulation of PfHO protein levels in -aTC conditions, consistent with the strong growth phenotype observed. We have inserted this revised, cleaner western blot image into Fig. 3 (along with detection of HSP60 levels in replicate samples as loading control) and placed the prior image into Fig. 3- figure supplement 6. In both cases, densitometry analysis indicates an 80-85% reduction in PfHO levels in -aTC conditions.
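      For readers less familiar with how a knockdown percentage such as the 80-85% figure is obtained from band densitometry, one common calculation is sketched below. This is an illustration under stated assumptions, not the authors' documented procedure. It assumes that the PfHO band intensity (I_PfHO) in each lane is normalized to the HSP60 loading-control band (I_HSP60) in the same lane, and that the -aTC ratio is then compared with the +aTC ratio.

```latex
% Assumed normalization: PfHO signal divided by the HSP60 loading control in the same lane
\[
R_{\pm\mathrm{aTC}} = \frac{I_{\mathrm{PfHO}}^{\pm\mathrm{aTC}}}{I_{\mathrm{HSP60}}^{\pm\mathrm{aTC}}},
\qquad
\text{reduction} = \left(1 - \frac{R_{-\mathrm{aTC}}}{R_{+\mathrm{aTC}}}\right) \times 100\%
\]
```

      Under these assumptions, a normalized -aTC ratio equal to 0.15 to 0.20 of the +aTC ratio corresponds to an 80-85% reduction.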

      The authors proposed that PfHO interacts with apicoplast genome DNA via the electropositive N-terminus. Interestingly, these positively charged residues are not conserved between Plasmodium, Theileria, and Babesia. I will be curious to follow the authors' future work to investigate the function of this electropositive N-terminus, possibly by comparative and mutagenesis analysis. 

      We agree that further molecular studies of DNA-binding determinants by PfHO and its N-terminus will be insightful.

      The Quantitative RT-PCR analysis revealed that loss of PfHO specifically resulted in decreased apicoplast RNA. I wonder if the authors plan to conduct RNAseq analysis on the PfHO knockdown strain across multiple life stages, to get a clearer picture of PfHO function in malaria parasites. 

      Our RT-qPCR data across multiple asexual stages prior to organelle loss indicate that abundance of all apicoplast-encoded transcripts drops precipitously and uniformly upon PfHO knockdown (Fig. 5- figure supplement 7). Given the small size of the apicoplast genome and the polycistronic nature of apicoplast transcription, we assume that RNA-Seq studies would result in a similar observation. We hypothesize that PfHO knockdown and subsequent dysfunctions may interfere with RNA polymerase assembly on DNA and/or processivity. We are currently testing these hypotheses.
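      As a hedged illustration of how a relative drop in transcript abundance is often quantified by RT-qPCR, the comparative Ct (2^-ΔΔCt) method is sketched below. The response does not state which quantification method or reference transcript was used. The reference transcript ("ref") in this sketch is therefore an assumption, and the method itself assumes similar, near-100% amplification efficiency for target and reference.

```latex
% Assumed comparative Ct method; "ref" is a hypothetical normalizing transcript
\[
\Delta C_t^{\pm\mathrm{aTC}} = C_{t,\mathrm{target}}^{\pm\mathrm{aTC}} - C_{t,\mathrm{ref}}^{\pm\mathrm{aTC}},
\qquad
\Delta\Delta C_t = \Delta C_t^{-\mathrm{aTC}} - \Delta C_t^{+\mathrm{aTC}},
\qquad
\text{relative abundance} = 2^{-\Delta\Delta C_t}
\]
```

      For example, a ΔΔCt of 3 gives 2^-3 = 0.125, that is, an 87.5% decrease relative to +aTC conditions.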

      I noticed that the authors did not discuss the function of PfHO in apicoplast organelle biogenesis. Since ClpM (previously termed ClpC) is the only apicoplast-encoded Clp subunit that is essential for apicoplast biogenesis, does the author think that PfHO knockdown parasites lost their apicoplast due to decreased ClpM expression? If that were the case, would episomally expression or nuclear knockin of ClpM rescue PfHO deficiency in the absence of isopentenyl pyrophosphate (IPP)? 

      We share the reviewer’s curiosity to understand how loss of apicoplast transcripts leads to organelle dysfunction and defective IPP synthesis. We agree that ClpM function may be critical to import of nuclear-encoded proteins necessary for apicoplast function. SufB encoded on the apicoplast genome is also expected to be essential for Fe-S cluster synthesis in the apicoplast and to be required for Fe-S-dependent IPP synthesis. We have expanded the first Discussion section to address these possible connections.

      Minor: 

      (1) None of the microscopy photos have scale bars. 

      We have added scale bars to all microscopy images.

      (2) Multiple microscopy pictures show strange patches around the fluorescent signals (a grey square distinguishable from the black background). This is especially evident in Figure 2 S2. Was it caused by the reduction of the original pictures?

      We have reviewed all fluorescence microscopy images but are unable to identify the issue noted by the reviewer. We have uploaded new versions of all images to include scale bars (as requested above), and we hope that this update resolves the issue observed by the reviewer. We are happy to further troubleshoot and address if the reviewer continues to see these artifacts and can provide further information.

      (3) A description of how Southern blotting was performed is missing. 

      We thank the reviewer for bringing this omission to our attention. We have added a description of the Southern blot methods to the section on genome editing.

      (4) Figure 3B: should be "αGFP: 12nm", not "αPfHO1: 12nm". 

      We have modified this labeling to read “αGFP (PfHO): 12 nm”.

      (5) Figure 3C: which clone of PfHO knockdown was used in all the following figures? How many clones were tested in the following figures (did they show consistent phenotype)? 

      The polyclonal culture of PfHO-aptamer/TetR-DOZI knockdown parasites from transfection 11 was used for growth assay and western blot experiments, since there was no evidence by PCR or Southern blot for the wildtype PfHO locus. We have elaborated on these details in the Methods section.

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 2 and Figure 3B, to address rigor and reproducibility, the authors should state the number of parasites analyzed and if there was any variation in localization. For instance, did all of the parasites analyzed have apicoplast localization of heme oxygenase or was there a distribution of apicoplast and non-apicoplast localization? 

      Localization by fluorescence microscopy of episomal and endogenous tagged PfHO is presented in Fig. 2, Fig. 2- fig. supplements 1 and 2, and Fig. 3- fig. supplement 2. Localization by immunogold EM is presented in Fig. 3B and Fig. 3- fig. supplement 3. In all cases 3-4 representative images are presented that support exclusive localization of PfHO to the apicoplast. We imaged ≥10-20 additional parasites in all cases (and across distinct transfections and biological samples) that also supported exclusive localization to the apicoplast. We have modified the figure legends and methods description to note these replicate values. Finally, we note that IPP rescue of parasite viability upon PfHO knockdown strongly supports the conclusion that the critical and essential function of PfHO impacts the apicoplast, consistent with its exclusive detection in that organelle by microscopy.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment 1. Mohseni and Elhaik's article offers a critical evaluation of Geometric Morphometrics (GM), a common tool in physical anthropology for studying morphological differences and making phylogenetic inferences. Because the problem of morphology-based classification is at the core of paleoanthropology, I read their article with great interest, although I am neither a geneticist nor an expert on PCA theory.

      The authors developed a Python package for processing superimposed landmark data with classifier and outlier detection methods, to evaluate the adequacy of the standard approach to shape analysis via modern GM. They call into question the accuracy, robustness, and reproducibility of GM, and demonstrate how PCA introduces statistical artefacts specific to the data, thus challenging its scientific rigor. The authors demonstrate the superiority of machine learning methods in classification and outlier detection tasks. The paper is well-written and provides strong evidence in support of the authors' argument. Thus, in my opinion, it constitutes a major contribution to the field of physical anthropology, as it provides a critical and necessary evaluation of what has become a basic tool for studying morphology, and of the assumptions allowing its application for phylogenetic inferences. Again, I am not an expert in these statistical methods, nor a geneticist, but the authors' contribution is of substantial relevance to our field (physical anthropology). The examples of NR fossils and HLD 6 are cases in point, in line with other notable examples of critical assessment of phylogenetic inferences made on the basis of PCA results of GM analysis. For example, see Lordkipanidze et al.'s (2013) GM analyses of the Dmanisi fossils, suggesting that the five crania represent a single regional variant of Homo erectus; and see Schwartz et al.'s (2014) comment on their findings, claiming that the dental, mandibular, and cranial morphology of these fossils suggest taxic diversity. Schwartz et al. (2014) ask, "Why did the GMA of 78 landmarks not capture the visually obvious differences between the Dmanisi crania and specimens commonly subsumed H. erectus? ... one wonders how phylogenetically reliable a method can be that does not reflect even easily visible gross morphological differences" (p. 360).

      As an alternative to the PCA step in GM, the authors tested eight leading supervised learning classifiers and outlier detection methods on three-dimensional datasets. The authors demonstrated inconsistency of PCA clustering with the taxonomy of the species investigated for the reconstruction of their phylogeny, by analyzing a database comprising landmarks of 6 known species that belong to the Old World monkeys tribe Papionini, using PCA for classification. The authors also demonstrated that high explained variance should not be used as an estimate of high accuracy (reliability). Then, the authors altered the dataset in several ways to simulate the characteristic nature of paleontological data.

      The authors excluded taxa from the database to study how PCA and alternative classifiers are affected by partial sampling, and the results presented in Figures 4 and 5, among others, are quite remarkable in showing the deviations from the benchmark data. These results expose the perils of applying PCA and GM for interpreting morphological data. Furthermore, they provide evidence showing that the alternative classifiers are superior to PCA, and that they are less susceptible to experimenter intervention. Similar results, i.e., inconsistencies in the PC plots, were obtained in examinations of the effect of removing specimens from the dataset and in the interesting test of removing landmarks to simulate partial morphological data, as is often the case with fossils. To test the combined effect of these data alterations, the authors combined removal of taxa, specific samples, and landmarks from the dataset. In this case, as well, the PCA results indicate deviation from the benchmark data. However, the ML classifiers could not remedy the situation. The authors discuss how these inconsistencies may lead to different interpretations of the data, and in turn, different phylogenetic conclusions. Lastly, the authors simulated the situation of a specimen of unknown taxonomy using outlier detection methods, demonstrating LOF's ability to identify a novelty in the morphospace.

      References

      Bookstein FL. 1991. Morphometric tools for landmark data: geometry and biology [Orange book]. Cambridge; New York: Cambridge University Press.

      Cooke SB, and Terhune CE. 2015. Form, function, and geometric morphometrics. The Anatomical Record 298:5-28.

      Lordkipanidze D, et al. 2013. A complete skull from Dmanisi, Georgia, and the evolutionary biology of early Homo. Science 342: 326-331.

      Schwartz JH, Tattersall I, and Chi Z. 2014. Comment on "A complete skull from Dmanisi, Georgia, and the evolutionary biology of Early Homo". Science 344(6182): 360-a.

      The reviewer considered our work a contribution “of substantial relevance to our field (physical anthropology).” We are grateful for this evaluation and for the thorough review and insightful comments on our manuscript, which helped us improve its quality further. Your remarks regarding the superiority of machine learning methods over traditional GM approaches, as well as the challenges and implications highlighted in our findings, resonate deeply with the core objectives of our research. The references to previous studies and their relevance to our work underscore the broader implications of our findings for the interpretation of morphological data in evolutionary studies. We are thankful for your remarks regarding the debate surrounding the Dmanisi fossils. We covered it in our introduction (lines 161-174):

      “Finally, PCA also played a part in the much-disputed case of the Dmanisi hominins (39, 40). These early Pleistocene hominins, whose fossils were recovered at Dmanisi (Georgia), have been a subject of intense study and debate within physical anthropology. Despite their small brain size and primitive skeletal architecture, the Dmanisi fossils represent Eurasia’s earliest well-dated hominin fossils, offering insights into early hominin migrations out of Africa. The Dmanisi hominins were initially classified as Homo erectus or as a possible new species, Homo georgicus, among other proposals (40, 41). Lordkipanidze et al.’s (42) geometric morphometrics analyses suggested that the variation observed among the Dmanisi skulls may represent a single regional variant of Homo erectus. However, Schwartz et al. (2014) (43) raised concerns about the phylogenetic inferences based on PCA results of the geometric morphometrics analysis, noting the failure of the method to capture visually obvious differences between the Dmanisi crania and specimens commonly subsumed under Homo erectus.”

      Comment 2. I suggest moving all the interpretations from the Results section to the Discussion section. This will enhance the flow of the results and make it easier to follow.

      We tried that, but it made the manuscript less readable. Because our manuscript makes two strong statements, one about the unsuitability of PCA for the field and one about the many other problems in the field, as demonstrated through several test cases, it is better to keep them separate in the Results and Discussion, respectively.

      Comment 3. I recommend conducting an English language edit on the text to address minor inconsistencies.

      We thoroughly edited the text to enhance the language style and consistency. We thank the reviewer for the suggestion.

      Comment 4. Line 21, what do you mean by "ontogenists"?

      Individuals who are versed in or study ontogeny.

      Comment 5. When referring to the remains from Nesher Ramla (Israel), I recommend using "NR fossils". Thus, in line 34, I suggest replacing "Homo Nesher Ramla" by "Nesher Ramla fossils (NR fossils)", also in line 122.

      We replaced "Homo Nesher Ramla" with "Nesher Ramla fossils (NR fossils)" in all of the instances throughout the manuscript. We thank the reviewer for the suggestion.

      Comment 6. Line 34, I suggest replacing "human" by "hominin".

      (Line 35) We replaced "human" with "hominin".

      “…, such as the case of Homo Nesher Ramla, an archaic hominin with a questionable taxonomy.”

      We thank the reviewer for the suggestion.

      Comment 7. Line 67-68, I suggest clarifying the classification of landmarks using the definition of landmark types (Bookstein, 1991; also see summary by Cooke and Terhune (2015) - Table 1).

      We revised our summary of the classification of landmarks (lines 83-94), which now reads:

      “Determining sufficient measurements and data points for a valid morphometric analysis is older than modern geometric morphometrics (19). In geometric morphometrics, landmarks are discrete points on biological structures used to capture shape variation. Bookstein (20) categorised landmarks into three types: Type one, representing the juxtaposition of tissues such as the intersection of two sutures; Type two, denoting maxima of curvature like the deepest point in a depression or the most projecting point on a process; and Type three, which includes extremal points defined by information from other locations on the object, such as the endpoint or centroid of a curve or feature. Originally, Type three landmarks encompassed semi-landmarks, but Weber and Bookstein (21) refined this classification, identifying Type three landmarks as those characterised by information from multiple curves and symmetry, including the intersection of two curves or the intersection of a curve and a suture, and further subdividing them into three subtypes (3a, 3b, 3c) (15). While landmarks provide crucial information about the structure’s overall shape, semi-landmarks capture fine-scale shape variation (e.g., curves or surfaces) that landmarks alone cannot adequately represent. Semi-landmarks are heavily relied upon as the source of shape information to break the continuity of regions in the specimen without clearly identifiable landmarks (22). Semi-landmarks are typically aligned based on their relative positions to landmarks, allowing for the comprehensive analysis of shape changes and deformations within complex structures (2). Unsurprisingly, the use of semi-landmarks is controversial. For instance, Bardua et al. (23) claim that high-density sliding semi-landmark approaches offer advantages compared to landmark-only studies, while Cardini (24) advises caution about potential biases and subsequent inaccuracies in high-density morphometric analyses.”

      We thank the reviewer for the suggestion.

      Comment 8. Line 84, "beneficial over" - I suggest revising.

      (Line 102) We revised the sentence and used “offer advantages” instead.

      “… claim that high-density sliding semi-landmark approaches offer advantages compared to landmark-only studies.”

      We thank the reviewer for the suggestion.

      Comment 9. Line 97, do you mean "therefore"?

      (Line 115) Yes, we replaced "thereby" with "therefore".

      Comment 10. Line 116, I suggest rephrasing as follows: "newly discovered hominin fossils with respect to...".

      (Lines 135, 136) We rephrased it as suggested:

      “is the classification of newly discovered hominin fossils within the human phylogenetic tree”

      We thank the reviewer for the suggestion.

      Comment 11. Line 119, please clarify or explain what you mean by subjective determination of clustering in PCA plots.

      We rephrased (Lines 137, 138) to read:

      "However, which specimens should be included in clusters and which ones should be considered outliers is determined subjectively…"

      We thank the reviewer for the suggestion.

      Comment 12. Lines 146-148: consider revising to clarify the sentence; "than" in line 147 should be "that".

      We modified the sentence and replaced "than" with "that" (lines 196, 197):

      " … that even the criticism from its pioneers was dismissed"

      We thank the reviewer for the suggestion.

      Comment 13. Line 213: I recommend adding the phylogenetic tree of the Papionini tribe. This would be particularly relevant for the interpretation of the results, e.g., in lines 324-328.

      The reviewer suggested adding a phylogenetic tree of the Papionini tribe to increase the interpretability of our results. We added two trees (Figure 3): one based on the molecular phylogeny of extant papionins and the most parsimonious tree generated in the initial analysis of Collard and Wood (1).

      We thank the reviewer for the suggestion.

      Comment 14. Lines 244-248: I recommend that the parallels drawn between the results presented in this section and other cases of PCA analysis interpretation (e.g., the NR fossils) are transferred to the Discussion section.

      This would allow a more fluent read of the results.

      Thank you. We considered this but found that it does not improve the readability of the Discussion, because this is a very technical issue that is best understood alongside the specific use case that tests it.

      Comment 15. Line 301: The word "are" should be placed before the word "all".

      (Line 319) We modified accordingly and placed "are" before "all":

      “Rarely are all related taxa represented;”

      We thank the reviewer for the suggestion.

      Comment 16. Line 426: I suggest "omissions" in place of "missingness".

      (Line 435) We replaced "missingness" with "omissions".

      We thank the reviewer for the suggestion.

      Comment 17. Line 440 is part of the caption for Figure 6. Please add a description of what the red arrow indicates in every figure in which it appears.

      Yes, we added a sentence to the caption of figures 7 and 8:

      “The red arrow in subfigures A, B, and C marks a Lophocebus albigena (pink) sample whose position in PC scatterplots is of interest.”

      We thank the reviewer for the suggestion.

      Comment 18. Line 454: I recommend "partial morphological information" instead of "some form information".

      (Lines 446, 447) We made modifications and replaced "some form information" with " partial morphological information":

      “Newfound samples often comprise incomplete osteological remains or fossils (18, 22) and only present partial morphological information.”

      We thank the reviewer for the suggestion.

      Comment 19. Line 547: I suggest "portion" instead of "fracture".

      (Lines 470, 471) We replaced "fracture" with "portion":

      “Thereby, while the complete skull would cluster with its own taxon…”

      We thank the reviewer for the suggestion.

      Comment 20. Lines 664-665 should read "anatomy and physical anthropology".

      (Lines 600-602) We modified the text accordingly:

      “There are various approaches in morphometrics, but among them, geometric morphometrics has left an indelible mark on biology, especially in anatomy and physical anthropology.”

      We thank the reviewer for the suggestion.

      Comment 21. Lines 684-699: This paragraph seems to belong in the introduction section.

      (lines 175-190) We modified it and moved it to the introduction.

      “Visual interpretations of the PC scatterplots are not the only role PCA plays in geometric morphometrics. Phylogenetic Principal Component Analysis (Phy-PCA) (44) and Phylogenetically Aligned Component Analysis (PACA) (45) are both used in geometric morphometrics to analyse shape variation while considering the supposed phylogenetic relationships among species. They differ in their approach to aligning landmark configurations and the role of PCA within them. Phy-PCA incorporates phylogenetic information by utilising a phylogenetic tree to model the evolutionary history of the species. This method aims to separate shape variation resulting from shared evolutionary history from other sources of variation. PCA plays a similar role in performing dimensionality reduction on the aligned landmark configurations in Phy-PCA (44). PACA takes a different approach to alignment. It uses a Procrustes superimposition method based on a phylogenetic distance matrix, aligning the landmark configurations according to the evolutionary relationships among species. PCA is then applied to the aligned configurations to extract the principal components of shape variation (45). Both analyses provide insights into the patterns and processes that shape biological form diversity while considering phylogenetic relationships, yet they are also subjected to the limitations and biases inherent in relying on PCA as part of the process.”

      We thank the reviewer for the suggestion.

      Comment 22. Line 717: I suggest "fossils" instead of "hominins".

      (Lines 636, 637) We modified it accordingly and replaced "hominins" with "fossils":

      “…which reflect the restraints faced in morphometric analysis of ancient samples (e.g., fossils).”

      We thank the reviewer for the suggestion.

      Comment 23. Line 728: the word "the" should be deleted; Skhul V should not be italicized, and neither should the words "Mount Carmel", "Neandertals", "modern humans", and "Late Paleolithic" in the following lines.

      (Line 647-651) We made modifications accordingly:

      “For example, Harvati (27), who analysed the Skhul 5 (84), a 40,000-year-old human skull from Mount Carmel (Israel), proposed diverging hypotheses based on favourable PC outcomes (based on PC8 separating it from Neanderthals and modern humans and associating it with the Late Palaeolithic specimen and based on PC12 associating it with modern humans).”

      We thank the reviewer for the suggestion.

      Comment 24. Line 734: the first comma should be deleted.

      (Line 653) We deleted the first comma:

      “(Figures 5-12) show that compared to the benchmark (Figure 4), …”

      We thank the reviewer for the suggestion.

      Reviewer #2:

      Comment 1. I completely agree with the basic thrust of this study. Yes, of course, machine learning is FAR better than any variant of PCA for the paleosciences. I agree with the authors' critique early on that this point is not new per se - it is familiar to most of the founders of the field of GMM, including this reviewer. A crucial aspect is the dependence of ALL of GMM, PCA or otherwise, on the completely unexamined, unformalized praxis by which a landmark configuration is designed in the first place. I must admit that I am stunned by the authors' estimate of over 32K papers that have used PCA with GMM.

      We thank the reviewer for accepting the premise of our study.

      But beating a dead horse is not a good way of designing a motor vehicle. I think the manuscript needs to begin with a higher-level view of the pathology of its target disciplines, paleontology and paleoanthropology, along the lines that David demonstrated for numerical taxonomy some decades ago. That so many thousands of papers rely on bad methodologies requires some sort of explanation all of its own in terms of (a) the fears of biologists about advanced mathematics, (b) the need for publications and tenure, (c) the desirability of covers of Nature and Science, and (d) the even greater glory of getting to name a new "species." This cumulative pathology of science results in paleoanthro turning into a branch of the humanities, where no single conclusion is treated as stable beyond the next dig, the next year or so of applied genomics, and the next chemical trace analysis. In short, the field is not cumulative.

      Given the wide popularity of PCA and the attempts to prevent the replication of analyses that would expose its limitations, we do not believe that we are beating a dead horse but rather a very live beast that threatens the integrity of the entire field. We accept the second part of the analogy about developing a motor vehicle.

      We also accepted the reviewer’s suggestion and developed the suggested paragraph:

      “A major contribution to the field was made by Sokal and Sneath’s Principles of Numerical Taxonomy (9) book, which challenged traditional taxonomic theory as inherently circular and introduced quantitative methods to address questions of classification (see also review by Sneath (10)). Hull (11) claimed that evolutionary reasoning practiced in taxonomy is not inherently circular but rather unwarranted. He argued that such criticism was based on misunderstandings of the logic of hypothesising, which he attributed to an unrealistic desire for a mistake-proof science. He contended that scientific hypotheses should begin with insufficient evidence and be refined iteratively as new evidence emerges. However, some taxonomists preferred a more rigid, hierarchical approach to avoid the appearance of error. As a result of these and other criticisms, traditional taxonomy declined in favour of cladistics and molecular systematics, which provided more accurate and evolutionarily informed classifications.

      Today, palaeontology and palaeoanthropology grapple with methodological challenges that compromise the stability of their conclusions. These issues stem from various factors, including biologists’ apprehensions towards advanced mathematics, the pressure to publish for career advancement (12), the pursuit of high-profile journal covers, and the prestige associated with naming new species. As a result, these fields often resemble a branch of biology where the latest discoveries or new analytical techniques frequently overturn previous findings. This lack of cumulative knowledge necessitates a more rigorous approach to methodology and interpretation in morphometrics to ensure that conclusions are robust and enduring.”

      It is not obvious that the authors' suggestion of supervised machine learning will remedy this situation, since (a) that field itself is undergoing massive changes month by month with the advent of AI applications, and even more relevant (b) the best ML algorithms, those based on deep neural nets, are (literally) unpublishable: we cannot see how their decisions have actually been computed. Instead, to stabilize, the field will need to figure out how to base its inferences on some syntheses of actual empirical theories.

      We appreciate the reviewer’s insightful comments and concerns regarding the use of supervised machine learning in our study. We acknowledge the rapid advancements in the field of machine learning and its significant impact on various domains, including geometric morphometrics. Although we are aware of the ongoing integration of machine learning techniques in geometric morphometrics, our objective was to thoroughly investigate some of the conventional and more frequently used models for comparative analysis.

      Our intention was also to develop a Python module that enables users to easily apply these models to their landmark data. We recognise that most users typically apply machine learning methods to the principal component (PC) scores of their landmark data (2), unless PCA fails to explain enough variance (3), as we discussed in the context of Linear Discriminant Analysis (LDA). Our study demonstrates that these machine learning methods can be applied directly after generalised Procrustes analysis (GPA), without PCA as an intermediary step. This highlights another significant point of our research: the often automatic and potentially unnecessary use of PCA in geometric morphometrics.

      Furthermore, we acknowledge that the availability of more extensive data might have allowed us to explore more complex methods, such as neural networks. However, neural networks require a substantial amount of data due to their numerous learning parameters, which we did not possess in this study. It is also evident that not every algorithm is suitable for every situation. Our findings revealed that simpler models, such as the nearest neighbours classifier, which do not even have a training phase, performed exceptionally well. Additionally, the nearest neighbours classifier offers the desired transparency and interpretability, addressing the reviewer’s concern regarding the opacity of more complex models.
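      To make the point about skipping PCA concrete, the following is a minimal sketch (not the MORPHIX implementation) of GPA followed directly by a k-nearest-neighbours vote on the flattened Procrustes coordinates. Everything here is an illustrative assumption: the function names (center_and_scale, rotate_onto, gpa, align_new, knn_predict, make_specimens) are hypothetical, the data are synthetic 2D "square" and "rectangle" configurations with random rotation, scale, translation and noise, and the classifier uses Euclidean distance with k = 3. Rotations are fitted with the orthogonal Procrustes (Kabsch) solution with reflections excluded, and held-out specimens are superimposed onto the training consensus rather than refitted into it.

      ```python
      import numpy as np


      def center_and_scale(shape):
          """Translate a (landmarks x dims) configuration to the origin and scale it to unit centroid size."""
          centered = shape - shape.mean(axis=0)
          return centered / np.linalg.norm(centered)


      def rotate_onto(shape, reference):
          """Rotate `shape` to best fit `reference` (orthogonal Procrustes / Kabsch, no reflections)."""
          u, _, vt = np.linalg.svd(shape.T @ reference)
          d = np.ones(shape.shape[1])
          d[-1] = np.sign(np.linalg.det(u @ vt))  # flip the last axis if the optimum would be a reflection
          return shape @ (u @ np.diag(d) @ vt)


      def gpa(shapes, n_iter=20):
          """Generalised Procrustes analysis: returns aligned shapes and the consensus (mean) shape."""
          aligned = np.array([center_and_scale(s) for s in shapes])
          consensus = aligned[0]
          for _ in range(n_iter):
              aligned = np.array([rotate_onto(s, consensus) for s in aligned])
              consensus = center_and_scale(aligned.mean(axis=0))
          return aligned, consensus


      def align_new(shapes, consensus):
          """Superimpose new specimens onto an existing consensus without refitting it."""
          return np.array([rotate_onto(center_and_scale(s), consensus) for s in shapes])


      def knn_predict(train_coords, train_labels, test_coords, k=3):
          """Majority vote among the k nearest training specimens in Procrustes space (no PCA step)."""
          train_flat = train_coords.reshape(len(train_coords), -1)
          test_flat = test_coords.reshape(len(test_coords), -1)
          dists = np.linalg.norm(test_flat[:, None, :] - train_flat[None, :, :], axis=2)
          nearest = np.argsort(dists, axis=1)[:, :k]
          preds = []
          for row in nearest:
              labels, counts = np.unique(train_labels[row], return_counts=True)
              preds.append(labels[np.argmax(counts)])
          return np.array(preds)


      def make_specimens(template, n, rng, noise=0.02):
          """Hypothetical toy data: noisy copies of a template with random rotation, scale and translation."""
          out = []
          for _ in range(n):
              theta = rng.uniform(0, 2 * np.pi)
              rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
              shape = template + rng.normal(scale=noise, size=template.shape)
              out.append(rng.uniform(0.5, 2.0) * shape @ rot + rng.uniform(-5, 5, size=2))
          return np.array(out)


      rng = np.random.default_rng(0)
      square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
      rectangle = np.array([[0, 0], [3, 0], [3, 1], [0, 1]], dtype=float)
      train = np.concatenate([make_specimens(square, 20, rng), make_specimens(rectangle, 20, rng)])
      train_y = np.array(["A"] * 20 + ["B"] * 20)
      test = np.concatenate([make_specimens(square, 5, rng), make_specimens(rectangle, 5, rng)])
      test_y = np.array(["A"] * 5 + ["B"] * 5)

      aligned_train, consensus = gpa(train)
      pred = knn_predict(aligned_train, train_y, align_new(test, consensus))
      accuracy = float(np.mean(pred == test_y))
      print(accuracy)
      ```

      Because the nearest-neighbours step only stores the aligned training specimens and compares distances, each prediction can be traced back to the specific specimens that voted for it, which is the kind of transparency referred to above. In a real analysis, the choice of k and distance metric, and whether new specimens are aligned to a fixed consensus or included in the GPA, would need to be justified for the dataset at hand.
      
      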

      We hope this clarifies our approach and objectives, and we sincerely thank the reviewer for their valuable feedback, which has helped us refine our study and its presentation.

      It's not that this reviewer is cynical, but it is fair to suggest a revision conveying a concern for the truly striking lack of organized skepticism in the literature that is being critiqued here. A revision along those lines would serve as a flagship example of exactly the deeper argument that reference (17) was trying to seed, that the applied literature obviously needs a hundred times more of. Such a review would do the most good if it appeared in one of the same journals - AJBA, Evolution, Journal of Human Evolution, Paleobiology - where the bulk of the most highly cited misuses of PCA themselves have appeared.

      First, we do not believe that this reviewer is cynical, and we hope they will not consider us cynical if we point out that the field has thus far largely ignored previous reports of PCA misuses published in those journals, like the excellent Bookstein 2019 (4) paper, so perhaps a different approach is needed with a different journal.

      Second, our MS is not a review. We agree with the reviewer that a review of papers critical of PCA would be of value. We changed the title of our study to make it easier to find, and we thank the reviewer for the comment.

      Reviewer #3:

      Comment 1. Mohseni and Elhaik challenge the widespread use of PCA as an analytical and interpretive tool in the study of geometric morphometrics. The standard approach in geometric morphometrics analysis involves Generalised Procrustes Analysis (GPA) followed by Principal Component Analysis (PCA). Recent research challenges PCA outcomes' accuracy, robustness, and reproducibility in morphometrics analysis. In this paper, the authors demonstrate that PCA is unreliable for such studies. Additionally, they test and compare several Machine-Learning methods and present MORPHIX, a Python package of their making that incorporates the tools necessary to perform morphometrics analysis using ML methods.

      Mohseni and Elhaik conducted a set of thorough investigations to test PCA's accuracy, robustness, and reproducibility following renewed recent criticism and publications where this method was abused. Using a set of 2D and 3D morphometric benchmark data, the authors performed a traditional analysis using GPA and PCA, followed by a reanalysis of the data using alternative classifiers and rigorous testing of the different outcomes.

      In the current paper, the authors evaluated eight ML methods and compared their classification accuracy to traditional PCA. Additionally, common occurrences in the attempted morphological classification of specimens, such as non-representative partial sampling, missing specimens, and missing landmarks, were simulated, and the performance of PCA vs ML methods was evaluated.

      This is a correct description of our MS.

      The main problem with this manuscript is that it is three papers rolled into one, and the link doesn't work.

      We agree that the manuscript is comprehensive and can probably be broken down into more than one manuscript. However, we do not adhere to the philosophies of the least publishable unit (LPU), the smallest publishable unit (SPU), or the minimum publishable unit (MPU). Instead, we believe in producing high-quality and encompassing studies.

      We checked the link thoroughly and ensured it is functional, thank you for your comment.

      The title promises a new Python package, but the actual text of the manuscript spends relatively little time on the Python package itself and barely gives any information about the package and what it includes or its usefulness. It is definitely not the focus of the manuscript. The main thrust of the manuscript, which takes up most of the text, is the analysis of the papionin dataset, which shows very convincingly that PCA underperforms in virtually all conditions tested.

      We agree. We revised the title to reflect the main issue of the paper. Thank you for your comment.

      In addition, the manuscript includes a rather vicious attack against two specific cases of misuse of PCA in paleoanthropological studies, which does not connect with the rest of the manuscript at all.

      We consider these case studies of the use of PCA, which resonate with our ultimate goal. First, the previous reviewer suggested that we are beating a “dead horse.” We provide very recent and high-profile test cases to support our position that PCA is a popular and widely used method. Second, we wish to show how researchers use data alterations to cherry-pick results. Third, we focus on one of the use cases (the Homo NS) to demonstrate the poor scientific practices prevalent in this field, such as refusing to share data and breaking Science’s policies to protect this act.

      If the manuscript is a criticism of PCA techniques, this should be reflected in the title. If it is a report of a new Python package, it should focus on the package. Otherwise, there should be two separate manuscripts here.

      It is a criticism of PCA, and it is now reflected in the title; thank you again.

      The criticism of PCA is valid and important. However, pointing out that it is problematic in specific cases and is sometimes misused does not justify labeling tens of thousands of papers as questionable and does not justify vilifying an entire discipline. The authors do not make a convincing enough case that their criticism of the use of PCA in analyzing primate or hominin skulls is relevant to all its myriad uses in morphometrics. The criticism is largely based on statistical power, but it is framed as though it is a criticism of geometric morphometrics in general.

      We appreciate the opportunity to address the concerns raised regarding our critique of PCA. The reviewer argues that because we analyzed only primate skulls, we cannot extrapolate that PCA will be biased in analyzing other data (other taxa or other usages). Using the same logic, we can also argue that PCA cannot be used to study NEW taxa and certainly not to detect NOVEL taxa because it was never shown to apply to these taxa. We can further argue that PCA cannot be used to study ANY taxa since it was never shown to yield correct results (PCA results are justified through circular reasoning and are adjusted when they do not show the desired results). However, that part of our answer is not a defense of our method but rather a further criticism of the field.

      To answer the question more directly, our criticism of PCA is rooted in empirical evidence and robust research, including studies by Elhaik (5) and others (6, 7), demonstrating that PCA lacks the power to produce accurate and reliable results. If the reviewer believes that using cats instead of primates will somehow boost the accuracy of PCA, they should, at the very least, explain what morphological properties of cats justify this presumption. Concerning the case of other usages, we clearly noted that “the scope of our study was limited to PCA usage in geometric morphology.” The reviewer did not explain why our analysis is not “convincing enough,” so we cannot address it.

      As you know, this issue extends beyond the specific case study of primate or hominin skulls in our research. PCA is widely used and heavily relied upon in the field, often without sufficient scrutiny of its limitations. Our intention is not to vilify an entire discipline but to highlight the pervasive and sometimes unquestioning reliance on PCA across many studies in geometric morphometrics. Calling for the reevaluation of studies based on a problematic method is not vilification; it is, by definition, science.

      While we understand the concern about the generalisability of our findings, our critique is based on the inherent limitations of PCA itself, not merely on statistical power. PCA lacks measurable power, a test of significance, and a null model. Its outcomes are highly sensitive to the input data, making them susceptible to manipulation and interpretation. Moreover, the ability to evaluate various dimensions allows for cherry-picking of results, where different outcomes can be equally acceptable, thus undermining the robustness of conclusions drawn from PCA.

      We invite the reviewer to examine the mathematical basis of PCA as demonstrated in Figure 1 of Elhaik (2022) (https://www.nature.com/articles/s41598-022-14395-4/figures/1). We ask the reviewer to explain what in this straightforward calculation (calculating the mean of the dimensions, subtracting the mean from the dimensions, calculating the covariance matrix, and identifying the eigenvalues) convinces them that PCA is suitable for predicting evolutionary relationships between samples. What evidence supports the notion that evolutionary relationships can be inferred by merely subtracting the mean of a matrix? There is none, just as there is no statistical power in this method. PCA does not know what the data mean. It can be applied equally to horse race data and a dataset that records how many times Homer Simpson says his catchphrases. PCA is not an evolutionary method; it’s just a linear transformation. If we ask anyone why they trust it, eventually, we will get the answer that with enough tweaking, PCA results produce what the scientist wants to show, and, most importantly, it will be mathematically accurate (and as mathematically accurate as the result of all possible tweaks). There is nothing specific to hominins about it. If your method produces conflicting results by tweaking the number of samples, species, or landmarks, as we showed, your method is worthless. This is what we demonstrated.
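      To make the four steps listed above concrete, the following is a minimal Python/NumPy sketch of textbook PCA. It is an illustration under our own assumptions, not the code used in the manuscript or in Elhaik (2022); the function name `pca` and its `n_components` parameter are hypothetical. Note that the only input is a numeric matrix (for example, flattened landmark coordinates, one row per specimen): sample labels or taxonomic assignments never enter the computation.

```python
import numpy as np


def pca(X, n_components=2):
    """Minimal PCA sketch (hypothetical helper, for illustration only).

    X: array of shape (n_samples, n_features), e.g. flattened landmark
    coordinates, one row per specimen (an assumption for this example).
    Returns the projected scores and the explained-variance fractions.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: mean of each dimension (column).
    mean = X.mean(axis=0)
    # Step 2: subtract the mean from each dimension.
    Xc = X - mean
    # Step 3: covariance matrix of the centred data (features x features).
    cov = np.cov(Xc, rowvar=False)
    # Step 4: eigen-decomposition; eigh suits symmetric matrices and
    # returns eigenvalues in ascending order, so reverse to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the centred data onto the leading eigenvectors (PC scores).
    # The sign of each eigenvector, and hence of each axis, is arbitrary.
    scores = Xc @ eigvecs[:, :n_components]
    explained = eigvals / eigvals.sum()
    return scores, explained[:n_components]
```

      Because the axes are defined entirely by the covariance of whichever rows and columns are supplied, adding or removing specimens or landmarks can change the eigenvectors and therefore the resulting scatterplot.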

      We would also like to note that if we had easier access to more data, we would have extended our analysis further and shown that the bias exists in other species. As explained in our manuscript, we reached out to several scientists who refused to share their data so that we would not show biases in their studies. As this reviewer is undoubtedly aware of the practices in the field, this criticism is extremely unfair.

      Finally, arguing that our MS dismisses the entire field of geometric morphometrics is also unfair and provocative. We made no such claim. On the contrary, we offer an unbiased method to replace PCA and improve the accuracy of studies in this field.

      We hope this clarifies our position and reinforces the validity of our critique. Thank you for your valuable feedback and for allowing us to address these important points.

      Comment 2a. The article's tone is very argumentative and provocative, and unnecessary superlatives and modifiers are used ("...colourful scatterplots", lines 101, 155, 672). While this is an excellent paper and should be studied by morphometrics experts and probably anyone using PCA, the overall tone does nothing to help. It reads somewhat like a Facebook rant rather than a scientific paper (there is still, we hope, a difference between the two). Please tone it down.

      Again, we thank the reviewer for considering our work excellent. We regret that the reviewer believes that describing colorful (#101) scatterplots as such is a provocation. We do not feel the same way. “Subsumed” (#155) was suggested to us by an anonymous reviewer. We changed it to “classified” to satisfy the reviewer; the sentence now reads: “However, Schwartz et al. (2014) raised concerns about the phylogenetic inferences based on PCA results of the geometric morphometrics analysis, noting the failure of the method to capture visually obvious differences between the Dmanisi crania and specimens commonly classified under Homo erectus.” We do not understand the problem with #672, but we revised it to read “However, a growing body of literature criticises the accuracy of various PCA applications, raising concerns about its use in geometric morphometrics.” We hope that this satisfies the reviewer. We made no special effort to be argumentative or provocative. There is no need for that; our results speak for themselves. We did, however, make an effort to communicate the gravity of our findings by citing K. Popper. We do not consider this a provocation.

      Comment 2b. The acronym ML is normally used to denote Maximum Likelihood in the context of phylogenetic studies. The authors use it to denote Machine Learning, which many readers may find confusing (this reviewer took a while to realize that it was not referring to Maximum Likelihood). Perhaps leave "machine learning" written in full.

      We understand that in some contexts, "ML" typically denotes Maximum Likelihood, which can indeed cause confusion. Unfortunately, “ML” is also a well-established acronym for machine learning, and since our paper doesn’t deal with Maximum Likelihood but rather machine learning, we have to choose the latter. Initially, we did spell out "Machine Learning" in full to avoid this confusion. However, upon review, we found that the manuscript's readability and flow were compromised, leading us to revert to the acronym.

      We appreciate your suggestion and understand the importance of clarity. To address this, we will ensure that the first mention of "ML" is accompanied by "Machine Learning" written in full (Line 244). This should help maintain both clarity and readability. Thank you for your valuable input.

      Comment 3. In lines 142, 157 Rohlf's should be Rohlf.

      (Lines 191, 205) We modified it accordingly and replaced "Rohlf's" with "Rohlf".

      Comment 4. The short paragraph in lines 165-167 feels out of place and does not connect to the paragraphs before and after it.

      (Lines 210-223) We modified the introduction and merged that paragraph with a relevant paragraph. The new paragraph reads:

      “PCA’s prominent role in morphometrics analyses and, more generally, physical anthropology is inconsistent with the recent criticisms, raising concerns regarding its validity and, consequently, the value of the results reported in the literature. To assess PCA’s accuracy, robustness, and reproducibility in geometric morphometric analysis, particularly its potential biases and inconsistencies in clustering with species taxonomy for phylogenetic reconstruction, we utilised a benchmark database containing landmarks from six known species within the Old World monkeys tribe Papionini. We altered this dataset to simulate typical characteristics of paleontological data. We found that PCA’s outcomes lack reliability, robustness, and reproducibility. We also evaluated the argument that a high explained variance could be counted as a measure of reliability (2) and found no association between high explained variance amounts and the subjectiveness of the results. If PCA of morphometric landmark data produces biased results, then landmark-based geometric morphometric studies employing PCA, conservatively estimated to range from 18,400 to 35,200 (as of July 2024) (see Methods), should be reevaluated.”

      We thank the reviewer for the suggestion.

      References

      (1) Gilbert CC, Rossie JB. Congruence of molecules and morphology using a narrow allometric approach. Proceedings of the National Academy of Sciences. 2007;104(29):11910-11914.

      (2) Courtenay LA, Yravedra J, Huguet R, Aramendi J, Maté-González MÁ, González-Aguilera D, et al. Combining machine learning algorithms and geometric morphometrics: a study of carnivore tooth marks. Palaeogeography, Palaeoclimatology, Palaeoecology. 2019;522:28-39.

      (3) Bellin N, Calzolari M, Callegari E, Bonilauri P, Grisendi A, Dottori M, et al. Geometric morphometrics and machine learning as tools for the identification of sibling mosquito species of the Maculipennis complex (Anopheles). Infection, Genetics and Evolution. 2021;95:105034.

      (4) Bookstein FL. Pathologies of between-groups principal components analysis in geometric morphometrics. Evolutionary Biology. 2019;46(4):271-302.

      (5) Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Scientific reports. 2022;12(1):1-35.

      (6) Cardini A, Polly PD. Cross-validated between group PCA scatterplots: a solution to spurious group separation? Evolutionary Biology. 2020;47(1):85-95.

      (7) Berner D. Size correction in biology: how reliable are approaches based on (common) principal component analysis? Oecologia. 2011;166(4):961-971.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Kainov et al investigated the prevalence of mutations in 3'UTR that affect gene expression in cancer to identify noncoding cancer drivers.

      The authors used data from normal controls (1000 Genomes data) and compared it to cancer data (PCAWG). They found that, in cancer, 3'UTR mutations had a stronger effect on cleavage than in the normal population. These mutations are negatively selected in the normal population and positively selected in cancers. The authors used the PCAWG data set to identify such mutations and found that mutations that reduce gene expression are enriched in tumor suppressor genes, whereas those that increase gene expression are enriched in oncogenes. 3'UTR mutations that reduce gene expression or occur in TSGs co-occur with non-synonymous mutations. The authors then validate the effect of 3'UTR mutations experimentally using a luciferase reporter assay. These data identify a novel class of noncoding driver genes with mutations in 3'UTR that impact polyadenylation and thus gene expression.

      This is an elegant study with fundamental insight into identifying cancer driver genes. The conclusions of this paper are mostly well supported by data, but some aspects of data analysis need to be extended.

      We thank the reviewer for the positive assessment of our work and constructive comments.

      (1) It would be important for the authors to show if the findings of this study hold for metastatic cancers since most deaths occur due to metastasis and tumor heterogeneity changes when cancer progresses to metastasis. The authors should use the Hartwig data and show if metastatic cancers are enriched for 3'UTR mutations.

      This is a good suggestion, but we believe that the proposed analysis would have a significantly stronger impact in the context of a separate study focused specifically on longitudinal changes in the somatic mutation landscape as cancer progresses from primary tumours to metastases. Conducting such a study would require obtaining permissions to use relevant controlled datasets and, ideally, collaborating with oncologists to generate additional genome and transcriptome sequencing data. As such, this level of analysis would go beyond the current scope of our work.

      (2) Figure 2 should show the distribution of 3'UTR mutations by cancer type especially since authors go on to use colorectal cancer only for validations. It would be helpful to bring Figures S3A and S3C to this panel since these findings make the connections to cancer biology. Are any molecular functions enriched in addition to biological processes? Are kinases, phosphatases, etc more or less affected by 3'UTR mutations?

      As suggested, we have added a pie chart showing the distribution of 3’UTR mutations by cancer type (new Fig. 2E). Notably, nearly half of the mutations in our dataset were of colorectal adenocarcinoma origin, justifying the focus on this type of cancer in our subsequent validation analyses.

      To strengthen the connections to cancer biology, we moved Fig. S3A and S3C to the main text. It was more logical to integrate these panels into Fig. 3 rather than Fig. 2. We also analysed molecular function enrichment in Fig. 3E. Consistent with the biological process enrichment (now shown in Fig. 3D), this revealed an enrichment of proteins interacting with the ubiquitination pathway, including tumour suppressors SMAD2, APC and AXIN1.

      (3) Figure 3 looks at the co-occurrence of 3'UTR mutations with non-synonymous mutations but what about copy number change? You would expect the loss of the other allele to be enriched. Along the same line, are these data phased? Do you know that the nonsynonymous mutations are in the other allele or in the same allele that shows 3'UTR mutation?

      As suggested, we have analysed copy number variation data. As mentioned in the revised Results, this "showed that increased copy number was 4.1-times more common in the PCAWG data compared to allele loss. However, the incidence of copy number increase was substantially lower in the DOWN-paSNV group compared to the BG-paSNV control (Fig. S6). This points to a negative selection against duplications of genes affected by DOWN-paSNVs in cancer".

      Phasing somatic mutations in cancer samples is challenging due to high genetic heterogeneity of tumour cells. This situation will likely improve in the near future with the increased use of long-read sequencing. However, with currently available data, there is no straightforward method to determine whether mutations co-occur in the same cell. We have added a note on this in the Discussion section: "As long-read genomic sequencing data become increasingly available, it will be interesting to investigate whether these additional mutations occur in the same or in a different allele compared to the DOWN-paSNVs".

      Reviewer #2 (Public Review):

      Summary:

      To evaluate whether somatic mutations in cancer genomes are enriched with mutations in polyadenylation signal regions, the authors analyzed 1000 genomes data and PCAWG data as a control and experimental set, respectively. They observed increased enrichment of somatic mutations that may affect the function of polyA signals and confirmed that these mutations may influence the expression of the gene through a minigene expression experiment.

      Strengths:

      This study provides a systematic evaluation of polyA signal, which makes it valuable. Overall, the analytic approach and results are solid and supported by experimental validation.

      Thank you.

      Weaknesses:

      (1) This study uses APARENT2 as a tool to evaluate functional alteration in polyA signal sequences. Based on the original paper and the results shown in this paper, the algorithm appears to be of high quality. However, the whole study is dependent on the output of APARENT2. Therefore, it would be nice to (a) run and show a positive control, which can show that the algorithm works well, and (b) describe the rationale for selecting this algorithm in the main text.

      As suggested, we have added control analyses to Fig. S1A-B, which show that APARENT2 performs well in our hands. We have described the rationale for using APARENT in the Results as follows: "For each paSNV, we calculated the change in cleavage/polyadenylation efficiency using the APARENT2 neural network model, which has been shown to infer this statistic more accurately than earlier approaches [Ref23]".

      (2) Are there recurrent somatic mutation calls (= exactly the same mutation across different tumor samples) in the poly(A) region of certain genes?

      We indeed see several cases where the same cleavage/polyadenylation signal is affected by the same or different DOWN mutations in different cancer samples. This finding is now summarized in the Results section and Table S1 as follows: "In several cases, including LRP1B and FOXO1, which are known to act as tumour suppressors in certain cancers, the same cleavage/polyadenylation signal was disrupted by the same or different mutations in more than one sample (see columns Mut_Recurrence and Signal_Recurrence in Table S1)".

      (3) The authors nicely showed that the minigene with A>G mutation altered gene expression. Maybe one can reach a similar conclusion by analyzing a cancer dataset that has mutation and gene expression data? That is, genes with or without polyA mutations show different expression levels.

      The data presented in Fig. 5A-B show that DOWN-paSNV mutations have a negative effect on the expression of endogenous tumour suppressor genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figures should be numbered in order. For example, Figure S3C is referred to in the text before S3A-B, etc.

      We have proofread the text to fix this problem.

      Adding a supplementary file with lists of genes carrying 3'UTR mutations split by effect on gene expression and cancer type would be very useful for the community.

      We now show this in Table S1, with the caveat that we could not consistently investigate the effect of DOWN-paSNV on gene expression since the transcriptomics data are not available for all cancers.

      Spelling mistake in Figure 1A - genone should be genome.

      Fixed - thank you.

      Typo in Figure 1B x-axis label +50nt should be -50nt to the left of the dashed line.

      Fixed - thank you.

      All figures use E to denote x10 but it would make the figures more readable if authors used the standard notation (x10) for all numbers with exponents and base 10.

      Done.

    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      It is well known that autophagosomes/autolysosomes move along microtubules. However, because previous studies did not distinguish between autophagosomes and autolysosomes, it remains unknown whether autophagosomes begin to move after fusion with lysosomes or even before fusion. In this manuscript, the authors show, using fusion-deficient cells, that both pre-fusion autophagosomes and lysosomes can move along the MT toward the minus end. By screening motor proteins and Rabs, the authors found that autophagosomal traffic is primarily regulated by the dynein-dynactin system and can be counter-regulated by kinesins. They also show that Rab7-Epg5 and Rab39-ema interactions are important for autophagosome trafficking.

      Strengths:

      This study uses reliable Drosophila genetics and high-quality fluorescence microscopy. The data are properly quantified and statistically analyzed. It is a reasonable hypothesis that gathering pre-fusion autophagosomes and lysosomes in close proximity improves fusion efficiency.

      Thank you for your positive comments and for acknowledging the strengths of our work.

      Weaknesses:

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed.

      Thank you for this comment. We plan to perform an immunohistochemistry experiment on Vps16A KD fat body cells for mCherry and Lamp1, as for the other panels of Figure 8. We will also analyse the distribution of each.

      It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.

      While we cannot exclude the possibility that autophagosomes are transported indirectly by being tethered to lysosomes, we find this unlikely, as we believe that in fat cells, lysosomes and autophagosomes rapidly fuse with each other once they come close enough.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.

      Thank you for your careful insights. The mCherry-Atg8a reporter we use is highly stable in autolysosomes due to the resilience of the mCherry fluorophore within these acidic, post-fusion structures, making it useful for labelling both autophagosomes and autolysosomes. Notably, the high intensity of mCherry-Atg8a within autolysosomes allows us to distinguish them from pre-fusion autophagosomes, which appear fainter and smaller, especially when accumulated in fusion-defective backgrounds (as shown in Figure 4). We therefore regard larger, brighter structures as autolysosomes.

      To improve clarity, we included additional markers, namely endogenous Lamp1 staining (Figure 8) and Lamp1-GFP (Figure S9), to help differentiate between autophagic structures. Lamp1-negative, mCherry-Atg8a-positive vesicles indicate pre-fusion autophagosomes, while Lamp1/mCherry-Atg8a double-positive vesicles represent autolysosomes. Additionally, Lamp1-positive, mCherry-Atg8a-negative vesicles mark lysosomes of non-autophagic origin. We appreciate your suggestion.

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).

      Thank you for this suggestion. We will generate additional Drosophila lines similar to those used in our current study, substituting Syntaxin17, SNAP29 or Vamp7 RNAi for Vps16A RNAi. We will test key phenotypic hits with these new backgrounds to confirm our findings.

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

      Thank you for this comment. While we agree that Rab7 and Epg5 are involved in autophagosome-lysosome tethering and subsequent fusion, we believe they also play an additional role in autophagosome movement. Our hypothesis stems from the observation that the phenotypes of vps16 RNAi and rab7 or epg5 RNAi are not identical. In contrast, RNAi targeting SNARE proteins involved exclusively in fusion (Syx17, SNAP29, and Vamp7) all result in a consistent phenotype: autophagosomes accumulate around the nucleus, closely resembling the phenotype observed with vps16 depletion. This suggests that these SNAREs are specifically involved in fusion. Since Rab7 and Epg5 depletion scatters autophagosomes throughout the cytosol rather than transporting them to the nucleus, we hypothesize that this is due to impaired movement of autophagosomes. This hypothesis is further supported by our co-IP data showing that Epg5 binds to dyneins.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Boda et al. describes the results of a targeted RNAi screen in the background of Vps16A-depleted Drosophila larval fat body cells. In this background, lysosomal fusion is inhibited, allowing the authors to analyze the motility and localization specifically of autophagosomes, prior to their fusion with lysosomes to become autolysosomes. In this Vps16A-deleted background, mCherry-Atg8a-labeled autophagosomes accumulate in the perinuclear area, through an unknown mechanism.

      The authors found that the depletion of multiple subunits of the dynein/dynactin complex caused an alteration of this mCherry-Atg8a localization, moving from the perinuclear region to the cell periphery. Interactions with kinesin overexpression suggest these motor proteins may compete for autophagosome binding and transport. The authors extended these findings by examining potential upstream regulators including Rab proteins and selected effectors, and they also examined effects on lysosomal movement and autolysosome size. Altogether, the results are consistent with a model in which specific Rab/effector complexes direct the movement of lysosomes and autophagosomes toward the MTOC, promoting their fusion and subsequent dispersal throughout the cell.

      Strengths:

      Although previous studies of the movement of autophagic vesicles have identified roles for microtubule-based transport, this study moves the field forward by distinguishing between effects on pre- and post-fusion autophagosomes, and by its characterization of the roles of specific Dynein, Dynactin, and Rab complexes in regulating movement of distinct vesicle types. Overall, the experiments are well-controlled, appropriately analyzed, and largely support the authors' conclusions.

      Thank you for your positive comments and for acknowledging the strengths of our work.

      Weaknesses:

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.

      Thank you for highlighting this limitation. We plan to conduct time-lapse imaging of live fat body tissues expressing 3xmCherry-Atg8a and GFP-Lamp1 to visualize the movement and fusion events of pre-fusion autophagosomes (3xmCherry-Atg8a positive and GFP-Lamp1 negative) and lysosomes (GFP-Lamp1 positive). We expect these vesicles to exhibit movement toward the ncMTOC, providing insight into their behaviour under more typical conditions.

      Specific comments

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect.

      Thank you for this suggestion. We will generate additional Drosophila lines similar to those used in our current study, substituting Syntaxin17, SNAP29, and Vamp7 RNAi for Vps16A RNAi. We will test key phenotypic hits with these new backgrounds to confirm our findings.

      Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.

      Thank you for this helpful suggestion. We plan to conduct time-lapse experiments under various conditions (e.g., non-starved and starved at different durations) to monitor the motility of newly formed autophagosomes (3xmCherry-Atg8a positive, Lamp1 negative), allowing us to analyze their positioning dynamics without interference from fusion defects.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by co-labeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.

      Thank you for this positive comment. To confirm the presence of ectopic MTOC foci in Shot KD cells, we plan to co-label with MTOC markers, including Khc-nod-LacZ, and additional reporters such as Msps-mCherry, in both Vps16A-depleted and normal backgrounds.

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.

      Thank you for this detailed observation. Indeed, we frequently observe autophagosomes redistributing to contact-free peripheral regions upon dynein depletion, resulting in an asymmetric distribution. We believe this redistribution to be kinesin-dependent, as shown in Figure 3: kinesin overexpression scatters or shifts autophagosomes to the periphery, while kinesin/dynein double knockdown causes widespread autophagosome scattering. The simplest explanation is that, in dynein's absence, kinesins drive autophagosome movement.

      Additionally, while the radial organization of the microtubule (MT) network has been documented in two independent studies that we referenced, neither study specifically visualized MT plus ends, toward which kinesins transport their cargo. It is plausible that, while the MT network appears radial and symmetrical, subtle asymmetry might influence kinesin-dependent transport in fat cells. To explore this further, we will express MT plus-end markers, such as EB1-RFP and EB1-GFP, as well as kinesin reporters like unc-104-GFP or HA-tagged kinesins.

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.

      Thank you for the suggestion to further validate our reporter. We will knock down additional Atg genes, including Atg14, Atg1, Atg6, and Vps34, to confirm that the mCherry-Atg8a-positive structures in the Vps16A RNAi background are indeed of autophagic origin.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.

      Thank you for this constructive feedback. We agree and will temper the conclusions we draw from negative results in the revised manuscript to account for potential variability in depletion efficiency.

      Reviewer #3 (Public review):

      Summary:

      In multicellular organisms, autophagosomes are formed throughout the cytosol, while late endosomes/lysosomes are relatively confined to the perinuclear region. It is known that autophagosomes gain access to the lysosome-enriched region by microtubule-based trafficking. The mechanism by which autophagosomes move along microtubules remains incompletely understood. In this manuscript, Péter Lőrincz and colleagues investigated the mechanism driving the movement of nascent autophagosomes along microtubules towards the non-centrosomal microtubule organizing center (ncMTOC) using the fly fat body as a model system. The authors took an approach whereby they examined autophagosome positioning in cells where autophagosome-lysosome fusion was inhibited by knocking down the HOPS subunit Vps16A. Despite being generated at random positions in the cytosol, autophagosomes accumulate around the nucleus when Vps16A is depleted. They then performed an RNA interference screen to identify the factors involved in autophagosome positioning. They found that the dynein-dynactin complex is required for the trafficking of autophagosomes toward the ncMTOC. Dynein loss leads to the peripheral relocation of autophagosomes. They further revealed that two pairs of small GTPases and their effectors, Rab7-Epg5 and Rab39-ema, are required for bidirectional autophagosome transport. Knockdown of these factors in Vps16A RNAi cells causes the scattering of autophagosomes throughout the cytosol.

      Strengths:

      The data presented in this study help us to understand the mechanism underlying the trafficking and positioning of autophagosomes.

      Thank you for your positive comment and for acknowledging the strengths of our work.

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      Thank you for suggesting an improvement to our Epg5 localization data. We plan to perform triple-staining experiments with autophagy and lysosome markers, such as Atg8a and Lamp1, together with Epg5-9xHA to provide a clearer context for Epg5 localization.

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.

      Thank you for this valuable suggestion. We will create similar Drosophila lines as used in our study but will now employ Syntaxin17, SNAP29, or Vamp7 RNAi. We will cross our most significant hits with these new lines to confirm our findings.

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.

      Thank you for pointing this out. We will perform the suggested quantifications and statistics.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate, and the endomembrane systems are distinct from those in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Thank you for this insight. We will discuss the potential cell-type specificity of our findings in the revised manuscript. Additionally, we plan to examine the distribution of the mCherry-Atg8a reporter in the Vps16A RNAi background in other cell types, such as salivary gland cells, to broaden our analysis.

      Minor concerns:

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64c in Figure 5L are smeared.

      Thank you for noting this. We will repeat the experiment in Figure 5C to obtain clearer images. The smeared Dhc64C input bands in Figure 5L are due to the large size of this protein, which affects its migration characteristics. We will address this in the revised manuscript.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.

      Thank you for this comment. Both reporters are well-established as autophagic markers and function similarly. However, to reduce confusion, we have used only one type per figure to ensure comparability of results.

      (7) The small autophagosomes presented in Figures such as in Figure 1D and 1E are not clear. Enlarged images should be presented.

      Thank you for your suggestion. We will repeat these experiments and provide higher-quality, enlarged images for clarity.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?

      Thank you for this question. We will investigate this by co-transfecting the cells with WT and GTP- or GDP-locked Rab7 mutants (which mimic constitutively active and dominant-negative forms, respectively) with Epg5-9xHA. This will allow us to assess whether Rab7 modulates the Epg5-Dhc interaction.

      (9) The perinuclear lysosome localization in Epg5 KD cells does not indicate that Epg5 is an autophagosome-specific adaptor.

      Thank you for this comment. We will moderate our statement regarding Epg5's role as an autophagosome-specific adaptor in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Bennion and colleagues present a careful examination of how an earlier set of memories can either interfere with or facilitate memories formed later. This impressive work is a companion piece to an earlier paper by Antony and colleagues (2022) in which a similar experimental design was used to examine how a later set of memories can either interfere with or facilitate memories formed earlier. This study makes contact with an experimental literature spanning 100 years, which is concerned with the nature of forgetting, and the ways in which memories for particular experiences can interact with other memories. These ideas are fundamental to modern theories of human memory, for example, paired-associate studies like this one are central to the theoretical idea that interference between memories is a much bigger contributor to forgetting than any sort of passive decay. 

      Strengths: 

      At the heart of the current investigation is a proposal made by Osgood in the 1940s regarding how paired associates are learned and remembered. In these experiments, one learns a pair of items, A-B (cue-target), and then later learns another pair that is related in some way, either A'-B (changing the cue, delta-cue), or A-B' (changing the target, delta-target), or A'-B' (changing both, delta-both), where the prime indicates that item has been modified, and may be semantically related to the original item. The authors refer to the critical to-be-remembered pairs as base pairs. Osgood proposed that when the changed item is very different from the original item there will be interference, and when the changed item is similar to the original item there will be facilitation. Osgood proposed a graphical depiction of his theory in which performance was summarized as a surface, with one axis indicating changes to the cue item of a pair and the other indicating changes to the target item, and the surface itself necessary to visualize the consequences of changing both. 

      In the decades since Osgood's proposal, there have been many studies examining slivers of the proposal, e.g., just changing targets in one experiment, just changing cues in another experiment. Because any pair of experiments uses different methods, this has made it difficult to draw clear conclusions about the effects of particular manipulations. 

      The current paper is a potential landmark, in that the authors manipulate multiple fundamental experimental characteristics using the same general experimental design. Importantly, they manipulate the semantic relatedness of the changed item to the original item, the delay between the study experience and the test, and which aspect of the pair is changed. Furthermore, they include both a positive control condition (where the exact same pair is studied twice), and a negative control condition (where a pair is only studied once, in the same phase as the critical base pairs). This allows them to determine when the prior learning exhibits an interfering effect relative to the negative control condition and also allows them to determine how close any facilitative effects come to matching the positive control. 

      The results are interpreted in terms of a set of existing theories, most prominently the memory-for-change framework, which proposes a mechanism (recursive reminding) potentially responsible for the facilitative effects examined here. One of the central results is the finding that a stronger semantic relationship between a base pair and an earlier pair has a facilitative effect on both the rate of learning of the base pair and the durability of the memory for the base pair. This is consistent with the memory-for-change framework, which proposes that this semantic relationship prompts retrieval of the earlier pair, and the two pairs are integrated into a common memory structure that contains information about which pair was studied in which phase of the experiment. When semantic relatedness is lower, they more often show interference effects, with the idea being that competition between the stored memories makes it more difficult to remember the base pair. 

      This work represents a major methodological and empirical advance for our understanding of paired-associates learning, and it sets a laudably high bar for future work seeking to extend this knowledge further. By manipulating so many factors within one set of experiments, it fills a gap in the prior literature regarding the cognitive validity of an 80-year-old proposal by Osgood. The reader can see where the observed results match Osgood's theory and where they are inconclusive. This gives us insight, for example, into the necessity of including a long delay in one's experiment, to observe potential facilitative effects. This point is theoretically interesting, but it is also a boon for future methodological development, in that it establishes the experimental conditions necessary for examining one or another of these facilitation or interference effects more closely. 

      We thank the reviewer for their thorough and positive comments -- thank you so much!

      Weaknesses: 

      One minor weakness of the work is that the overarching theoretical framing does not necessarily specify the expected result for each and every one of the many effects examined. For example, with a narrower set of semantic associations being considered (all of which are relatively high associations) and a long delay, varying the semantic relatedness of the target item did not reliably affect the memorability of that pair. However, the same analysis showed a significant effect when the wider set of semantic associations was used. The positive result is consistent with the memory-for-change framework, but the null result isn't clearly informative to the theory. I call this a minor weakness because I think the value of this work will grow with time, as memory researchers and theorists use it as a benchmark for new theory development. For example, the data from these experiments will undoubtedly be used to develop and constrain a new generation of computational models of paired-associates learning. 

      We thank the reviewer for this constructive critique. We agree that the experiments with a narrower set of semantic associations are less informative; in fact, we thought about removing these experiments from the current study, but given that we found results in the ΔBoth condition in Antony et al. (2022) using these stimuli that we did NOT find in the wider set, we thought it was worth including for a thorough comparison. We hope that the analyses combining the two experiment sets (Fig 6-Supp 1) are informative for contextualizing the results in the ‘narrower’ experiments and, as the reviewer notes, for informing future researchers.

      Reviewer #2 (Public Review): 

      Summary: 

      The study focuses on how relatedness with existing memories affects the formation and retention of new memories. Of core interest were the conditions that determine when prior memories facilitate new learning or interfere with it. Across a set of experiments that varied the degree of relatedness across memories as well as the retention interval, the study compellingly shows that relatedness typically leads to proactive facilitation of new learning, with interference observed only under specific conditions with an immediate test, making it the exception rather than the rule. 

      Strengths: 

      The study uses a well-established word-pair learning paradigm to study interference and facilitation of overlapping memories. However, it goes into more depth than a typical interference study through the systematic variation of several factors: (1) which elements of an association are overlapping and which are altered (change target, change cue, change both, change neither); (2) how much the changed element differs from the original (word relatedness, with two ranges of relatedness considered); (3) retention period (immediate test, 2-day delay). Furthermore, each experiment has a large sample size, so both significant effects and null effects are robust and informative. 

      The results show the benefits of relatedness, but also replicate interference effects in the "change target" condition when the new target is not related to the old target and when the test is immediate. This provides a reconciliation of some existing seemingly contradictory results on the effect of overlap on memory. Here, the whole range of conditions is mapped to convincingly show how the direction of the effect can flip across the surface of relatedness values. 

      Additional strength comes from supporting analyses, such as analyses of learning data, demonstrating that relatedness leads to both better final memory and also faster initial learning. 

      More broadly, the study informs our understanding of memory integration, demonstrating how the interdependence of memory for related information increases with relatedness. Together with a prior study on retroactive interference and facilitation, the results provide new insights into the role of reminding in memory formation. 

      In summary, this is a highly rigorous body of work that sets a great model for future studies and improves our understanding of memory organization. 

      We thank the reviewer for their thorough summary and very supportive words!

      Weaknesses: 

      The evidence for the proactive facilitation driven by relatedness is very convincing. However, in the finer scale results, the continuous relationship between the degree of relatedness and the degree of proactive facilitation/interference is less clear. This could be improved with some additional analyses and/or context and discussion. In the narrower range, the measure used was AS, with values ranging from 0.03-0.98, where even 0.03 still denotes clearly related words (pious - holy). Within this range from "related" to "related a lot", no relationship to the degree of facilitation was found. The wider range results are reported using a different scale, GloVe, with values from -0.14 to 0.95, where the lower end includes unrelated words (sap - laugh). It is possible that any results of facilitation/interference observed in the wider range may be better understood as a somewhat binary effect of relatedness (yes or no) rather than the degree of relatedness, given the results from the narrower condition. These two options could be more explicitly discussed. The report would benefit from providing clearer information about these measures and their range and how they relate to each other (e.g., not a linear transformation). It would be also helpful to know how the values reported on the AS scale would end up if expressed in the GloVe scale (and potentially vice-versa) and how that affects the results. Currently, it is difficult to assess whether the relationship between relatedness and memory is qualitative or quantitative. This is less of a problem with interdependence analyses where the results converge across a narrow and wider range. 

      We thank the reviewer for this point. While other analyses do show differences across the range of AS values we used, we agree in the case of the memorability analysis in the narrower stimulus set, 48-hr experiment (or combining across the narrower and wider stimulus sets), there could be a stronger influence of binary (yes/no) relatedness. We have now made this point explicitly (p. 26):

      “Altogether, these results show that PI can still occur with low relatedness, like in other studies finding PI in ΔTarget (A-B, A-D) paradigms (for a review, see Anderson & Neely, 1996), but PF occurs with higher relatedness. In fact, the absence of low relatedness pairs in the narrower stimulus set likely led to the strong overall PF in this condition across all pairs (positive y-intercept in the upper right of Fig 3A). In this particular instance, there may have been a stronger influence of a binary factor (whether they are related or not), though this remains speculative and is not the case for other analyses in our paper.”

      Additionally, we have emphasized that the two relatedness metrics are not linear transforms of each other. Finally, to address both this comment and Reviewer #3's comment below, we now plot the relatedness values of both stimulus sets on the common GloVe metric in Fig 1-Supp 1C (p. 9):

      “Please note that GloVe is an entirely different relatedness metric and is not a linear transformation of AS (see Fig 1-Supp 1C for how the two stimulus sets compare using the common GloVe metric).”
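      As a hedged illustration of how an embedding-based metric can produce negative values such as the -0.14 reported for the wider set, the sketch below assumes that GloVe relatedness is the cosine similarity between two words' embedding vectors (a common convention, not confirmed in this response). The function `cosine_similarity` and the 3-dimensional vectors are hypothetical toy values, not real GloVe embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-d toy vectors; real GloVe vectors have 50-300 dimensions
# and these numbers are invented purely for illustration.
toy = {
    "pious": [0.8, 0.1, 0.2],
    "holy": [0.7, 0.2, 0.3],
    "laugh": [-0.3, 0.9, -0.1],
}

print(cosine_similarity(toy["pious"], toy["holy"]))   # related pair: positive
print(cosine_similarity(toy["pious"], toy["laugh"]))  # unrelated pair: can be negative
```

      Because cosine similarity depends only on vector direction, unrelated words can point in roughly opposite directions and score below zero.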

      A smaller weakness is generalizability beyond the word set used here. Using a carefully crafted stimulus set and repeating the same word pairings across participants and conditions was important for memorability calculations and some of the other analyses. However, highlighting the inherently noisy item-by-item results, especially in the Osgood-style surface figures, makes it challenging to imagine how the results would generalize to new stimuli, even within the same relatedness ranges as the current stimulus sets. 

      We thank the reviewer for this critique. We have added this caveat in the limitations to suggest that future studies should replicate these general findings with different stimulus sets (p. 28):

      “Finally, future studies could ensure these effects are not limited to these stimuli and generalize to other word stimuli in addition to testing other domains (Baek & Papaj, 2024; Holding, 1976).”

      Reviewer #3 (Public Review): 

      Summary: 

      Bennion et al. investigate how semantic relatedness proactively benefits the learning of new word pairs. The authors draw predictions from Osgood (1949), which posits that the degree of proactive interference (PI) and proactive facilitation (PF) of previously learned items on to-be-learned items depends on the semantic relationships between the old and new information. In the current study, participants learn a set of word pairs ("supplemental pairs"), followed by a second set of pairs ("base pairs"), in which the cue, target, or both words are changed, or the pair is identical. Pairs were drawn from either a narrower or wider stimulus set and were tested after either a 5-minute or 48-hour delay. The results show that semantic relatedness overwhelmingly produces PF and greater memory interdependence between base and supplemental pairs, except in the case of unrelated pairs in a wider stimulus set after a short delay, which produced PI. In their final analyses, the authors compare their current results to previous work from their group studying the analogous retroactive effects of semantic relatedness on memory. These comparisons show generally similar, if slightly weaker, patterns of results. The authors interpret their results in the framework of recursive reminders (Hintzman, 2011), which posits that the semantic relationships between new and old word pairs promote reminders of the old information during the learning of the new to-be-learned information. These reminders help to integrate the old and new information and result in additional retrieval practice opportunities that in turn improve later recall. 

      Strengths: 

      Overall, I thought that the analyses were thorough and well-thought-out and the results were incredibly well-situated in the literature. In particular, I found that the large sample size, inclusion of a wide range of semantic relatedness across the two stimulus sets, variable delays, and the ability to directly compare the current results to their prior results on the retroactive effects of semantic relatedness were particular strengths of the authors' approach and make this an impressive contribution to the existing literature. I thought that their interpretations and conclusions were mostly reasonable and included appropriate caveats (where applicable). 

      We thank the reviewer for this kind, effective summary and for highlighting the paper's strengths!

      Weaknesses: 

      Although I found that the paper was very strong overall, I have three main questions and concerns about the analyses. 

      My first concern lies in the use of the narrow versus wider stimulus sets. I understand why the initial narrow stimulus set was defined using associative similarity (especially in the context of their previous paper on the retroactive effects of semantic similarity), and I also understand their rationale for including an additional wider stimulus set. What I am less clear on, however, is the theoretical justification for separating the datasets. The authors include a section combining them and show in a control analysis that there were no directional effects in the narrow stimulus set. The authors seem to imply in the Discussion that they believe there are global effects of the lower average relatedness on differing patterns of PI vs PF across stimulus sets (lines 549-553), but I wonder if an alternative explanation for some of their conflicting results could be that PI only occurs with pairs of low semantic relatedness between the supplemental and base pair and that because the narrower stimulus set does not include the truly semantically unrelated pairs, there was no evidence of PI. 

      We agree with the reviewer’s interpretation here, and we have now directly stated this in the discussion section (p. 26):

      “Altogether, these results show that PI can still occur with low relatedness, like in other studies finding PI in ΔTarget (A-B, A-D) paradigms (for a review, see Anderson & Neely, 1996), but PF occurs with higher relatedness. In fact, the absence of low relatedness pairs in the narrower stimulus set likely led to the strong overall PF in this condition across all pairs (positive y-intercept in the upper right of Fig 3A).”

      As for the remainder of this concern, please see our response to your elaboration on the critique below.

      My next concern comes from the additive change in both measures (change in Cue + change in Target). This measure is simply a measure of overall change, in which a pair where the cue changes a great deal but the target doesn't change is treated equivalently to a pair where the target changes a lot, but the cue does not change at all, which in turn are treated equivalently to a pair where the cue and target both change moderate amounts. Given that the authors speculate that there are different processes occurring with the changes in cue and target and the lack of relationship between cue+target relatedness and memorability, it might be important to tease apart the relative impact of the changes to the different aspects of the pair. 

      We thank the reviewer for this great point. First, we should clarify that we only added cue and target similarity values in the ΔBoth condition, which means that all instances of equivalence relate to non-zero values for both cue and target similarity. However, it is certainly possible cue and target similarity separately influence memorability or interdependence. We have now run this analysis separately for cue and target similarity (but within the ΔBoth condition). For memorability, neither cue nor target similarity independently predicted memorability within the ΔBoth condition in any of the four main experiments (all p > 0.23). Conversely, there were some relationships with interdependence. In the narrower stimulus set, 48-hr delay experiment, both cue and target similarity significantly or marginally predicted base-secondary pair interdependence (Cue: r = 0.30, p = 0.04; Target: r = 0.29, p = 0.054). Notably, both survived partial correlation analyses partialing out the other factor (Cue: r = 0.33, p = 0.03; Target: r = 0.32, p = 0.04). In the wider stimulus set, 48-hr delay experiment, only target similarity predicted interdependence (Cue: r = 0.09, p = 0.55; Target: r = 0.34, p = 0.02), and target similarity also predicted interdependence after partialing out cue similarity (r = 0.34, p = 0.02). Similarly, in the narrower stimulus set, 5-min delay experiment, only target similarity predicted interdependence (Cue: r = 0.01, p = 0.93; Target: r = 0.41, p = 0.005), and target similarity also predicted interdependence after partialing out cue similarity (r = 0.42, p = 0.005). Neither predicted interdependence in the wider stimulus set, 5-min delay experiment (Cue: r = -0.14, p = 0.36; Target: r = 0.09, p = 0.54). We have opted to leave this out of the paper for now, but we could include it if the reviewer believes it is worthwhile.

      Note that we address the multiple regression point raised by the reviewer in the critique below.

      Finally, it is unclear to me whether there was any online spell-checking that occurred during the free recall in the learning phase. If there wasn't, I could imagine a case where words might have accidentally received additional retrieval opportunities during learning - take for example, a case where a participant misspelled "razor" as "razer." In this example, they likely still successfully learned the word pair but if there was no spell-checking that occurred during the learning phase, this would not be considered correct, and the participant would have had an additional learning opportunity for that pair. 

      We did not use online spell checking. We agree that a misspelled response could reflect successful learning while still being scored as incorrect, meaning that for those words participants would essentially have had more than one successful retrieval. However, we have no reason to think that this would differ meaningfully across conditions, so the main learning results would still hold. We have noted this in the Methods (p. 29-30):

      “We did not use spell checking during learning, meaning that in some cases pairs could have been essentially retrieved more than once. However, we do not believe this would differ across conditions to affect learning results.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      In terms of the framing of the paper, I think the paper would benefit from a clearer explication of the different theories at play in the introductory section. There are a few theories being examined. Memory-for-change is described in most detail in the discussion; it would help to describe it more deliberately in the intro. The authors refer to a PI account, and this is contrasted with the memory-for-change account, but it seems to me that these theories are not mutually exclusive. In the discussion, several theories are mentioned in passing without being named, e.g., I believe the authors are referring to the fan effect when they mention the difference between delta-cue and delta-target conditions. Perhaps this could be addressed with a more detailed account of the theory underlying Osgood's predictions, which I believe arise from an associative account of paired-associates memory. Osgood's work took place when there was a big debate between unlearning and interference. The current work isn't designed to speak directly to that old debate. But it may be possible to develop the theory a bit more in the intro, which would go a long way towards scaffolding the many results for the reader, by giving them a better sense up front of the theoretical implications.

      We thank the reviewer for this comment and the nudge to clarify these points. First, we have now made the memory-for-change and remindings accounts more explicit in the introduction, as well as the fact that we are combining the two in forming predictions for the current study (p. 3):

      “Conversely, in favor of the PF account, we consider two main, related theories. The first is the importance of “remindings” in memory, which involve reinstating representations from an earlier study phase during later learning (Hintzman, 2011). This idea centers on study-phase retrieval, which involves being able to mentally recall prior information and is usually applied to exact repetitions of the same material (Benjamin & Tullis, 2010; Hintzman et al., 1975; Siegel & Kahana, 2014; Thios & D’Agostino, 1976; Zou et al., 2023). However, remindings can occur upon the presentation of related (but not identical) material and can result in better memory for both prior and new information when memory for the linked events becomes more interdependent (Hintzman, 2011; Hintzman et al., 1975; McKinley et al., 2019; McKinley & Benjamin, 2020; Schlichting & Preston, 2017; Tullis et al., 2014; Wahlheim & Zacks, 2019). The second is the memory-for-change framework, which builds upon these ideas and argues that humans often retrieve prior experiences during new learning, either spontaneously by noticing changes from what was learned previously or by instruction (Jacoby et al., 2015; Jacoby & Wahlheim, 2013). The key advance of this framework is that recollecting changes is necessary for PF, whereas PI occurs without recollection. This framework has been applied to paradigms involving stimulus changes, including common paired associate paradigms (e.g., A-B, A-D) that we cover extensively later. Because humans may be more likely to notice and recall prior information when it is more related to new information, these two accounts would predict that semantic relatedness instead promotes successful remindings, which would create PF and interdependence among the traces.”

      Second, as the reviewer suggests, we were referring to the fan effect in the discussion, and we have now made that more explicit (p. 26):

      “We believe these effects arise from the competing processes of impairments between competing responses at retrieval that have not been integrated versus retrieval benefits when that integration has occurred (which occurs especially often with high target relatedness). These types of competing processes appear operative in various associative learning paradigms such as retrieval-induced forgetting (Anderson & McCulloch, 1999; Carroll et al., 2007), and the fan effect (Moeser, 1979; Reder & Anderson, 1980).”

      Finally, our reading of Osgood’s proposal is as an attempt to summarize the qualitative effects of the scattered literature (as of 1949) and did not discuss many theories. For this reason, we generally focus on the directional predictions relating to Osgood’s surface, but we couch it in theories proposed since then.

      It strikes me that the advantage seen for items in the retroactive study compared to the proactive study is consistent with classic findings examining spontaneous recovery. These classic studies found that first-learned materials tended to recover to a level above second-learned materials as time passed. This could be consistent with the memory-for-change proposal presented in the text. The memory-for-change proposal provides a potential cognitive mechanism for the effect, here I'm just suggesting a connection that could be made with the spontaneous recovery literature. 

      We thank the reviewer for this suggestion. Indeed, we agree there is a meaningful point of connection here. We have added the following to the Discussion (p. 27):

      “Additionally, these effects partially resemble those on spontaneous recovery, whereby original associations tend to face interference after new, conflicting learning, but slowly recover over time (either absolutely or relative to the new learning) and often eventually eclipse memory for the new information (Barnes & Underwood, 1959; Postman et al., 1969; Wheeler, 1995). In both cases, original associations appear more robust to change over time, though it is unclear whether these similar outcomes stem from similar mechanisms.”

      Minor recommendations 

      Line 89: relative existing -> relative to existing. 

      Line 132: "line from an unrelated and identical target" -> from an unrelated to identical target (take a look, just needs rephrasing). 

      Line 340: (e.g. peace-shaverazor) I wasn't clear whether this was a typographical error, or whether the intent was to typographically indicate a unified representation.

      Line 383: effects on relatedness -> effects of relatedness.

      We thank the reviewer for catching these errors. We have fixed them, and for the third comment, we have clarified that we indeed meant to indicate a unified representation (p. 12):

      “[e.g., peace-shaverazor (written jointly to emphasize the unification)]”

      Page 24: Figure 8. I think the statistical tests in this figure are just being done between the pairs of the same color? Like in the top left panel, delta-cue pro and delta-target retro are adjacent and look equivalent, but there is no n.s. marking for this pair. Could consider keeping the connecting line between the linked conditions and removing the connecting lines that span different conditions. 

      Indeed, we were only comparing conditions with the same color. We have changed the connecting lines to reflect this.

      Page 26 line 612: I think this is the first mention that the remindings account is referred to as the memory-for-change framework, consider mentioning this in the introduction. 

      Thank you – we have now mentioned this in the introduction.

      Lines 627-630. Is this sentence referring to the fan effect? If so it could help the reader to name it explicitly. 

      We have now named this explicitly.

      Reviewer #2 (Recommendations For The Authors): 

      This is a matter of personal preference, but I would prefer PI and PF spelled out instead of the abbreviations. This was also true for RI and RF which are defined early but then not used for 20 pages before being re-used again. In contrast, the naming of the within-subject conditions was very intuitive. 

      We appreciate this perspective. However, we prefer to keep the terms PI and PF for the sake of brevity. We now re-introduce terms that do not return until later in the manuscript.

      Osgood surface in Figure 1A could be easier to read if slightly reformatted. For example, target and cue relatedness sides are very disproportional and I kept wondering if that was intentional. The z-axis could be slightly more exaggerated so it's easier to see the critical messages in that figure (e.g., flip from + to - effect along the one dimension). The example word pairs were extremely helpful. 

      Figures 1C and 1D were also very helpful. It would be great if they could be a little bigger as the current version is hard to read. 

      Figure 1B took a while to decipher and could use a little more anticipation in the body of the text. Any reason to plot the x-axis from high to low on this figure? It is confusing (and not done in the actual results figures). I believe the supplemental GloVe equivalent in the supplement also has a confusing x-axis. 

      We thank the reviewer for this feedback. We have modified Figure 1A to reduce the disproportionality and accentuate the z-axis changes. We have also made the text in C and D larger. Finally, we have flipped the x-axis in B and in the supplement.

      The description of relatedness values was rather confusing. It is not intuitive to accept that AS values from 0.03-0.96 are "narrow", as that seems to cover almost the whole theoretical range. I do understand that 0.03 is still a value showing relatedness, but more explanation would be helpful. It is also not clear how the GloVe values compare to the AS values. If I am understanding the measures and ranges correctly, the "narrow" condition could also be called "related only" while the "wide" condition could be called "related and unrelated". This is somewhat verbalized but could be clearer. In general, please provide a straightforward way for a reader to explicitly or implicitly compare those conditions, or even plot the "narrow" condition using both AS values and GloVe values so one can really compare narrow and wider conditions comparing apples with apples. 

      We thank the reviewer for this critique. First, we have now sought to clarify this in the Introduction (p. 11-12):

      “Across the first four experiments, we manipulated two factors: range of relatedness among the pairs and retention interval before the final test. The narrower range of relatedness used direct AS between pairs using free association norms, such that all pairs had between 0.03-0.96 association strength. Though this encompasses what appears to be a full range of relatedness values, pairs with even low AS are still related in the context of all possible associations (e.g., pious-holy has AS = 0.03 but would generally be considered related) (Fig 1B). The stimuli using a wider range of relatedness spanned the full range of global vector similarity (Pennington et al., 2014) that included many associations that would truly be considered unrelated (Fig 1-Supp 1A). One can see the range of the wider relatedness values in Fig 1-Supp 1B and comparisons between narrower and wider relatedness values in Fig 1-Supp 1C.”

      Additionally, as noted in the text above, we have added a new subfigure to Fig 1-Supp 1 that compares the relatedness values in the narrower and wider stimulus sets using the common GloVe metric.

      Considering a relationship other than linear may also be beneficial (e.g., the difference between AS of 0.03 and 0.13 may not be equal to AS of .83 and .93; same with GloVe). I am assuming that AS and GloVe are not linear transforms of each other. Thus, it is not clear whether one should expect a linear (rather than curvilinear or another monotonic) relationship with both of them. It could be as simple as considering rank-order correlation rather than linear correlation, but just wanted to put this out for consideration. The linear approach is still clearly fruitful (e.g., interdependence), but limits further the utility of having both narrow and wide conditions without a straightforward way to compare them. 

      We thank the reviewer for this point. Indeed, AS and GloVe are not linear transforms of each other, but metrics derived from different sources (AS comes from human free associations; GloVe comes from a learned vector space language model). (We noted this in the text and in our response to your above comment.) However, we do have the ability to put all the word pairs into the GloVe metric, which we do in the Results section, “Re-assessing proactive memory and interdependence effects using a common metric”. In this analysis, we used a linear correlation that combined data sets with a similar retention interval and replicated our main findings earlier in the paper (p. 5):

      “In the 48-hr delay experiment, correlations between memorability and cue relatedness in the ΔCue condition [r²(44) > 0.29, p < 0.001] and target relatedness in the ΔTarget condition [r²(44) = 0.2, p < 0.001] were significant, whereas cue+target relatedness in the ΔBoth condition was not [r²(44) = 0.01, p = 0.58]. In all three conditions, interdependence increased with relatedness [all r²(44) > 0.16, p < 0.001].”

      Following the reviewer's suggestion to test a rank-order approach, we also re-created the combined analysis using ranks based on GloVe values rather than the raw GloVe values. The ranks now span 1-90 (because there were 45 pairs in each of the narrower and wider stimulus sets). All results qualitatively held.

      Author response image 1.

      Rank order results.

      Author response image 2.

      And the raw results in Fig 6-Supp 1 (as a reference).
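      The rank-order variant described above can be sketched as follows, assuming the combined GloVe values (ranked 1-90) and a per-pair outcome are held in arrays; the function names and the simulated values are hypothetical, not the study's data or code. Spearman's correlation is Pearson's r computed on ranks, so any monotonic relationship, linear or not, yields a coefficient near 1, whereas Pearson's r on the raw values is reduced by curvature.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of the ranks they span."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):  # average the ranks within each tie group
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_r(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical example: a monotonic but curved relationship over 90 pairs
glove = np.linspace(0.05, 0.9, 90)
outcome = glove ** 3
rho = spearman_r(glove, outcome)                     # rank-based
pearson = float(np.corrcoef(glove, outcome)[0, 1])   # raw-value linear r
```

      Tied values receive the average of their ranks, which matches the usual convention (for example, the default `average` method of SciPy's `rankdata`).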

      Reviewer #3 (Recommendations For The Authors):

      In regards to my first concern, the authors could potentially test whether the stimulus sets are different by specifically looking at pairs from the wider stimulus set that overlap with the range of relatedness from the narrow set and see if they replicate the results from the narrow stimulus set. If the results do not differ, the authors could simplify their results section by collapsing across stimulus sets (as they did in the analyses presented in Figure 6 - Supplementary Figure 1). If the authors opt to keep the stimulus sets separate, it would be helpful to include a version of Figure 1b/Figure 1 - Supplementary Figure 1 where the coverage of the two stimulus sets are plotted on the same figure using GloVe similarity so it is easier to interpret the results. 

      We have conducted this analysis in two ways, though we note that we ultimately opted to keep the stimulus sets separate. First, we examined memorability between the data sets by removing one pair at a time from the wider stimulus set until there was no significant difference (p > 0.05). We did this at the long delay because that was more informative for most of our analyses. Even after reducing the wider stimulus set, the narrower stimulus set still had significantly or marginally higher memorability in all three conditions (p < 0.001 for ΔCue; p < 0.001 for ΔTarget; p = 0.08 for ΔBoth). We reasoned that this was likely because the AS values still differed (all p < 0.001), which would give participants a clear way to associate words that may not be as strongly similar in vector space (perhaps due to polysemy of individual words). When we ran the analysis a different way that equated AS, we no longer found significant memorability differences (p = 0.13 for ΔCue; p = 0.50 for ΔTarget; p = 0.18 for ΔBoth). However, equating the two data sets in this analysis required dropping so many pairs from the wider stimulus set (because only a few had a direct AS connection; 3, 5, and 1 pairs were kept in the ΔCue, ΔTarget, and ΔBoth conditions, respectively) that we would prefer not to report this result.

      Additionally, we now plot the two stimulus sets on the same plot (Reviewer 2 also suggested this).

      In regards to my second concern, one potential way the authors could disambiguate the effects of change in cue vs change in target might be to run a multiple linear regression with change in Cue, change in Target, and the change in Cue*change in Target interaction (potentially with random effects of subject identity and word pair identity to combine experiments and control for pair memorability/counterbalancing), which has the additional bonus of potentially allowing the authors to include all word pairs in a single model and better describe the Osgood-style spaces in Figure 6.

      This is a very interesting idea. We set this analysis up as the reviewer suggested, using fixed effects for ΔCue, ΔTarget, and ΔCue*ΔTarget, and random effects for subject and word ID. Because the outcome variable was binary, we used mixed-effects logistic regression. For a given pair, if it had the same cue or target, the corresponding change column received a 0, and if it had a different cue or target, it received a graded value (1 minus the GloVe similarity between the new and old cue or target). Because the analysis was designed to capture departures from a repeat (as in the No Δ condition, which had no change in either cue or target), we omitted control items. For items in the ΔBoth condition, we initially used positive values in both the Cue and Target columns too, with the multiplied ΔCue*ΔTarget value in its own column. We focused these analyses on the 48-hr delay experiments. In both experiments, running it this way resulted in highly significant negative effects of ΔCue and ΔTarget (both p < 0.001) but a positive effect of ΔCue*ΔTarget (p < 0.001), presumably because, after accounting for the negative independent effects of ΔCue and ΔTarget, ΔCue*ΔTarget values were actually associated with better memory than expected.

      We thought that those results were a little strange given that generally there did not appear to be interactions with ΔCue*ΔTarget values, and the positive result was simply due to the other predictors in the model. To show that this is the case, we changed the predictors so that items in the ΔBoth condition had 0 in ΔCue and ΔTarget columns alongside their ΔCue*ΔTarget value. In this case, all three factors negatively predicted memory (all p < 0.001).

      We do not necessarily see this second approach as better, partly because it seems clear to us that any departure from identity simply hurts memory, and because we had to drop the control condition. We next flipped the analysis around to more closely resemble how we ran the other analyses, using similarity instead of distance. Here, identity along either dimension was coded as 1, a change in any part of the pair was coded as that pair's GloVe value (rather than 1 minus the GloVe value, as above), and the control condition simply had zeros in all columns. In this case, if we code the cue and target similarity values as themselves in the ΔBoth condition, in both 48-hr experiments, cue and target similarity significantly positively predicted memory (narrower set: cue similarity had p = 0.006, target similarity had p < 0.001; wider set: both p < 0.001) and the interaction term negatively predicted memory (p < 0.001 in both). If we code cue and target similarity values as 0s in the ΔBoth condition, all three factors tend to be positive (narrower, Cue: p = 0.11, Target and Interaction: p < 0.001; wider, Cue and Target: p < 0.001, Interaction: p = 0.07).

      Ultimately, we would prefer to leave this out of the manuscript in the interest of simplicity and because we largely find that these analyses support our prior conclusions. However, we could include them if the reviewer prefers.
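      To make the distance coding of the first two model specifications concrete, here is a minimal sketch, assuming each pair is described by its condition label and the GloVe similarity between old and new cue and target (1.0 when that element is repeated). The function and condition names are hypothetical; the sketch only builds the three fixed-effect predictors and does not fit the mixed-effects logistic regression itself, which would require a dedicated mixed-model package.

```python
def distance_predictors(condition, cue_sim=1.0, target_sim=1.0,
                        both_in_main_columns=True):
    """Return (d_cue, d_target, d_cue_x_target) for one pair.

    condition: hypothetical label, one of "NoDelta", "DeltaCue",
    "DeltaTarget", "DeltaBoth"; control items are excluded.
    cue_sim / target_sim: GloVe similarity between old and new element
    (1.0 when that element is repeated).
    """
    if condition == "control":
        raise ValueError("control items were omitted from this model")
    d_cue = 1.0 - cue_sim           # 0 when the cue is repeated
    d_target = 1.0 - target_sim     # 0 when the target is repeated
    interaction = d_cue * d_target  # nonzero only when both elements changed
    if condition == "DeltaBoth" and not both_in_main_columns:
        # Second specification: ΔBoth pairs carry only the interaction term
        d_cue, d_target = 0.0, 0.0
    return d_cue, d_target, interaction

# Hypothetical pairs (similarity values are illustrative, not from the study)
rows = [
    distance_predictors("NoDelta"),
    distance_predictors("DeltaCue", cue_sim=0.75),
    distance_predictors("DeltaBoth", cue_sim=0.75, target_sim=0.5),
    distance_predictors("DeltaBoth", cue_sim=0.75, target_sim=0.5,
                        both_in_main_columns=False),
]
```

      Coding repeats as a similarity of 1.0 means the No Δ condition automatically receives zeros in all three columns, which is what makes it the implicit reference point in this specification.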

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife assessment:

      In this useful study, the authors analyze droplet size distributions of multiple protein condensates and their fit to a scaling ansatz, highlighting that they exhibit features of first- and second-order phase transitions. The experimental evidence is still incomplete as the measurements were apparently done only at one time point, neglecting the possibility that droplet size distribution can evolve with time. The text would benefit from a connection to and contextualization with the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community and that emphasizes "liquid-gas" criticality. 

      We have now carried out new experiments at multiple time points to establish that the droplet size distributions are stationary below the critical concentration. We have also addressed the comments made by the reviewers about the nature of the phase transition.

      Our analysis does not depend on a specific hypothesis about the nature of the phase transition, whether it be percolation or a gas-liquid critical transition. The scaling that we observed is an emergent property that is independent of the possible theoretical models used to describe the phase transition. In fact, our scaling analysis indicates that any theoretical model proposed for protein phase separation should predict the critical exponents that we reported.

      Reviewer #1

      The authors analyse droplet size distributions of multiple protein condensates and fit to a scaling ansatz to highlight that they exhibit features of first-order and second-order phase transitions. While the experimental evidence is solid, the text lacks connection and contextualization to the well-understood expectations from the coupling of percolation and phase separation in protein condensates - a phenomenon that is increasingly gaining consensus amongst the community. The evidence supports the percolation and phase separation model rather than being close to a true critical point in the liquid-gas phase space. Overall, the work is useful to the community.

      We are grateful to the reviewer for these positive comments. We would like to emphasise that our contribution is not to propose a theoretical model, but rather to report a scaling behaviour in the experimentally measured droplet size distributions. The main implication of our work is that any theoretical model should predict the scaling exponents that we derived from the experimental measurements.

      Strengths: 

      The experimental analysis of distinct protein condensates is very well done and the reported exponents/scaling framework provides a clear framework to help the community deconvolve signatures of percolation in condensates. 

      Weaknesses: 

      The principal concern this reviewer has is that the authors adopt a framing in this paper to present a discovery of second-order features and connections to criticality; however, they ignore/miss the connections to percolation (a well-understood second-order transition that is expected to play a major role in protein condensates). I believe this needs to be addressed and the paper suitably revised to help connect with these expectations.

      The scaling that we found is not characteristic of standard percolation, since the exponents that we obtained (a=0 and f=1) differ from those of percolation (a=1.19 and f=2.21). This difference indicates that protein phase separation is not in the same universality class as standard percolation. Further studies will be required to understand whether theoretical models based on percolation could predict the observed critical exponents.

      - Protein condensates have been increasingly understood to be described as fluids whose assembly is driven by a connection of density (phase separation, first-order) and connectivity (percolation, second-order) transitions. This has been long known in the polymer community (Flory, Stockmayer, Tanaka, Rubinstein, Semenov, and others) and recently repopularized in the condensate community (by Pappu and Mittag, in particular, amongst others). The authors make no connections to any of these frameworks - which actually seem to be the essence of what they are describing. 

      As mentioned above, our purpose was neither to support an existing theoretical model, nor to propose a new one. Rather, we have reported a scaling behaviour and scaling exponents not noted before. Further studies will be required to establish whether existing theoretical models could account for this scaling behaviour.

      - Percolation theory, which has been around for more than half a century, has clear-cut scaling laws that have essentially similar forms to the ansatz adopted by the authors, and the commonalities/differences are not discussed by the authors - this is essential since this provides a physical basis for their ansatz rather than an arbitrary mathematical formulation. In particular, percolation models connect size distribution exponents to factors like dimensionality, valence, etc. and if these connections can be made with this data, that would be very powerful. 

      The scaling ansatz that we are using is commonly adopted in studies of critical phenomena, and it is not specific to percolation. The scaling exponents depend only on a few attributes, such as dimensionality, symmetries, and whether interactions are short- or long-range. These attributes determine the universality class. As such, scaling exponents are not linked to molecular determinants, but they can distinguish between different universality classes.

      - The connections between spinodal decomposition and second-order phase transitions are very confusing. Spinodal decomposition happens when the barriers for first-order phase transitions are zero and systems can phase separate without crossing nucleation barriers. Further, the "criticality" discussed in the paper is confusing since it more likely refers to a percolation threshold and much less likely to a "critical temperature" (Tc, where spinodal and binodals become identical). I would recommend reframing this argument.

      We cannot refer to a percolation threshold, as our model is not readily compatible with it. We have elaborated on and better explained the differences between these models.

      It's unlikely, in this reviewer's opinion, that the authors are actually discussing a "first-order" liquid-gas critical point, because saturation concentrations of these proteins can be much higher with temperature and the critical point would thus likely be at much higher concentrations (and, of course, temperature). Further, the scaling exponents don't fall into that class naturally. However, if the authors disagree, I would appreciate clear quantitative reasons (including through the scaling exponents in that universality class) and be happy to be convinced to change my mind. As provided, the data does not support this model.

      We have now clarified in the manuscript that we do not discuss the liquid-gas critical point.

      Reviewer #2

      This is a potentially interesting study addressing a possible scale-invariant log-normal characteristic of droplet size distribution in the phase separation behavior of biomolecular condensates. Some of the data presented are valuable and intriguing. However, as it stands, the validity and utility of this study are uncertain because there are serious deficiencies in the execution and presentation of the authors' results. Many of these shortcomings are fundamental, including a lack of clarity in the basic conceptual framework of the study, insufficient justification of the experimental setup, less-than-conclusive experimental evidence, and inadequate discussion of implications of the authors' findings to future experimental and theoretical studies of biomolecular condensates. Accordingly, this reviewer considers that the manuscript should undergo a major revision to address the following. In particular, the discussion should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      We thank the reviewer for the helpful comments. In the revised version of the manuscript we clarified that we aimed to use a well-established tool, the scaling analysis, to study the phase transition, and applied it to the protein condensation process. This approach offers insight into a universal aspect of protein phase separation, and also provides a practical approach to determine the phase boundary. The observed fat-tailed distribution of protein droplet sizes is not what is normally observed in more standard phase separation systems in the subsaturated phase. Our contribution is not to propose a theoretical model, but rather to report the observation of a scaling behaviour.

      (1) The theoretical analysis in this study is based on experimental data on condensed droplet size distributions for FUS and α-synuclein. The size data for FUS droplet is indirect as it relies on the assumption that FUS droplet diameter is proportional to fluorescence intensity of labeled FUS (page 10 of manuscript), with fluorescence data adopted from a previously published work by another group (Kar et al. & Pappu, ref.27). Because fluorescence of a droplet is expected to be dependent upon the condensed-phase concentration of FUS, this proportional relationship, even if it holds, must also be modulated by FUS concentration in the droplet. Moreover, why should fluorescence be proportional to diameter but not the cross-sectional area or volume of the FUS droplet, which would be more intuitive? These issues should be clarified. A new measure by microscopy is used to determine the size distribution of condensed α-synuclein; but no microscopy image is shown. It is of critical importance that such raw data (for example microscopy images) be presented for the completeness and reproducibility of the experiment because the entire study relies on the soundness of these experimental measurements. 

      As we mentioned in the article, for the scaling analysis, the droplet dimensions could be assessed in 1D (length), 2D (area) or 3D (volume). For the FUS experiments, we used the data as the authors provided in the original publication (PNAS 2022). For alpha-synuclein, we provided the data in the article. 

      (2) Despite the authors' claim of a universal scaling relationship, the log-log scatter plots in Figure 1 (page 15 of the manuscript) exhibit significant deviations from linearity at low protein concentrations (ρ→0). Given this fact, is universal scaling really valid? Discussion of this behavior is conspicuously absent (except the statement that these data points are excluded in the fit). In any case, the possible origins of these deviations should be thoroughly discussed so that the regime of universal scaling can be properly delineated. 

      In general, one would expect the scaling ansatz to be valid close to the phase boundary. It is a feature of the ansatz that, further away from the boundary, deviations are expected because critical phenomena become less relevant.

      (3) Droplet size distribution most likely depends on the time duration after the preparation of the sample. For α-synuclein, "liquid droplet size characterisation images were captured 10 minutes post-liquid droplet formation" (page 9 of the manuscript). Why 10 minutes? Have the authors tried imaging at different time points and, if so, do the distributions at different time points remain essentially the same? If they are different, what is the criterion for focusing only on a particular time point? Information related to these questions should be provided. 

      We have now determined the droplet size distribution of alpha-synuclein at different time points, finding that they are not dependent on time within experimental uncertainties (Figure 6 in the revised manuscript).

      (4) At least two well-known mechanisms can lead to the time-dependent distribution of liquid droplet sizes: (i) coalescence of droplets in spatial proximity to form a larger droplet, and (ii) Ostwald ripening, i.e., formation of larger droplets concomitant with the dissolution of smaller droplets without fusion of droplets. The implications of these mechanisms on the authors' droplet size distributions should be addressed. Indeed, maintaining a size distribution against these mechanisms in vivo often requires active suppression [Bressloff, Phys Rev E 101, 042804 (2020)] with possible involvement of chemical reactions [Kirschbaum & Zwicker, J R Soc Interface 18, 20210255 (2021)]. These considerations are central to the basic rationale of this study and therefore should be carefully tackled. 

      These two growth mechanisms are relevant above the critical concentration. Below the critical concentration, which is the regime that we investigated in our work, there is no need for active suppression.

      (5) If coalescence and/or Ostwald ripening do occur, given sufficient time after sample preparation, the condensed phase may become a single large "droplet" or a single liquid layer. Does this occur in the authors' experiments? 

      As we are below the critical concentration, this is unlikely to occur, as indeed supported by the experiments described in our answer to point (3).

      (6) It is unclear whether the authors aim to address the kinetic phenomenon of liquid droplet formation and evolution or equilibrium properties. The two types of phenomena appear to be conflated in the authors' narrative. Clarification is needed. If this work aims to address time-independent (or infinite-time) equilibrium properties, how are they expected to be related to droplet size distribution, which most likely is time-dependent? 

      Our analysis focuses on the equilibrium properties of the droplet size distribution below the critical concentration, and it should guide the proposal of a theoretical model that explains the emergence of scaling. In the introductory part of our manuscript, we proposed a possible scenario that extends Flory-Huggins theory to predict a scaling behaviour appropriate to a critical transition. Other scenarios are possible, and our results, together with further experiments, will be needed to arrive at a deeper understanding of protein aggregation.

      (7) The relationship between the potentially time-dependent droplet size distribution and equilibrium properties of ρt and ρc (transition and critical concentrations, respectively) should be better spelled out. An added illustrative figure will be helpful. 

      We are addressing equilibrium properties, not kinetic ones. See also our answer to point (6).

      (8) The authors comment that their findings appear to be inconsistent with Flory-Huggins theory because Flory-Huggins "characterizes droplet formation as a consequence of nucleation ..." (page 8 of the manuscript). Here, three issues need detailed clarification: (i) In what way does Flory-Huggins mandate nucleation? (ii) Why are the findings of apparent scale invariance inconsistent with nucleation? (iii) If liquid droplet formations do not arise from nucleation, what physical mechanism(s) is (are) envisioned by the authors to be underpinning the formation of condensed liquid droplets in protein phase separation? 

      We do agree that the Flory-Huggins theory does not mandate nucleation above the spinodal line. However, we are addressing the equilibrium properties below the critical concentration, so the stable phase is the dilute phase, and there is no nucleation.

      (9) Are any of the authors' findings related to finite-system effects of phase separation [see, e.g., Nilsson & Irbäck, Phys Rev E 101, 022413 (2020)]?  

      Our experimental system is macroscopic, so we would not expect finite size effects.

      (10) Since the authors are using their observation of an apparent scale-invariant droplet size distribution to evaluate phase separation theory, it is important to clarify whether their findings provide any constraint on the shape of coexistence curves (phase diagrams). 

      We are only reporting the phenomenological observation of a scaling behaviour, so at this stage we cannot speculate on the constraints it places on the coexistence curves. This is indeed an exciting opportunity for future studies.

      (11) More specifically, do the authors' findings suggest that the phase diagrams predicted by Flory-Huggins are invalid? Or, are they suggesting that even if the phase diagrams predicted by Flory-Huggins are empirically correct (if verified by experimental testing), they are underpinned by a free energy function different from that of Flory-Huggins? It is important to answer this question to clarify the implications of the authors' findings on equilibrium phase behaviors and the falsifiability of the implications. 

      As mentioned above, our main conclusion is that the droplet size distribution follows a scaling behaviour. Our contribution is not to propose a theoretical model, but rather to report a scaling behaviour that should be accounted for by existing or future theoretical models.

      (12) How about the implications of the authors' findings on other theories of protein phase separation that are based on interactions that are different from the short spatial range interactions treated by Flory-Huggins? For instance, it has been observed that whereas the Flory-Huggins-predicted phase diagrams are always convex upward, phase diagrams for charged intrinsically disordered proteins with long spatial range Coulomb interactions exhibit a region that is concave upward [Das et al., Phys Chem Chem Phys 20, 28558-28574 (2018)]. Can the authors' findings regarding apparent scale-invariant droplet size distribution provide information on the underlying interactions driving the protein molecules toward phase separation? 

      The type of interactions that give rise to the observed scaling behaviour is an interesting question for future studies.

      (13) Table S1 (page 4) and Table S2 (page 7) are mentioned in the text but these tables are not in the submitted files. 

      We have added the Supplementary Tables as well as the source files for the figures.

      (14) The two systems studied (FUS and α-synuclein) have a single intrinsically disordered protein (IDP) component. It is not clear if the authors expect their claimed scaling relation to be applicable to systems with multiple IDP components and if so, why.

      From the data that we have analysed so far, we feel that we cannot speculate on this interesting point, and we leave it to future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      A limitation of the study is that it does not directly compare the effect of inhibiting the PERK-ATF4 pathway with inhibiting JUN and/or JUN-CHOP double deficient animals. It would also be useful, for the cell survival experiments shown in Figure 1, to examine a longer time point than 14 days to understand the long-term consequence of manipulating the PERK-ATF4 pathway.

      We appreciate that both suggestions are fantastic ideas for future studies but consider them to be beyond the scope of this investigation. 

      Reviewer #2 (Public Review):

      However, the main concern is the overall data quality, which appears to be suboptimal. The transfection efficiency of AAV2-hSyn1-mTagBFP2-ires-Cre used in this study does not seem highly effective, as evidenced by the data presented in Supplementary Figure 1.

      We appreciate the importance of the transfection efficiency of AAV2-hSyn1-mTagBFP2-ires-Cre to the interpretation of our results and acknowledge that the imaging and color schemes used required improvement. We have now validated widespread knockout in RGCs using AAV2-hSyn1-mTagBFP2-ires-Cre by improving the staining and imaging of LSL.tdTomato Cre reporter mice (Figure S1A-B) and by using RNAScope to validate the disruption of ATF4 and CHOP in the RGCs of ATF4 cKO and CHOP cKO mice, respectively (Figure S1C-D). Additional validation of functional knockout of these transcription factors is provided by the reduced RGC-autonomous expression of transcripts that we identified in this study as injury-regulated in an ATF4-dependent (Chac1, Atf3; Figure 4C-E) or ATF4- and CHOP-dependent manner (Ecel1, Avil; Figure 4C-E and Figure S2D).

      The manuscript also contains several inconsistencies and a mix of methods in data collection, analysis, and interpretation, such as the labeling and quantification of RGCs and the combination of bulk and single-cell sequencing results.

      Regarding the use and comparison of bulk-seq and scRNA-seq data, it is our sense that these innovative approaches will be among the most impactful aspects of this study. Numerous transcriptomic studies of the optic nerve crush model exist, though it has been unclear whether major and minor technical differences would preclude deriving insights across studies without the expense and time of exact reproduction. One goal of this study was to evaluate the hypothesis that, despite the obvious limitation that RGCs represent fewer than 1% of cells in a whole-retina bulk transcriptomics approach, the signals among the top differentially expressed genes (DEGs) would be dominated by injury-induced changes within RGCs, and that the most robust of these changes would be readily detected across techniques and labs, serving as a cornerstone for interpreting similarities and differences in findings. We believe that the results validate this approach. Important insights gained in this study from these cross-study and cross-platform analyses include:

      (1) Genes that we identify in this study as neuronal ATF4-dependent by whole-retina transcriptomics include many of the most robust gene expression changes observed across multiple studies, both those that enrich for RGCs and those that only report RGC-autonomous expression changes by scRNA-seq. This observation predicts that many of the ATF4-dependent expression changes that we report are RGC autonomous, which we further validate in this revision by RNAScope.

      (2) Similarly designed whole-retina transcriptomics studies across labs can be remarkably robust for top DEGs, showing striking similarity that allows for meaningful insights and testable hypotheses across different knockout and conditional knockout mice.

      (3) scRNA-seq of RGCs and bulk sequencing of FACS-enriched RGCs unsurprisingly result in higher sensitivity for injury-induced expression changes. However, the high degree of similarity that we demonstrate between the top DEGs from those studies and from whole-retina transcriptomics studies allows for confident inferences regarding the expected cell autonomy of reported expression changes in this model, using available resources such as the Single Cell Portal, without the expense and technical optimization required for extensive spatial transcriptomics across numerous mouse models.

      Other revisions

      In addition to these updates to address the public reviews, we are grateful for the reviewers’ additional recommendations and provide these further revisions:

      (1) We appreciate the request to clarify with a schematic the differences between our study and a previous report (Tian et al., 2022). A second Correction to that study was published in July 2024, resulting in changes to the logFC values used in our original cross-study comparison and adjustments to multiple figures and tables related to the proposed transcriptional programs of ATF4, CHOP, and the other purported core transcription factors. We have therefore updated our Figure S3A-C in accordance with that Correction to better reflect the underlying data of that study. These changes do not alter our original conclusions that: (a) both the whole-retina transcriptomics approach of our study and the FACS-enriched RGC approach of that study readily detect the strong upregulation of many known ATF4 target genes after optic nerve crush (Figure S3A); and (b) there are striking differences between the ATF4- and CHOP-dependent transcripts suggested by our cKO data and those suggested by the reported gRNA data. Although we had hoped that the Correction would allow us in this revision to diagram those findings and that model for comparison with our cKO findings, documenting those changes and their impact on the proposed model is beyond the scope of this study.

      (2) We agree that the discordance between the gene and protein names for Ddit3/CHOP and Eif2ak3/PERK represents a challenge for clarity, even when gene names are carefully selected when referring to genes or transcripts and protein names when referring to proteins. We have therefore attempted to streamline the naming throughout, using both names where possible.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) I was surprised to see that the Authors have failed to address my major concerns about the paper, which was in the Main text of the Review.

      Previously I wrote: The major weakness of the manuscript is that it is written for a very specialized reader who has a strong background in cerebellar development, making it hard to read for eLife's general audience. It's challenging to follow the logic of some of the experiments as well as to contextualize these findings in the field of cerebellar development.

      This has not been addressed. The manuscript has not been substantively changed and it is still written for a very specialized reader rather than a general reader.

      We appreciate the respected reviewer’s concern and have made substantial revisions throughout the manuscript to address the points. We have simplified the technical language throughout the manuscript and included additional background information, particularly in the introduction and discussion sections, to better orient general readers. Additionally, we have clarified the logical flow of the experiments by incorporating transitional statements and summaries that explain the purpose and outcomes of each experiment (revisions are highlighted in yellow). 

      (2) These two have been addressed, although to be honest, I don't think that the cartoon is particularly helpful for a general audience.

      Thank you for your feedback. We have replaced the cartoon with a revised version that provides more detailed information to clarify the origins of cerebellar nuclei from the caudal and rostral ends in both Atoh1+/+ and Atoh1-/- mice. We believe this will make the content clearer and more informative for a general audience.

      (3) My third recommendation, that they include a section in the Discussion to speculate about what these cells may become in the adult and the existence of multiple cell types with different molecular markers and projection patterns in the nuclei, has also not been addressed.

      We apologize for the oversight in the previous revision. We have now added a detailed discussion to the manuscript that speculates on the potential fate of these newly identified cells in the adult cerebellum, suggesting that they may differentiate into excitatory neurons (highlighted on page 9). In addition, as noted in our previous resubmission, further direct evidence is needed from the early population of SNCA+ cells from E9 to E13. This is an ongoing focus of investigation in our lab, where a PhD student is currently pursuing this question using SNCA-GFP mice.

      Reviewer #2 (Recommendations For The Authors):

      One small remaining issue: The methods text re cell counts remains confusing: n=3 EMBRYOS???

      "To assess the number of OTX2-positive cells, we conducted immunohistochemistry (IHC) labeling on slides containing serial sections from embryonic days 12, 13, 14, and 15 (n=3 EMBRYOS??? at each timepoint)."

      Thank you for pointing this out; we have revised the text in the Methods section for clarity. As highlighted on page 11, “The sample size was equal to 9 embryos” and on page 16, “3 embryos were used at each time point”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Fats and lipids serve many important roles in cancers, including serving as important fuels for energy metabolism in cancer cells by being oxidized in the mitochondria. The process of fatty acid oxidation is initiated by the enzyme carnitine palmitoyltransferase 1A (CPT1A), and the function and targetability of CPT1A in cancer metabolism and biology have been heavily investigated. This includes studies that have found important roles for CPT1A in colorectal cancer growth and metastasis.

      In this study, Chen and colleagues use analysis of patient samples and functional interrogation in animal models to examine the role CPT1A plays in colorectal cancer (CRC). The authors find that CPT1A expression is decreased in CRC compared to paired healthy tissue and that lower expression correlates with decreased patient survival over time, suggesting that CPT1A may suppress tumor progression. To functionally interrogate this hypothesis, the authors both use CRISPR to knockout CPT1A in a CRC cell line that expresses CPT1A and overexpress CPT1A in a CRC cell line with low expression. In both systems, increased CPT1A expression decreased cell survival and DNA repair in response to radiation in culture. Further, in xenograft models, CPT1A decreased tumor growth basally and radiotherapy could further decrease tumor growth in CPT1A-expressing tumors. As CRC is often treated with radiotherapy, the authors argue this radiosensitization driven by CPT1A could explain why CPT1A expression correlates with increased patient survival.

      Lastly, Chen and colleagues sought to understand why CPT1A suppresses CRC tumor growth and sensitizes the tumors to radiotherapy in culture. The antioxidant capacity of cells can increase cell survival, so the authors examine antioxidant gene expression and levels in CPT1A-expressing and non-expressing cells. CPT1A expression suppresses the expression of antioxidant metabolism genes and lowers levels of antioxidants. Antioxidant metabolism genes can be regulated by the FOXM1 transcription factor, and the authors find that CPT1A expression regulates FOXM1 levels and that antioxidant gene expression can be partially rescued in CPT1A-expressing CRC cells. This leads the authors to propose the following model: CPT1A expression downregulates FOXM1 (via some yet undescribed mechanism) which then leads to decreased antioxidant capacity in CRC cells, thus suppressing tumor progression and increasing radiosensitivity. This is an interesting model that could explain the suppression of CPT1A expression in CRC, but key tenets of the model are untested and speculative.

      Strengths:

      Analysis of CPT1A in paired CRC tumors and non-tumor tissue using multiple modalities combined with analysis of independent datasets rigorously show that CPT1A is downregulated in CRC tumors at the RNA and protein level.

      The authors use paired cell line model systems where CPT1A is both knocked out and overexpressed in cell lines that endogenously express or repress CPT1A respectively. These complementary model systems increase the rigor of the study.

      The finding that a metabolic enzyme generally thought to support tumor energetics actually is a tumor suppressor in some settings is theoretically quite interesting.

      We would like to thank Reviewer #1 for the positive comments.

      Weaknesses:

      The authors propose that CPT1A expression modulates antioxidant capacity in cells by suppressing FOXM1 and that this pathway alters CRC growth and radiotherapy response. However, key aspects of this model are not tested. The authors do not show that FOXM1 contributes to the regulation of antioxidant levels in CRC cells and tumors or if FOXM1 suppression is key to the inhibition of CRC tumor growth and radiosensitization by CPT1A. Thus, the model the authors propose is speculative and not supported by the existing data.

      We thank the reviewer for the valuable comment. In this study, we employed Western blotting to assess the protein levels of the ROS scavenging enzymes CAT, SOD1, and SOD2 following FOXM1 overexpression. This approach allowed us to evaluate how FOXM1 regulates ROS clearance and mediates cellular radiation resistance. Further in-vivo evidence is needed and will be addressed in future research.

      The authors propose two mechanisms by which CPT1A expression triggers radiosensitization: decreasing DNA repair capacity (Figure 3) and decreasing antioxidant capacity (Figure 5). However, while CPT1A expression does alter these capacities in CRC cells, neither is functionally tested to determine if altered DNA repair or antioxidant capacity (or both) are the reason why CRC cells are more sensitive to radiotherapy or are delayed in causing tumors in vivo. Thus, this aspect of the proposed model is also speculative.

      We thank the reviewer for the valuable comment. In this study, we combined a colony formation assay, the multi-target single-hit survival model, a comet assay, and Western blotting (for γH2AX) to evaluate DNA damage and repair in cells. Additionally, we employed qPCR, Western blotting, and enzyme activity kits to assess the direct ROS-scavenging activities of the antioxidant enzymes CAT, SOD1, SOD2, and SOD3.
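      For readers unfamiliar with the multi-target single-hit model, the sketch below shows the standard survival curve SF(D) = 1 − (1 − e^(−D/D0))^N and the quasi-threshold dose Dq = D0·ln N. It is a minimal illustration only. The function names and the parameter values (D0 = 1.6 Gy versus 1.2 Gy, N = 3) are hypothetical and are not fitted values from this study. A smaller D0 gives a steeper curve, which is how radiosensitization shows up in this model; a ratio of D0 values, or of the doses giving a fixed surviving fraction, is often reported as a sensitizer enhancement ratio.

```python
import math

def surviving_fraction(dose_gy, d0, n):
    """Multi-target single-hit model: SF = 1 - (1 - exp(-D/D0))**N.

    dose_gy: absorbed dose D (Gy)
    d0: mean lethal dose D0 (Gy); smaller D0 means a more radiosensitive curve
    n: extrapolation number N (number of targets)
    """
    return 1.0 - (1.0 - math.exp(-dose_gy / d0)) ** n


def quasi_threshold_dose(d0, n):
    """Quasi-threshold dose Dq = D0 * ln(N), an index of the shoulder width."""
    return d0 * math.log(n)


# Hypothetical parameters for illustration only (not values from the study).
doses = [0, 2, 4, 6, 8]
control = [surviving_fraction(d, d0=1.6, n=3.0) for d in doses]
sensitized = [surviving_fraction(d, d0=1.2, n=3.0) for d in doses]
```

      In practice, D0 and N would be obtained by fitting this function to colony formation data (for example, by nonlinear least squares); the sketch only evaluates the curve.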

      The authors find that CPT1A affects radiosensitization in cell culture and assess this in vivo. In vivo, CPT1A expression slows tumor growth even in the absence of radiotherapy, and radiotherapy only proportionally decreases tumor growth to the same extent as it does in CPT1A non-expressing CRC tumors. The authors propose from this data that CPT1A expression also sensitizes tumors to radiotherapy in vivo. However, it is unclear whether CPT1A expression causes radiosensitization in vivo or if CPT1A expression acts as an independent tumor suppressor to which radiotherapy has an additive effect. Additional experiments would be necessary to differentiate between these possibilities.

      We thank the reviewer for the valuable comment. As shown in Figure 4D, in the absence of CPT1A knockdown, radiotherapy reduced the percentage of Ki67-positive cells in the xenograft tumors by 32.9% (approximately 39.6% of the pre-irradiation baseline). In contrast, upon CPT1A knockdown, radiotherapy only led to a 14.5% reduction in the percentage of Ki67-positive cells (approximately 15.6% of the pre-irradiation baseline). Furthermore, as illustrated in Figures 4E and 4F, in the absence of CPT1A overexpression, radiotherapy resulted in a 0.10-g decrease in tumor weight (around 52.5% of the pre-irradiation weight), whereas with CPT1A overexpression, radiotherapy induced a more pronounced 0.12-g reduction in tumor weight (approximately 89.7% of the pre-irradiation weight). Collectively, these findings indicate that CPT1A exhibits a radiosensitising effect. We have incorporated these relevant details in the Results section (Lines 196-201 and 204-208).
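      The response above reports each effect both as an absolute change and as a fraction of a reference value. Assuming that each fraction is simply the absolute change divided by the reference value (the text does not state the formula explicitly), the implied reference values can be back-calculated as in this sketch. The helper `implied_reference` is hypothetical; the inputs are the figures quoted in the response.

```python
def implied_reference(absolute_change, fractional_change):
    """Back-calculate a reference value R from an absolute change and a
    fractional change f, assuming f = absolute_change / R (an assumption)."""
    return absolute_change / fractional_change

# Figures as quoted in the response (Figure 4D and Figures 4E-F).
ki67_ctrl = implied_reference(32.9, 0.396)     # Ki67+ %, without CPT1A knockdown
ki67_kd = implied_reference(14.5, 0.156)       # Ki67+ %, with CPT1A knockdown
weight_ctrl_g = implied_reference(0.10, 0.525)  # tumor weight (g), without CPT1A overexpression
weight_oe_g = implied_reference(0.12, 0.897)    # tumor weight (g), with CPT1A overexpression

for label, value in [("Ki67+ % (control)", ki67_ctrl),
                     ("Ki67+ % (knockdown)", ki67_kd),
                     ("tumor weight g (control)", weight_ctrl_g),
                     ("tumor weight g (overexpression)", weight_oe_g)]:
    print(f"{label}: {value:.3f}")
```

      Read this way, the argument rests on the relative reductions (about 40% versus about 16% for Ki67, and about 53% versus about 90% for tumor weight) rather than on the absolute differences. If the percentages were computed differently, the implied reference values would change.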

      The authors propose in Figure 3 that DNA repair capacity is inhibited in CRC cells by CPT1A expression. However, the γH2AX immunoblots performed in Figure 3H-I that measure DNA repair kinetics are not convincing that CPT1A expression impairs DNA repair kinetics. Separate blots are shown for CPT1A expressing and non-expressing cell lines, not allowing for rigorous comparison of γH2AX levels and resolution as CPT1A expression is modulated.

      We thank the reviewer for the valuable comment. In this study, we also employed a colony formation assay, multi-target single-hit survival model, and comet assay to elucidate the impact of CPT1A on DNA repair capacity. These methods all indicated that DNA repair capacity is inhibited in CRC cells by CPT1A expression.

      There are conflicting studies (PMID: 37977042, 29995871) that suggest that CPT1A is overexpressed in CRC and contributes to tumor progression rather than acting as a tumor suppressor as the authors propose. It would be helpful for readers for the authors to discuss these studies and why there is a discrepancy between them.

      We thank the reviewer for the valuable comment. We have expanded the discussion of these findings in the relevant section of the manuscript (Lines 317-318). We speculated that the differences between our observations and previous reports may be attributable to the inherent heterogeneity of tumor tissues as well as variations in tumor stage.

      Reviewer #2 (Public Review):

      The manuscript by Chen et al. describes how low levels of CPT1A in colorectal cancer (CRC) confer radioresistance by expediting radiation-induced ROS clearance. The authors propose that this mechanism of ROS homeostasis is regulated through FOXM1. CPT1A is known for its role in fatty acid metabolism via beta-oxidation of long-chain fatty acids, making it important in many metabolic disorders and cancers.

      Previous studies have suggested that the upregulation of CPT1A is essential for the tumor-promoting effect in colorectal cancers (CRC) (PMID: 32913185). For example, CPT1A-mediated fatty acid oxidation promotes colorectal cancer cell metastasis (PMID: 2999587), and repression of CPT1A activity renders cancer cells more susceptible to killing by cytotoxic T lymphocytes (PMID: 37722058). Additionally, inhibition of CPT1A-mediated fatty-acid oxidation (FAO) sensitizes nasopharyngeal carcinomas to radiation therapy (PMID: 29721083). While this suggests a tumor-promoting effect for CPT1A, the work by Chen et al. suggests instead a tumor-suppressive function for CPT1A in CRC, specifically that loss or low expression of CPT1A confers radioresistance in CRC. This makes the findings important given that they oppose the previously proposed tumorigenic function of CPT1A. However, the data presented in the manuscript is limited in scope and analysis.

      Major Limitations:

      (1) Analysis of Patient Samples

      - Figure 1D shows that CPT1A levels are significantly lower in COAD and READ compared to normal tissues. It would be beneficial to show whether CPT1A levels are also significantly lower in CRC compared to other tumor types using TCGA data.

      We thank the reviewer for the valuable comment. We assessed the expression levels of CPT1A across all cancer types in the TCGA dataset and found that the abundance of CPT1A in CRC was significantly lower compared to cholangiocarcinoma (CHOL), esophageal carcinoma (ESCA), kidney chromophobe (KICH), acute myeloid leukemia (LAML), and stomach adenocarcinoma (STAD) (Author response image 1).

      Author response image 1.

      The mRNA level of CPT1A across all cancer types in the TCGA dataset.

      - The analysis should include a comparison of closely related CPT1 isoforms (CPT1B and CPT1C) to emphasize the specific importance of CPT1A silencing in CRC.

      We thank the reviewer for the valuable comment. We further examined the mRNA expression levels of the CPT1 isoforms CPT1B and CPT1C in COAD and READ tumor samples and their respective normal tissue counterparts. The results showed that CPT1B was significantly upregulated in READ tumor samples compared to normal tissues. Similarly, CPT1C was significantly overexpressed in both READ and COAD tumor samples relative to their normal tissue controls (Author response image 2).

      Author response image 2.

      The mRNA expression levels of CPT1B and CPT1C in rectal adenocarcinoma (READ) and colon adenocarcinoma (COAD) based on data from the TCGA database. A. CPT1B expression in READ. B. CPT1B expression in COAD. C. CPT1C expression in READ. D. CPT1C expression in COAD.

      - Figure 2 lacks a clear description of how IHC scores were determined and the criteria used to categorize patients into CPT1A-high and CPT1A-low groups. This should be detailed in the text and figure legend.

      We thank the reviewer for the valuable comment. We have provided a detailed description of the methodology used to determine the IHC scores and criteria applied to categorise patients into CPT1A-high and CPT1A-low groups in the Materials and Methods section (Lines 418-426) as well as the legend of Figure 2A.

      - None of Figure 2B or 2C show how many patients were assigned to the CPT1A-low and CPT1A-high groups.

      We thank the reviewer for the valuable comment. We have added the number of patients in the CPT1A-low and CPT1A-high groups to the legends of Figures 2B and 2C.

      (2) Model Selection and Experimental Approaches

      - The authors primarily use CPT1A knockout (KO) HCT116 cells and CPT1A overexpression (OE) SW480 cells for their experiments, which poses major limitations.

      We thank the reviewer for the valuable comment.

      - The genetic backgrounds of the cell lines (e.g., HCT116 being microsatellite instable (MSI) and SW480 not) should be considered as they might influence treatment outcomes. This should be acknowledged as a major limitation.

      We thank the reviewer for the valuable comment. Indeed, the genetic background differences among cell lines represent a significant limitation. We have addressed this issue in the discussion section (Lines 363-365). 

      - Regardless of their CPT1A expression levels, for the experiments with HCT116 and SW480 cells in Figure 3C-F, it would be useful to see whether HCT116 cells can be further sensitized to radiotherapy in overexpression and whether SW480 cells can be desensitized through CPT1A KO.

      We thank the reviewer for the valuable comment. We attempted to perform the relevant experiments, but because the HCT116 cell line inherently expresses high levels of CPT1A, we were unable to achieve significant overexpression. We faced similar challenges with the SW480 cell line, which expresses lower levels of CPT1A. We therefore could not provide additional insights in this respect.

      - The use of only two CRC cell lines is insufficient to draw broad conclusions. Additional CRC cell lines should be used to validate the findings and account for genetic heterogeneity. The authors should repeat key experiments with additional CRC cell lines to strengthen their conclusions.

      We thank the reviewer for the valuable comment. To address this issue, we used a radiation-resistant variant of the HCT-15 cell line as a new approach to investigate whether CPT1A is associated with cellular radiation sensitivity. We believe that the data obtained from these acquired resistant cell lines are comparable to those from the ordinary cell lines mentioned in the reviewer’s comment.

      (3) Pharmacological Inhibition

      Several studies have reported beneficial outcomes of using CPT1 pharmacological inhibition to limit cancer progression (e.g., PMID: 33528867, PMID: 32198139), including its application in sensitization to radiation therapy (PMID: 30175155). Since the authors argue for the opposite case in CRC, they should show this through pharmacological means such as etomoxir and whether CPT1A inhibition phenocopies their observed genetic KO effect, which would have important implications for using this inhibitor in CRC patients.

      We thank the reviewer for the valuable comment. The referenced literature has indeed attracted our attention. Our research group is concurrently investigating the role of CPT1A in tumor radiotherapy and immunology, utilising CPT1A inhibitors for experimental validation. We look forward to publishing these related studies to further support the conclusions presented in our manuscript.

      (4) Data Representation and Statistical Analysis

      - The relative mRNA expression levels across the seven cell lines (Supplementary Figure 1C) differ greatly from those reported in the DepMap (https://depmap.org/portal/). This discrepancy should be addressed.

      We thank the reviewer for the valuable comment. The observed differences in mRNA levels may be attributable to variations in cell culture density. For subsequent radiation sensitivity experiments, we maintained our cell culture density at approximately 70–80% confluence.

      - The statistical significance of differences in mRNA and protein levels between RT-sensitive and RT-resistant cells should be shown (Supplementary Figure 1C, D).

      As suggested, we have included a statistical analysis of the differences in CPT1A mRNA levels between RT-sensitive and -resistant cells in Figure 3 and Supplementary Figure 1C. However, further analysis revealed no significant difference in CPT1A protein levels between these groups, which we attribute to the high variability in grayscale values observed between the groups.

      Conclusion

      The study offers significant insights into the role of CPT1A in CRC radioresistance, proposing a tumor-suppressive function. However, the scope and depth of the analysis need to be expanded to fully validate these claims. Additional CRC cell lines, pharmacological inhibition studies, and a more detailed analysis of patient samples are essential to strengthen the conclusions.

      We would like to thank Reviewer #2 for the comments.

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the role of CPT1A in developing resistance to radiotherapy in colorectal cancer (CRC). The manuscript is a collection of assays and analyses to identify the mechanism by which CPT1A leads to treatment resistance through increased expression of ROS-scavenging genes facilitated by FOXM1 and provides an argument to counter this role, leading to a reversal of treatment resistance.

      Strengths:

      The article is well written with sound scientific methodology and results. The assays performed are well within the scope of the hypothesis of the study and provide ample evidence for the role of CPT1A in the development of treatment resistance in colorectal cancer. While providing compelling evidence for their argument, the authors have also rightfully provided limitations of their work.

      We would like to thank Reviewer #3 for the positive comments.

      Weaknesses:

      The primary weakness of the study is acknowledged by the authors at the end of the Discussion section of the manuscript. The work heavily relies on bioinformatics and in vitro work with little backing of in vivo and patient data. In terms of animal studies, it is to be noted that the model they have used is nude mice with non-orthotopic, subcutaneous xenograft, which may not be the best recreation of the patient tumor.

      We thank the reviewer for the insightful comment. Our research group is continuing to explore the role of CPT1A in colorectal cancer radiotherapy and immunotherapy. In a new study, we used a C57BL/6 mouse model to conduct in vivo experiments. Preliminary data suggest that CPT1A confers heightened radiosensitivity in immunocompetent mice. We look forward to the forthcoming publication of this ongoing research project.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript was challenging to read and contained many typographical errors and tangents that were not relevant to the logic of the paper. For example, in lines 365-367 the authors state that peroxisomes are important for redox balance and that they will target peroxisomal pathways. However, the authors do not perform any experiments targeting peroxisomal pathways, which left me quite perplexed. Careful proofreading of the manuscript would improve its utility for readers.

      We thank the reviewer for the insightful comments. We have made several additions throughout the manuscript to include more relevant information and experimental details, thereby improving the manuscript’s logical structure and readability. As described in the text, we used the DCFH-DA probe to measure intracellular ROS levels. Because ROS-scavenging enzymes are major regulators of these levels, we examined the transcript levels, protein expression, and enzymatic activities of catalase (CAT) and the superoxide dismutases SOD1, SOD2, and SOD3 by qPCR, Western blotting, and specific assay kits.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarification and Flow

      Introduction Clarity: The introduction introduces several topics in succession without clearly connecting them. For example, the introduction of FOXM1 on Line 102 lacks clarity in its relationship to the study. Consider discussing these elements only in the discussion section to avoid confusion.

      We thank the reviewer for this insightful comment. We have moved the section on FOXM1 to the discussion to enhance readability (Lines 342-348).

      Explanation for Non-experts: Both the multi-target single-hit survival model and the comet assay require one sentence to explain their principles for non-experts in the field.

      As suggested, we have included brief explanations of the multi-target single-hit survival model and the comet assay in the Materials and Methods section to clarify these concepts to readers not familiar with the subject (Lines 458-460 and 462-465). 
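      The sketch below gives background on the multi-target single-hit model. It assumes the common form SF(D) = 1 − (1 − exp(−D/D0))^n. In this form, each cell carries n targets, and each target is inactivated by a single Poisson-distributed hit. A cell is killed only when all n targets are hit.

      The dose grid and the parameters (D0 = 1.3 Gy, n = 2.5) are illustrative, not values from the manuscript. The back-extrapolation of the high-dose slope is the classical graphical estimate. It is not necessarily the fitting procedure used in the Materials and Methods.

```python
import numpy as np

def mtsh_survival(dose_gy, d0, n):
    """Multi-target single-hit survival: each of n targets is hit with
    probability 1 - exp(-D/D0); the cell dies only if every target is hit."""
    dose_gy = np.asarray(dose_gy, dtype=float)
    p_target_hit = 1.0 - np.exp(-dose_gy / d0)
    return 1.0 - p_target_hit ** n

def terminal_slope_estimate(dose_gy, sf, min_dose_gy=6.0):
    """Classical graphical estimate: at high dose ln SF ~ ln n - D/D0, so a line
    through the high-dose points gives D0 (from the slope) and n (from the intercept)."""
    dose_gy = np.asarray(dose_gy, dtype=float)
    sf = np.asarray(sf, dtype=float)
    mask = dose_gy >= min_dose_gy
    slope, intercept = np.polyfit(dose_gy[mask], np.log(sf[mask]), 1)
    d0 = -1.0 / slope
    n = np.exp(intercept)
    dq = d0 * np.log(n)  # quasi-threshold dose: width of the shoulder
    return d0, n, dq

# Illustrative, noise-free survival values (hypothetical parameters).
doses = np.array([0, 2, 4, 6, 8, 10], dtype=float)
sf = mtsh_survival(doses, d0=1.3, n=2.5)
d0_hat, n_hat, dq_hat = terminal_slope_estimate(doses, sf)
print(f"D0 ~ {d0_hat:.2f} Gy, n ~ {n_hat:.2f}, Dq ~ {dq_hat:.2f} Gy")
```

      The approximation ln SF ≈ ln n − D/D0 holds only once exp(−D/D0) is small. As a result, the estimates carry a small bias that shrinks as the lowest dose used for the line increases.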

      (2) Specific Text Revisions

      - Line 302: "We transfected the CRISPR/Cas9 lentivirus into HCT 116 ... efficiency of the 2nd site was the highest" - Clarify what is meant by "second site." If you mean the second sgRNA, please use this term.

      As suggested, we have revised ‘2nd’ to ‘second’ (Lines 151 and 152).

      - Lines 358-359: For the results subsection "Low CPT1A levels accelerate post-radiation ROS scavenging," include an introductory sentence, such as: "To study the mechanism of low CPT1A expression in radiotherapy resistance, we conducted differential gene expression analysis between HCT116 CPT1A KO and NC cells."

      As suggested, we have added an introductory sentence in the section titled ‘Low CPT1A Levels Accelerate Post-Radiation ROS Scavenging’ (Lines 215-217).

      - Line 359: "The gene expression heatmap showed high consistency among replicates for both HCT 116-NC and HCT 116-KO cells (Supplementary Figure 3A)." If these are technical replicates performed on the same batch of KO or NC cells, please state this clearly.

      We have added the suggested information to improve clarity (Line 218).

      - Lines 360-362: "With CPT1A knockdown, we found 363 upregulated and 1290 downregulated genes (|log2(fold change)|>1 and P<0.05)." Ensure that the p-value is correct; it seems this should be q-value < 0.05.

      As suggested, we have revised ‘p’ to ‘q’ (Lines 220 and 496).
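      The sketch below shows the two-criterion filter (|log2(fold change)| > 1 and q < 0.05) for readers checking the thresholds. It assumes the q-values are Benjamini-Hochberg adjusted p-values, as reported by common tools such as DESeq2 (`padj`). The text does not state which adjustment was used. The helper names and the toy values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)    # p_(i) * m / i
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

def select_degs(log2_fc, pvals, fc_cut=1.0, q_cut=0.05):
    """Boolean mask of genes with |log2 fold change| > fc_cut and BH q < q_cut."""
    q = benjamini_hochberg(pvals)
    log2_fc = np.asarray(log2_fc, dtype=float)
    return (np.abs(log2_fc) > fc_cut) & (q < q_cut)

# Toy example with four genes (hypothetical values).
log2_fc = np.array([2.0, -1.5, 0.5, 3.0])
pvals = np.array([0.001, 0.01, 0.001, 0.5])
mask = select_degs(log2_fc, pvals)
print(mask)
```

      Filtering on raw p-values instead of q-values would admit more false positives when thousands of genes are tested at once.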

      - Line 363: Introduce the term "DEGs" as Differentially Expressed Genes in the main text, not just in the Materials and Methods (line 215).

      As suggested, we have introduced the term "DEGs" as Differentially Expressed Genes in the main text (Lines 221-222).

      - Lines 364-365: "Showing that the main enriched pathways were in peroxisomes, cell cycle nucleotide excision repair, and fatty acid degradation (Figure 5A)." The data does not support this statement. Clarify that the listed pathways are AMONG the enriched KEGG pathways.

      As suggested, we have revised the relevant part in the manuscript (Lines 222-224).

      - Line 370: "...following 6 Gy irradiation and 1 h of incubation with DCFH-DA (Figure 5C)." Write out the term DCFH-DA and explain it for non-experts: "a fluorescent redox probe used to detect reactive oxygen species."

      As suggested, we have added a brief explanation to clarify the term for readers not familiar with the subject (Lines 230-231). 

      - Line 444: "CPT1A is an essential tumor suppressor." This statement has not been validated or referenced adequately.

      As suggested, we have removed the sentence to improve clarity.

      - Line 447: Clarify the relevance of the He, Zhang & Xu reference.

      We apologise for the error and have removed the reference.

      (3) Figure Improvements

      - Standardize Graph Labels: Ensure that graph axis labels and numbering are consistent and legible across the manuscript. For example, Figure 1A has large labels, while Figure 1B has much smaller labels. Ensure all graphs, such as 2C and 3G, have readable labels and numbering.

      We thank the reviewer for the insightful comment. We have revised the labels and numbering in Figures 1B, 2C, and 3G.

      - Figure 2B and 2C: Correct the x-axis label from "mouths" to "months."

      We thank the reviewer for this insightful comment. We have revised the labels in Figure 2B and 2C.

      - Figure 3 Legend: Clarify what is meant by "different groups of cell lines" in the legend of Figure 3. Specify whether these are single clones, pooled clones, or mixtures of cells in the text and/or figure legend.

      We thank the reviewer for this insightful comment. We have updated the legend of Figure 3 to enhance clarity.

      - Figures 3H and 3I: Label the blots clearly to indicate which refer to HCT116 NC and KO and which to SW480 RFP and OE.

      We thank the reviewer for this insightful comment. We have revised the labels in Figure 3H and 3I.

      - Supplementary Figure 2A: Describe the terms F and W in the legend.

      We thank the reviewer for this insightful comment. 'F' denotes fraction and 'W' denotes week. We have updated the legend of Figure 3 and Figure 3-figure supplement 2 to improve clarity.

      - Supplementary Data: Consider moving the data described in Supplementary Figure 2 to the main figures as it is among the most convincing data in the paper.

      We thank the reviewer for this insightful comment. We have decided to retain this figure at its current position, as we believe the data presented provide complementary evidence supporting the conclusion discussed earlier.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The paper by Chen et al. describes the role of neuronal thermo-TRPV3 channels in the firing of cortical neurons in the fever temperature range. The authors began by demonstrating that exposure to infrared light, which increases ambient temperature, causes body temperature to rise to a fever level above 38°C. Subsequently, they showed that at the fever temperature of 39°C, the spike threshold (ST) increased in both populations (P12-14 and P7-8) of cortical excitatory pyramidal neurons (PNs). However, the spike number decreased only in P7-8 PNs, while it remained stable in P12-14 PNs at 39°C. In addition, the fever temperature also reduced the late peak postsynaptic potential (PSP) in P12-14 PNs. The authors further characterized the firing properties of cortical P12-14 PNs and identified two types: STAY PNs, which retained spiking at 30°C, 36°C, and 39°C, and STOP PNs, which stopped spiking upon temperature change. They extended their analysis and characterization to striatal medium spiny neurons (MSNs) and found that STAY MSNs and PNs shared the same ST temperature sensitivity. Using small-molecule tools, they further showed that thermo-TRPV3 currents in cortical PNs, but not TRPV4 currents, increased in response to temperature elevation. The authors concluded that during fever, neuronal firing stability is largely maintained by sensory STAY PNs and MSNs that express functional TRPV3 channels. Overall, this study is well designed and executed, with substantial controls, some interesting findings, and high-quality data. Here are some specific comments: 

      (1) Could the authors discuss, or is there any evidence of, changes in TRPV3 expression levels in the brain during the postnatal 1-4 week age range in mice? 

      To our knowledge, no published studies have documented changes in TRPV3 expression levels in the brain during the 1st to 4th postnatal weeks in mice. Research on TRPV3 expression in the mouse brain has primarily involved RT-PCR analysis of RNA from dissociated tissue in adult mice (Jang et al., 2012; Kumar et al., 2018), largely due to the scarcity of effective antibodies for brain tissue sections at the time of publication. Furthermore, the Allen Brain Atlas lacks data on TRPV3 expression in the developing or postnatal brain. To address this gap, we plan to examine TRPV3 expression at P7-8, P12-13, and P20-23 as part of our manuscript revision.

      (2) Are there any differences in TRPV3 expression patterns that could explain the different firing properties of STAY and STOP neurons in response to fever temperature? 

      This is an excellent question and one we plan to explore in the future by developing reporter mice or viral tools to monitor the activity of cells with endogenous TRPV3 expression. To our knowledge, these tools do not currently exist. Creating them will be challenging, as it requires identifying promoters that accurately reflect endogenous TRPV3 expression. We have not yet quantified TRPV3 expression in STOP and STAY neurons; however, our analysis of evoked spiking activity at 30, 36, and 39°C suggests that TRPV3 expression may mark a population of pyramidal neurons that tend to STAY spiking as temperatures increase. To investigate this further, we are considering patch-seq for TRPV3 expression on recorded neurons. This is a complex experiment, as it requires recording activity at three different temperatures and subsequently collecting the cell contents. While success is not guaranteed, we are committed to attempting these experiments as part of our revisions.

      (3) TRPV3 and TRPV4 can co-assemble to form heterotetrameric channels with distinct functional properties. Do STOP neurons exhibit any firing behaviors that could be attributed to the variable TRPV3/4 assembly ratio? 

      There is some evidence that TRPV3 and TRPV4 proteins can physically associate in HEK293 cells and native skin tissues (Hu et al., 2022). TRPV3 and TRPV4 are both expressed in the cortex (Kumar et al., 2018), but it remains unclear whether they are co-expressed and co-assembled to form heteromeric channels in cortical excitatory pyramidal neurons. Examination of the I-V curve from HEK cells co-expressing TRPV3/4 heteromeric channels shows enhanced current at negative membrane potentials (Hu et al., 2022).

      Currently, we cannot classify cells as STOP or STAY and measure TRPV3 or TRPV4 currents in the same cell, as this would require different experimental setups and internal solutions. Additionally, the protocol involves a sequence of recordings at 30, 36, and 39°C, followed by cooling back to 30°C and re-heating to each temperature. Cells subjected to such a protocol would likely not survive to the end.

      In our recordings of TRPV3 currents—which likely include both STOP and STAY cells—we do not observe a significant current at negative voltages, suggesting that TRPV3/4 heteromeric channels may either be absent or underrepresented, at least at a 1:1 ratio. However, the possibility that TRPV3/4 heteromeric channels could define the STOP cell population is intriguing and plausible.

      (4) In Figure 7, have the authors observed an increase of TRPV3 currents in MSNs in response to temperature elevation? 

      We have not recorded TRPV3 currents in MSNs in response to elevated temperatures.

      (5) Is there any evidence of a relationship between TRPV3 expression levels in D2+ MSNs and degeneration of dopamine-producing neurons? 

      This is an interesting question, though it falls outside our current research focus in the lab. A PubMed search yields no results connecting the terms TRPV3, MSNs, and degeneration. However, gain-of-function mutations in TRPV4 channel activity have been implicated in motor neuron degeneration (Sullivan et al., 2024) and axon degeneration (Woolums et al., 2020). Similarly, TRPV1 activation has been linked to developmental axon degeneration (Johnstone et al., 2019), while TRPV3 blockade has shown neuroprotective effects in models of cerebral ischemia/reperfusion injury in mice (Chen et al., 2022).

      The link between TRPV activation and cell degeneration, however, may not be straightforward. For instance, TRPV1 loss has been shown to accelerate stress-induced degradation of axonal transport from retinal ganglion cells to the superior colliculus and to cause degeneration of axons in the optic nerve (Ward et al., 2014). Meanwhile, TRPV1 activation by capsaicin preserves the survival and function of nigrostriatal dopamine neurons in the MPTP mouse model of Parkinson's disease (Chung et al., 2017).

      (6) Does fever range temperature alter the expressions of other neuronal Kv channels known to regulate the firing threshold? 

      This is an active line of investigation in our lab. The results of ongoing experiments will provide further insight into this question.

      Reviewer #2 (Public review): 

      Summary: 

      The authors study the excitability of layer 2/3 pyramidal neurons in response to layer 4 stimulation at temperatures ranging from 30 to 39°C in P7-8, P12-14, and P22-24 animals. They also measure brain temperature and spiking in vivo in response to externally applied heat. Some pyramidal neurons continue to fire action potentials in response to stimulation at 39°C; these are called STAY neurons. STAY neurons have unique properties aided by TRPV3 channel expression. 

      Strengths: 

      The authors use various techniques and assemble large amounts of data. 

      Weaknesses: 

      (1) No hyperthermia-induced seizures were recorded in the study. 

      The goal of this manuscript is to uncover the age-related physiological changes that enable the brain to retain function at fever temperatures. These changes may potentially explain why most children do not experience febrile seizures or why, in the rare cases when they do occur, the most prominent window of susceptibility is between 2-5 years of age (Shinnar and O’Dell, 2004), as this may coincide with the window during which these developmental changes are normally occurring. While it is possible that impairments in these mechanisms could result in febrile seizures, another possibility is that neural activity may fall below the level required to maintain normal function.

      (2) Febrile seizures in humans are age-specific, extending from 6 months to 6 years. While translating to rodents is challenging, according to published literature (see Baram), rodents aged P11-16 experience seizures upon exposure to hyperthermia. The rationale for publishing data on P7-8 and P22-24 animals, which are outside this age window, must be clearly explained to address a potential weakness in the study. 

      This manuscript focuses on identifying the age-related physiological changes that enable the brain to retain function at fever temperatures. To this end, we examine two age periods flanking the putative window of susceptibility (P12-14), specifically an earlier timepoint (P7-8) and a later timepoint (P20-23). The inclusion of these time points also serves as a negative control, allowing us to determine whether the changes we observe in the proposed window of susceptibility are unique to this period. We believe that including these windows ensures a thorough and objective scientific approach.

      (3) The authors evoked responses from layer 4 and recorded postsynaptic potentials, which then caused action potentials in layer 2/3 neurons in current clamp. The postsynaptic potentials are exquisitely temperature-sensitive, as the authors demonstrate in Figures 3B and 7D. Note the markedly altered decay of synaptic potentials with rising temperature in these traces. The altered decays will likely change the activation and inactivation of voltage-gated ion channels, adjusting the action potential threshold. 

      Based on Figure 4B, we surmise that the temperature-induced reduction in inhibition, and the resulting loss of the late PSP, are the primary contributors to the altered decay of the synaptic potentials.

      (4) The data weakly supports the claim that the E-I balance is unchanged at higher temperatures. Synaptic transmission is exquisitely temperature-sensitive due to the many proteins and enzymes involved. A comprehensive analysis of spontaneous synaptic current amplitude, decay, and frequency is crucial to fully understand the effects of temperature on synaptic transmission. 

      Thank you for the opportunity to provide clarification. It was not stated, nor did we intend to imply, that in general, E-I balance is unchanged at higher temperatures. Please see the excerpt from the manuscript below. The statements specifically referred to observations made for experiments conducted during the P20-26 age range for cortical pyramidal neurons. We have a parallel line of investigation exploring the differential susceptibility of E-I balance based on age and temperature. Additionally, our measurements focus on evoked activity, rather than spontaneous activity, as these events are more likely linked to the physiological changes underlying behavior in the sensory cortex.

      “As both excitatory and inhibitory PNs that stay spiking increase their firing rates (Figure 5B) and considering that some neurons within the network are inactive throughout or stop spiking, it is plausible that these events are calibrated such that despite temperature increases, the excitatory to inhibitory (E-I) balance within the circuit may remain relatively unchanged. Indeed, recordings of L4-evoked excitatory and inhibitory postsynaptic currents (respectively EPSCs and IPSCs) in wildtype L2/3 excitatory PNs in S1 cortex, where inhibition is largely mediated by the parvalbumin positive (PV) interneurons, showed that E-I balance (defined as E/(E+I), the ratio of the excitatory current to the total current) remained unchanged as temperature increased from 36 to 39°C (Figure 5E).”
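      The E-I balance metric in this excerpt is the ratio of excitatory current to total current, E/(E+I). The minimal sketch below assumes that E and I are magnitudes of L4-evoked EPSCs and IPSCs. It also assumes these were recorded near the inhibitory and excitatory reversal potentials, respectively, which is a common protocol that the excerpt does not specify. The function name and the numbers are illustrative.

```python
def e_i_balance(epsc_pa, ipsc_pa):
    """E/(E+I): fraction of the total evoked current that is excitatory.

    Magnitudes are used because EPSCs (recorded near the inhibitory reversal
    potential) are inward/negative, while IPSCs (recorded near the excitatory
    reversal potential) are outward/positive.
    """
    e, i = abs(epsc_pa), abs(ipsc_pa)
    return e / (e + i)

# Illustrative amplitudes only, not data from the manuscript.
print(e_i_balance(-200.0, 300.0))  # 200 / (200 + 300)
```

      Because the metric is a ratio, it can stay constant even when both currents change with temperature, provided they scale together.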

      (5) It is unclear how the temperature sensitivity of medium spiny neurons is relevant to febrile seizures. Furthermore, the most relevant neurons are hippocampal neurons since the best evidence from human and rodent studies is that febrile seizures involve the hippocampus. 

      Thank you for the opportunity to clarify. Our goal was not to establish a link between medium spiny neuron (MSN) function and febrile seizures. The manuscript's focus is on identifying age-related physiological changes that enable supragranular cortical cells in the brain to retain function at fever temperatures. MSNs were selected for mechanistic comparison in this study because they represent a non-pyramidal, non-excitatory neuronal subtype, allowing us to assess whether the physiological changes observed in L2/3 excitatory pyramidal neurons are unique to these cells.

      (6) The TRPV3 data would be convincing if the knockout animals did not have febrile seizures. 

      Could you kindly provide the reference indicating that TRPV3 KO mice have seizures? Unfortunately, we were unable to locate this reference. It is important to distinguish febrile seizures, which occur within the range of physiological body temperatures (~38 to 40°C), from seizures resulting from heat stroke, a severe form of hyperthermia occurring when body temperature exceeds 40.0°C. Mechanistically, these may represent different phenomena, as the latter is typically associated with widespread protein denaturation and cell death, whereas febrile seizures are usually non-lethal. Additionally, TRPV3 is located on chromosome 17p13.2, a region not currently associated with seizure susceptibility.

      Reviewer #3 (Public review): 

      Summary: 

      This important study combines in vitro and in vivo recording to determine how the firing of cortical and striatal neurons changes during a fever range temperature rise (37-40°C). The authors found that certain neurons will start, stop, or maintain firing during these body temperature changes. The authors further suggested that the TRPV3 channel plays a role in maintaining cortical activity during fever. 

      Strengths: 

      The topic of how the firing pattern of neurons changes during fever is unique and interesting. The authors carefully used in vitro electrophysiology assays to study this interesting topic. 

      Weaknesses: 

      (1) In vivo recording is a strength of this study. However, data from in vivo recording is only shown in Figures 5A,B. This reviewer suggests the authors further expand on the analysis of the in vivo Neuropixels recording. For example, to show single spike waveforms and raster plots to provide more information on the recording. The authors can also separate the recording based on brain regions (cortex vs striatum) using the depth of the probe as a landmark to study the specific firing of cortical neurons and striatal neurons. It is also possible to use published parameters to separate the recording based on spike waveform to identify regular principal neurons vs fast-spiking interneurons. Since the authors studied E/I balance in brain slices, it would be very interesting to see whether the "E/I balance" based on the firing of excitatory neurons vs fast-spiking interneurons might be changed or not in the in vivo condition. 

      As requested, in the revised manuscript, we will include examples of single spike waveforms and raster plots for the in vivo recordings. Please note that all recordings were conducted in the cortex, not the striatum. To clarify, we used published parameters to separate the recordings based on spike waveform, which allowed us to identify regular principal neurons and fast-spiking interneurons. The paragraph below from the methods section describes this procedure.

      “Following manual curation, based on their spike waveform duration, the selected single units (n = 633) were separated into putative inhibitory interneurons and excitatory principal cells (Barthó et al., 2004). The spike duration was calculated as the time difference between the trough and the subsequent waveform peak of the mean filtered (300 – 6000 Hz bandpassed) spike waveform. Durations of extracellularly recorded spikes showed a bimodal distribution (Hartigan’s dip test; p < 0.001) characteristic of the neocortex with shorter durations corresponding to putative interneurons (narrow spikes) and longer durations to putative principal cells (wide spikes). Next, k-means clustering was used to separate the single units into these two groups, which resulted in 140 interneurons (spike duration < 0.6 ms) and 493 principal cells (spike duration > 0.6 ms), corresponding to a typical 22% - 78% (interneuron – principal) cell ratio”.
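      The duration-based split described in this excerpt can be sketched as follows. The sketch assumes the mean waveforms are already band-pass filtered (300-6000 Hz) and sampled at a known rate `fs_hz`. The helper names, the one-dimensional k-means with min/max initialisation, and the synthetic durations are illustrative assumptions.

      The sketch does not implement Hartigan's dip test. It also derives the cluster boundary from the data rather than hard-coding the 0.6 ms value reported above.

```python
import numpy as np

def trough_to_peak_ms(waveform, fs_hz):
    """Time (ms) from the waveform trough to the subsequent peak."""
    w = np.asarray(waveform, dtype=float)
    trough = int(np.argmin(w))
    peak = trough + int(np.argmax(w[trough:]))   # maximum after the trough
    return (peak - trough) / fs_hz * 1e3

def kmeans_1d_two_clusters(values, n_iter=100):
    """Plain k-means (k = 2) on a 1-D feature.

    Returns labels (0 = narrow, 1 = wide) and the boundary halfway
    between the two final centroids.
    """
    x = np.asarray(values, dtype=float)
    centroids = np.array([x.min(), x.max()])     # deterministic initialisation
    for _ in range(n_iter):
        labels = (np.abs(x - centroids[1]) < np.abs(x - centroids[0])).astype(int)
        new = np.array([x[labels == k].mean() for k in (0, 1)])
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    labels = (np.abs(x - centroids[1]) < np.abs(x - centroids[0])).astype(int)
    return labels, centroids.mean()

# Synthetic durations (ms) mimicking a bimodal distribution; not recorded data.
rng = np.random.default_rng(0)
durations = np.concatenate([rng.normal(0.35, 0.05, 140), rng.normal(0.9, 0.1, 493)])
labels, boundary = kmeans_1d_two_clusters(durations)
print(f"boundary ~ {boundary:.2f} ms, narrow: {np.sum(labels == 0)}, wide: {np.sum(labels == 1)}")
```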

      In vivo patching to record excitatory and inhibitory responses at 36°C, and then waiting 10 minutes to record again at 39°C, would be an extremely challenging experiment. Given the high difficulty and expected very low yield, these experiments will not be pursued for the revision.

      (2) The author should propose a potential mechanism for how TRPV3 helps to maintain cortical activity during fever. Would calcium influx-mediated change of membrane potential be the possible reason? Making a summary figure to put all the findings into perspective and propose a possible mechanism would also be appreciated. 

      Thank you for your helpful suggestions. In response to your recommendation, we will include a summary figure detailing the hypothesis currently described in the discussion section of the manuscript. The excerpt from the discussion is included below.

      “Although TRPV3 channels are cation-nonselective, they exhibit high permeability to Ca²⁺ (Ca²⁺ > Na⁺ ≈ K⁺ ≈ Cs⁺), with permeability ratios (relative to Na⁺) of 12.1, 0.9, 0.9, 0.9 (Xu et al., 2002). Opening of TRPV3 channels activates a nonselective cationic conductance and elevates membrane depolarization, which can increase the likelihood of generating action potentials. Indeed, our observations of a loss of the temperature-induced increases in the PSP with TRPV3 blockade are consistent with a reduction in membrane depolarization. In S1 cortical circuits at P12-14, STAY PNs appear to rely on a temperature-dependent activity mechanism, where depolarization levels (mediated by higher excitatory input and lower inhibitory input) are scaled to match the cell’s ST. Thus, an inability to increase PSPs with temperature elevations prevents PNs from reaching ST, so they cease spiking.”
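      The sketch below illustrates why opening a nonselective cation channel depolarizes the membrane. It estimates the reversal potential of a TRPV3-like conductance with the Goldman-Hodgkin-Katz (GHK) current equation. It is an order-of-magnitude illustration, not an analysis from the manuscript, and rests on these assumptions:

      - P_Ca/P_Na = 12.1 and P_K/P_Na = 0.9, taken from the ratios quoted above (Cs⁺ and Mg²⁺ are omitted).
      - Textbook-style intracellular and extracellular concentrations.
      - Independent ion permeation.

```python
import numpy as np

R, F = 8.314, 96485.0  # gas constant J/(mol K), Faraday constant C/mol

def ghk_total_current(v_mv, ions, temp_c):
    """Net GHK flux current (arbitrary units, outward positive).
    ions: list of (relative permeability, valence, [in] mM, [out] mM)."""
    u = v_mv * 1e-3 * F / (R * (temp_c + 273.15))  # dimensionless V*F/(R*T)
    total = 0.0
    for p, z, c_in, c_out in ions:
        if abs(u) < 1e-9:
            total += p * z * (c_in - c_out)        # analytic limit at V = 0
        else:
            e = np.exp(-z * u)
            total += p * z**2 * u * (c_in - c_out * e) / (1.0 - e)
    return total

def reversal_potential_mv(ions, temp_c, lo=-100.0, hi=100.0, tol=1e-6):
    """Bisection for the voltage where the net current is zero
    (the net current increases monotonically with voltage)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ghk_total_current(mid, ions, temp_c) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Assumed concentrations (mM); permeabilities relative to Na+.
trpv3_like = [
    (1.0, +1, 12.0, 145.0),   # Na+
    (0.9, +1, 140.0, 4.0),    # K+
    (12.1, +2, 1e-4, 2.0),    # Ca2+
]
for t in (30.0, 36.0, 39.0):
    print(t, round(reversal_potential_mv(trpv3_like, t), 1))
```

      Under these assumptions, the estimated reversal potential lies a few millivolts positive to 0 mV at all three temperatures. That is far above a typical resting potential, so opening such channels pulls the membrane toward depolarization. Every term depends on voltage only through VF/RT, so the estimate shifts by only about 3% between 30 and 39°C.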

      (3) The authors studied P7-8, P12-14, and P20-26 mice. How do these ages correspond to human ages? It would be nice to provide a comparison to help the reader understand the context better.

      Ideally, the mouse-human age comparison would depend on the specific process being studied. Please note that these periods are described in the introduction of the manuscript. The relevant excerpt is included below. Let us know if you need any additional modifications to this description.

      “Using wildtype mice across three postnatal developmental periods—postnatal day (P)7-8 (neonatal/early), P12-14 (infancy/mid), and P20-26 (juvenile/late)—we investigated the electrophysiological properties, ex vivo and in vivo, that enable excitatory pyramidal neurons (PNs) in mouse primary somatosensory (S1) cortex to remain active during temperature increases from 30°C (standard in electrophysiology studies) to 36°C (physiological temperature), and then to 39°C (fever-range).”

    1. Author response:

      eLife Assessment

      This important study describes a computational tool termed FLiSimBA (Fluorescence Lifetime Simulation for Biological Applications), which uses simulations to rigorously assess experimental limitations in fluorescence lifetime imaging microscopy (FLIM), including diverse noise factors, hardware effects, and sensor expression levels. The evidence from simulations and experimental measurements supporting the usefulness of FLiSimBA is solid. The authors may improve the application of the tool to a wide range of biological samples in two ways. First, they could provide the simulation package, currently in MATLAB, in other common languages such as Python. Second, they could better describe the fitting algorithm and model assumptions. The work will interest scientists who wish to perform quantitative FLIM imaging of cells and tissues.

      We thank the editors and reviewers for the constructive feedback. We plan to provide the FLiSimBA simulation package in Python in addition to Matlab. We will also describe our fitting method in more detail in the Results section. Furthermore, we will explain more clearly in the text that our simulation package makes almost no model assumptions. It is flexible and adaptable, so it can be used for any fluorescence lifetime measurement. We will clearly outline the specific examples we use for our case studies. We will also explain how users can input their own values based on the specific sensors, autofluorescence, and hardware they use.

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Ma et al. aimed to determine previously uncharacterized contributions of tissue autofluorescence, detector afterpulse, and background noise to the interpretation of fluorescence lifetime measurements. They introduce a computational framework named "Fluorescence Lifetime Simulation for Biological Applications (FLiSimBA)". The framework models experimental limitations in Fluorescence Lifetime Imaging Microscopy (FLIM) and determines parameters for multiplexed imaging of dynamic biosensors using lifetime and intensity. The authors quantitatively define how the number of sensor photons affects the signal-to-noise ratio when lifetime is determined by either fitting or averaging. In doing so, they contradict claims that fluorescence lifetime is insensitive to sensor expression level, and they highlight how these artifacts differ depending on the analysis method. Finally, the authors quantify how statistically meaningful experiments using multiplexed imaging could be achieved.

      A major strength of the study is the effort to present results in a clear and understandable way given that most researchers do not think about these factors on a day-to-day basis. The model code is available and written in Matlab, which should make it readily accessible, although a version in other common languages such as Python might help with dissemination in the community. One potential weakness is that the model uses parameters that are determined in a specific way by the authors, and it is not clear how vastly other biological tissue and microscope setups may differ from the values used by the authors.

      Overall, the authors achieved their aims of demonstrating how common factors (autofluorescence, background, and sensor expression) affect lifetime measurements. They also present a clear strategy for understanding how sensor expression may confound results if not properly considered. This work should raise awareness of an issue that new users of lifetime biosensors may not know about. Experts, while aware of the issue, have not quantitatively determined the conditions under which it arises. This work will also point to future directions for improving experiments using fluorescence lifetime biosensors and for developing new sensors with more favorable properties.

      We appreciate the comments and helpful suggestions. We plan to present FLiSimBA simulation code in Python in addition to Matlab to make it more accessible to the community.

      One of the advantages of FLiSimBA is that the simulation package is flexible and adaptable, allowing users to input parameters based on the specific sensors, hardware, and autofluorescence measurements for their biological and optical systems. In this manuscript, we used as examples parameters based on one FRET-based sensor, autofluorescence measured from mouse tissue, and the measured dark counts and afterpulse of our specific GaAsP PMT. In our revision, we will emphasize this advantage and further clarify how individual users can adapt these parameters to diverse tissues, imaging systems, and sensors.
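      As a rough illustration of the kind of inputs described above, the following minimal sketch (not the FLiSimBA code itself) draws photon arrival times for a double-exponential sensor, a single-exponential autofluorescence component, and a flat afterpulse/dark-count/background component, then bins them into a lifetime histogram. All default values (P1, the two time constants, the autofluorescence lifetime, the 12.5 ns laser period, 256 bins) are hypothetical placeholders that a user would replace with measurements from their own sensor, tissue, and detector; no instrument response function is applied.

```python
import random


def simulate_histogram(n_sensor, n_autof, n_background,
                       p1=0.5, tau1=2.0, tau2=0.7, tau_af=1.0,
                       period=12.5, n_bins=256, seed=0):
    """Hypothetical FLiSimBA-like sketch: simulate photon arrival times (ns) and bin them.

    - Sensor photons: double exponential, fraction p1 with tau1, the rest with tau2.
    - Autofluorescence photons: single exponential with tau_af (placeholder value).
    - Afterpulse, dark counts and other background: uniform over the laser period.
    No instrument response function is convolved; all defaults are illustrative.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_sensor):
        tau = tau1 if rng.random() < p1 else tau2
        # Tails longer than one period wrap into the window, mimicking repeated pulses.
        times.append(rng.expovariate(1.0 / tau) % period)
    for _ in range(n_autof):
        times.append(rng.expovariate(1.0 / tau_af) % period)
    for _ in range(n_background):
        times.append(rng.uniform(0.0, period))

    bin_width = period / n_bins
    hist = [0] * n_bins
    for t in times:
        # Guard against a value landing exactly on the upper edge of the window.
        hist[min(int(t / bin_width), n_bins - 1)] += 1
    return hist
```

      Fixing the random seed makes the simulated histogram reproducible, which is convenient when comparing analysis methods on identical data.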

      Reviewer #2 (Public review):

      Summary:

      By using simulations of common signal artefacts introduced by acquisition hardware and the sample itself, the authors are able to demonstrate methods to estimate their influence on the estimated lifetime, and lifetime proportions, when using signal fitting for fluorescence lifetime imaging.

      Strengths:

      They consider a range of effects such as after-pulsing and background signal, and present a range of situations that are relevant to many experimental situations.

      Weaknesses:

      A weakness is that they do not present enough detail on the fitting method that they used to estimate lifetimes and proportions. The method used will influence the results significantly. They seem to use only the "empirical lifetime", which is not a state-of-the-art algorithm. The method used to deconvolve two multiplexed exponential signals is not given.

      We appreciate the comments and constructive feedback and will more clearly describe the fitting methods in our revision.

      Two metrics are used to estimate lifetime in our paper, both described in the Methods sections ‘Experimental data collection, parameter determination, and simulation’ and ‘FLIM analysis’: (1) fitted P1: lifetime histograms were fitted to Equation 2 with the Gauss-Newton nonlinear least-squares fitting algorithm, and the fitted P1 was used as the lifetime estimate; (2) empirical lifetime, defined by Equation 5. We used these two metrics for the following reasons: (1) when the exponential decay equation of a sensor is known (for example, the FRET-based PKA activity sensor FLIM-AKAR can be described by a double exponential equation), the fitted coefficients for each exponential component provide a robust lifetime estimate that is less sensitive to noise and background signals; (2) when the biophysical properties of a sensor are unknown, or when the sensor cannot easily be described by a single or double exponential equation, the empirical lifetime (i.e. the average lifetime) provides an unbiased way to quantify fluorescence lifetime without assuming an underlying model of sensor lifetime.

      To deconvolve two multiplexed exponential signals (Fig. 8), histograms were fitted to Equation 2 with the Gauss-Newton nonlinear least-square fitting algorithm, as described in Methods section ‘Simulation and analysis of multiplexed imaging with fluorescence intensity and lifetime data’.
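      To make the fitting step concrete, the following minimal sketch runs a Gauss-Newton least-squares fit for P1 when the two time constants are known. It assumes a simplified version of Equation 2, F(t) = A·(P1·exp(-t/τ1) + (1 - P1)·exp(-t/τ2)) + B, with an amplitude A and a constant offset B but no instrument response convolution; these assumptions, the initial guesses, and the function names are illustrative and not taken from the manuscript.

```python
import math


def _solve(m, v):
    """Solve the small linear system m @ x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fit_p1(times, counts, tau1, tau2, p1=0.5, n_iter=50):
    """Gauss-Newton fit of (A, P1, B) with tau1 and tau2 held fixed (assumed model)."""
    params = [max(counts), p1, min(counts)]  # crude initial guesses for A, P1, B
    for _ in range(n_iter):
        a, p1, b = params
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for t, y in zip(times, counts):
            e1, e2 = math.exp(-t / tau1), math.exp(-t / tau2)
            model = a * (p1 * e1 + (1 - p1) * e2) + b
            # Partial derivatives of the model with respect to A, P1 and B.
            jac = [p1 * e1 + (1 - p1) * e2, a * (e1 - e2), 1.0]
            r = y - model
            for i in range(3):
                jtr[i] += jac[i] * r
                for j in range(3):
                    jtj[i][j] += jac[i] * jac[j]
        step = _solve(jtj, jtr)  # normal equations: (J^T J) step = J^T r
        params = [p + s for p, s in zip(params, step)]
        if max(abs(s) for s in step) < 1e-12:
            break
    return params
```

      This version is unweighted for brevity; a Poisson-weighted fit (dividing each residual by the square root of the expected counts) is a common alternative for photon-counting histograms.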

      Considering the importance of these methodological details for evaluating the conclusions of this study, and the importance of appreciating the advantages and limitations of different methods of lifetime estimates (e.g. Figure 7), we will move the description of the fitting method to estimate P1 and the method of calculating empirical lifetime from Methods to Results, and will further clarify the rationale of using these different methods of lifetime estimates.

      Reviewer #3 (Public review):

      Summary:

      This study presents a useful computational tool, termed FLiSimBA. The MATLAB-based FLiSimBA simulations allow users to examine the effects of various noise factors (such as autofluorescence, afterpulse of the photomultiplier tube detector, and other background signals) and varying sensor expression levels. Under the conditions explored, the simulations unveiled how these factors affect the observed lifetime measurements, thereby providing useful guidelines for experimental designs. Further simulations with two distinct fluorophores uncovered conditions in which two different lifetime signals could be distinguished, indicating multiplexed dynamic imaging may be possible.

      Strengths:

      The simulations and their analyses were done systematically and rigorously. FLiSimBA can be useful for guiding and validating fluorescence lifetime imaging studies. The simulations could define useful parameters such as the minimum number of photons required to detect a specific lifetime, how sensor protein expression level may affect the lifetime data, the conditions under which the lifetime would be insensitive to the sensor expression levels, and whether certain multiplexing could be feasible.

      Weaknesses:

      The analyses have relied on a key premise that the fluorescence lifetime in the system can be described as a two-component discrete exponential decay. This means that the experimenter should ensure a priori that this is the right model for their fluorophores, and should keep in mind that the fluorescence lifetime of the fluorophores may not be perfectly described by a two-component discrete exponential (for which alternative algorithms have been implemented: e.g., Steinbach, P. J. Anal. Biochem. 427, 102-105, (2012)). In this regard, I also could not find how well the simulated and experimental data were fit by the given fitting equation (Equation 2, for example, for Figure 2C data).

      We thank the reviewer for the constructive feedback. We agree that the FLiSimBA users should ensure that the right decay equations are used to describe the fluorescent sensors. In this study, we used a FRET-based PKA sensor FLIM-AKAR to provide a proof-of-principle demonstration of FLiSimBA usage. The donor fluorophore of FLIM-AKAR, truncated monomeric enhanced GFP, follows a single exponential decay. FLIM-AKAR, a FRET-based sensor, follows a double exponential decay. The time constants of the two exponential components were determined previously (Chen, et al, Frontiers in pharmacology (2014)).  Thus, a double exponential decay equation with known τ1 and τ2 (Equation 1) was used for both simulation and fitting. In our revision, we will refer to our prior study characterizing the double exponential decay model of FLIM-AKAR. We will also emphasize the importance of using the right decay equations, strategies to estimate sensor decays, and how the flexibility of FLiSimBA allows users to input different forms of models to describe their specific sensor histograms. We will additionally provide data showing the goodness of fit for both simulated data and experimental data.
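      One common way to report goodness of fit for photon-counting histograms is the reduced chi-square, using the model prediction as the Poisson variance in each bin. The sketch below is offered only as an example of such a metric; it is an assumption that this (rather than, say, residual plots or R²) would be the statistic reported.

```python
def reduced_chi_square(observed, expected, n_params):
    """Pearson chi-square per degree of freedom, using the model as the Poisson variance.

    observed: measured counts per bin; expected: fitted model counts per bin;
    n_params: number of free parameters in the fit.
    """
    chi2 = 0.0
    n_used = 0
    for o, e in zip(observed, expected):
        if e > 0:  # bins with zero expected counts have no variance estimate; skip them
            chi2 += (o - e) ** 2 / e
            n_used += 1
    dof = n_used - n_params
    if dof <= 0:
        raise ValueError("need more usable bins than fitted parameters")
    return chi2 / dof
```

      Values near 1 indicate residuals consistent with counting noise alone; values well above 1 suggest a model mismatch (for example, a missing decay component).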

      Also, in Figure 2C, the 'sensor only' simulation without accounting for autofluorescence (as seen in Sensor + autoF) or afterpulse and background fluorescence (as seen in Final simulated data) seems to recapitulate the experimental data reasonably well. So, at least in this particular case where experimental data is limited by its broad spread with limited data points, being able to incorporate the additional noise factors into the simulation tool didn't seem to matter too much.

      We agree that in Figure 2C the contributions from autofluorescence, afterpulse, and background signals are small, because sensor photon count is high here. As seen in Figure 2B, when sensor photon counts are higher, the contributions from these other factors become less pronounced. The simulated data in Figure 2C were based on high photon counts because the simulated P1 value was determined by fitting experimental data. To achieve reasonable fitting with minimal interference from autofluorescence, afterpulse, and background signals, we used experimental data with high sensor expression. We will clarify these details in our revision.
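      Why high sensor photon counts suppress these contributions can be seen with a back-of-the-envelope calculation for the empirical (mean) lifetime: ignoring truncation and the instrument response, the mean arrival time of a mixture is the photon-weighted average of the component means, with a uniform background over a period T contributing a mean of T/2. The sketch below uses hypothetical photon counts and lifetimes; fitted P1 responds differently, so this illustrates only the averaging method.

```python
def mixed_mean_lifetime(components):
    """Photon-weighted mean lifetime of a mixture.

    components: iterable of (photon_count, mean_lifetime_ns) pairs. An exponential
    component contributes its time constant; a uniform background over a period T
    contributes T / 2. Truncation and the instrument response are ignored.
    """
    total = sum(n for n, _ in components)
    return sum(n * tau for n, tau in components) / total


# Hypothetical contaminants: 500 autofluorescence photons (1.0 ns) and
# 100 background photons spread uniformly over a 12.5 ns period (mean 6.25 ns).
contaminants = [(500, 1.0), (100, 6.25)]
for n_sensor in (1000, 100000):
    shift = mixed_mean_lifetime([(n_sensor, 1.5)] + contaminants) - 1.5
    print(n_sensor, round(shift, 4))
```

      With these made-up numbers, the apparent shift from a 1.5 ns sensor falls from about 0.14 ns at 1,000 sensor photons to about 0.002 ns at 100,000 sensor photons.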

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of revisions

      Title

      We have changed the title of the manuscript to “Chromatin endogenous cleavage provides a global view of yeast RNA polymerase II transcription kinetics”.

      Text

      Additional discussion of the patterns for elongation factors added (detailed below).

      Small text changes throughout, as mentioned in the detailed response below.

      Figures

      Updated the legend image in Figure 2F to reflect the correct colors

      Added Figure 2 – supplement 1F – RNAPII enrichment with shorter promoter dwell times

      Added Figure 2 - supplement 2 with ChIP-seq outcomes (and text legend)

      Removed gene numbers in Figure 5C and put them in the legend.

      Substituted Med1 and Med8 ChEC over Rap1 sites in Figure 5F.

      Moved kin28-is growth inhibition to Figure 5 – Supplement 1.

      Substituted a new panel overlaying the RNAPII enrichment over UASs or promoters for all three strains in Figure 7D.

      Improved the labeling and legend of Figure 7E

      Methods

      Added ChIP-seq experiments performed to confirm that the MNase fusion proteins are able to produce the expected pattern for ChIP.

      Point-by-point response to reviewers’ comments

      Reviewer 1:

      (1) Extending this work to elongation factors Ctk1 and Spt5 unexpectedly gives strong signals near the PIC location and little signal over the coding region. This, together with the mapping of CTD S2 and S5 phosphorylation by ChEC, suggests to me that, for some reason, ChEC isn't optimal for detecting components of the elongation complex over coding regions.

      (3) mapping the elongation factors Spt5 and Ctk1 by ChEC gives unexpected results as the signals over the coding sequences appear weak but unexpectedly strong at promoters and terminators. It would be helpful if the authors could comment on reasons why ChEC may not work well with elongation factors. For example, could this be something to do with the speed of Pol elongation and/or the chromatin structure of coding sequences such that coding sequence DNA is less accessible to MNase cleavage? 

      (7) The mintbodies are an interesting attempt to measure Pol II CTD modifications during elongation but give unexpected results, as the signals in the coding region are lower than at promoters and terminators. It seems like ChIP is still a much better option for elongation factors unless I'm missing something.

      We agree with the reviewer that this is a point that could confuse the reader.  Therefore, we have devoted two additional paragraphs to possible interpretations of our data in the Discussion:

      ChEC with factors involved in elongation (Ctk1, Spt5, Ser2p-RNAPII), when normalized to total RNAPII, showed greater enrichment over the CDS (Figure 3G), as expected. However, it is surprising that we also observed clear enrichment of these factors at promoters (e.g. Figure 3A, E & F). The association of elongation factors with the promoter seems to be biologically relevant. Changes in transcription correlate with changes in ChEC enrichment for these factors and modifications (Figure 4C). Blocking initiation by inhibiting TFIIH kinase led to a reduction of Ser5p RNAPII and Ser2p RNAPII over both the promoter and the transcribed region (Figure 5G). This suggests either that the true signal of these factors over transcribed regions is less evident by ChEC than by ChIP or that ChEC can reveal interactions of elongation factors at early stages of transcription that are missed by ChIP. The expectations for enrichment of elongation factors and phosphorylated CTD are largely based on ChIP data. Because ChIP fails to capture RNAPII enrichment at UASs and promoters, it is possible that ChIP also fails to capture promoter interactions of factors involved in elongation.

      Factors important for elongation can also function at the promoter. For example, Ctk1 is required for the dissociation of basal transcription factors from RNAPII at the promoter (Ahn et al., 2009). Transcriptional induction leads to increases in Ctk1 ChEC enrichment both over the promoter and over the 3’ end of the transcribed region (Figure 4C). Dynamics of Spt4/5 association with RNAPII from in vitro imaging (Rosen et al., 2020) indicate that the majority of Spt4/5 binding to RNAPII does not lead to elongation; Spt4/5 frequently dissociates from DNA-bound RNAPII. Association of Spt4/5 with RNAPII may represent a slow, inefficient step in the transition to productive elongation. If so, then ChEC-seq2 may capture transient Spt4/5 interactions that occur prior to productive elongation, producing enrichment of Spt5 at the promoter.

      (2) Finally, the role of nuclear pore binding by Gcn4 is explored, although the results do not seem convincing. (10) In Figure 7, it's not convincing to me that ChEC is revealing the reason for the transcriptional defect in the Gcn4 PD mutant. The plots in panel D look nearly the same and I don't follow the authors' description of the differences stated in the text. In panel A, replotting the data in some other way might make the transcriptional differences between WT and Gcn4 PD mutants more obvious.

      The phenotype of the gcn4-pd mutant is a quantitative decrease in transcription, and this leads to a quantitative decrease, rather than a qualitative loss, of RNA polymerase II over the promoter, without affecting the association of RNA polymerase II with the UAS region. This effect is small but statistically significant (p = 4e-5). We have changed the title of this section of the manuscript to “ChEC-seq2 suggests a role for the NPC in stabilizing promoter association of RNAPII”. Also, to make the comparison clearer, we have plotted the data together in the revised figure (Figure 7D).

      The magnitude of the decrease is not large, but we would highlight that it is almost as large as that produced by inhibiting the Kin28 kinase (Figure 5H). Because promoter-bound RNAPII is poorly captured by ChIP, this effect might be difficult to observe by techniques other than ChEC. Obviously, more mechanistic studies will need to be performed to fully understand this phenotype, but this result supports a role for the interaction with the nuclear pore complex in either enhancing the transfer of RNA polymerase II from the enhancer to the promoter or in preventing its dissociation from the promoter.

      I think that the related methods cut&run/cut&tag have been used to map elongating pol II. The authors should summarize what is known from this approach in the introduction and/or discussion. 

      CUT&RUN has been used to map RNAPII in mammals, but we are not aware of reports in S. cerevisiae. Work from the Henikoff Lab in yeast mapped transcription factors and histone modifications (PMIDs 28079019 and 31232687). A report using CUT&RUN in a human cell line described a promoter-5’ bias of RNAPII that appeared to depend on fragment length (PMID 33070289). Regardless, this report highlights a key distinction between yeast and other eukaryotes: paused RNAPII. Indeed, paused RNAPII dominates ChIP-seq tracks in metazoans, so we are hesitant to draw comparisons between CUT&RUN in other species and ChEC-seq2 in S. cerevisiae.

      Are the Rpb1, Rpb3, TFIIA, and TFIIE cleavage patterns expected based on the known structure of the PIC (Figures 2C, E)? 

      Rpb1 and 3 show peaks at approximately -17 and +34 with respect to TATA. TFIIA (Toa2) shows peaks at -12 and + 12.  And TFIIE (Tfa1) shows a peak around +34 (Figure 2C & E):

      As shown in the supplementary movie (based on the cMed-PIC structure; PDB #5OQM; Schilbach et al., 2017), upon binding to TBP/TFIID, TFIIA would be expected to cleave slightly upstream and downstream of the protected TATA (-12 and +12), while TFIIE binds downstream after the +12 site is protected and would be closest to the +34 unprotected site (to the right in the image below). RNAPII, which binds the fully assembled PIC, should be able to access either the upstream site (-12) or the downstream site (+34). Rpb1’s unstructured carboxy terminal domain, to which MNase is fused, would give it maximum flexibility, which likely explains why Rpb1 cleaves both at -12 and +34, with a preference for -12. Rpb3 also cleaves both sites, but without an obvious preference. 

      Author response image 1.

      Author response image 2.

      cleavage at -12, +12 and +34

      Author response image 3.

      Highlighted sites corresponding to the peaks in TFIIA assembled with TBP:

      Author response image 4.

      The complete PIC, protecting the +12 site, but leaving the +34 site exposed: 

      (6) Figure 2 S1: Pol II ChIP in the coding region gives a better correlation with transcription vs ChEC in promoters. Also, Pol II ChIP at terminators is almost as good as ChEC at promoters for estimating transcription. This latter point seems at odds with the text. The authors should comment on this and modify the text as needed. 

      Thank you for this comment.  We have clarified the text.

      In Figures 4 and 5, it's hard to tell how well changes in transcription correlate with changes in Pol II ChEC signals. It might be helpful to have a scatterplot or some other type of plot so that this relationship can be better evaluated. 

      While we find a corresponding increase/decrease in ChEC-seq2 signal in genes identified as up/downregulated by SLAM-seq, the magnitude of change is not well correlated between the two techniques. This was not surprising, because neither ChIP nor ChEC correlates especially well with SLAM-seq (Figure 2 – supplement 1E).

      In Figure 5, it's unclear why Pol association with Rap1 is being measured. Buratowski/Gelles showed that Pol associates with strong acidic activators - presumably through Mediator. Rap1 supposedly does not bind Mediator - so how is Pol associating here? Perhaps it would be better to measure Pol binding at STM genes that show Mediator-UAS binding. 

      Thank you; this is a good point. We chose Rap1 because we had generated high-confidence binding sites in our strains under these conditions by ChEC-seq2. The results suggest that RNAPII is recruited well to these sites and that this recruitment does not require TFIIB. However, in disagreement with the notion that Mediator does not interact with Rap1, ChEC with Mediator subunits Med1 and Med8 also shows peaks at these sites (new Figure 5F; the old Figure 5F is now Figure 5 – Supplement 1). Therefore, either these sites are co-occupied by other transcription factors that bind Mediator, or Mediator is recruited by Rap1. In either case, this correlates with binding of RNAPII.

      Reviewer 2:

      (1) The term "nascent transcription" is all too often used interchangeably for NET-seq, PRO-seq, 4sUseq, and other assays that often provide different types of information. The authors should make it clear their use of the term refers to SLAM-seq data. 

      We have clarified throughout the manuscript that nascent transcription was measured by SLAM-seq.

      The authors should explicitly state that experiments were performed in S. cerevisiae in the Results section. 

      We have made it clear in the title and the text that these experiments were performed in S. cerevisiae.

      Lines 216-218 state that "None of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq". I understand the authors' point, but there are parameter combinations that produce a flat profile with slightly less signal over the promoter (e.g., 5 sec dwell times and 3000 bp/ min elongation rate). If flanking windows were included, this profile would look something like ChIP-seq. I'd encourage the authors to be more precise with their language. 

      Thank you for highlighting this over-statement.

      We have now clarified the text and added another supplementary panel as follows:

      “While some combinations predicted a relatively flat distribution across the gene with lower levels in the promoter, none of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq. Only very short promoter dwell times (i.e., < 1s), produced the low promoter occupancy seen in ChIP-seq (Figure 2 – supplement 1F).”
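      The intuition behind the dwell-time argument can be sketched with a simple steady-state calculation (an illustration, not the model used in the manuscript): if every polymerase spends a fixed dwell time within a promoter window and then moves through the gene body at a constant rate, the occupancy per base pair in each region is proportional to the time spent per base pair there. The 200 bp promoter window below is a hypothetical value, and the calculation ignores how ChEC or ChIP convert occupancy into signal.

```python
def promoter_to_cds_density(dwell_s, elong_bp_per_min, promoter_bp=200):
    """Ratio of RNAPII density per bp in the promoter window vs. the transcribed region.

    Assumes a steady flux of polymerases, each spending dwell_s seconds within a
    promoter window of promoter_bp base pairs (hypothetical width) and then moving
    through the gene body at a constant elongation rate. Density in each region is
    proportional to residence time per bp, so the flux cancels out of the ratio.
    """
    promoter_s_per_bp = dwell_s / promoter_bp
    cds_s_per_bp = 60.0 / elong_bp_per_min
    return promoter_s_per_bp / cds_s_per_bp
```

      At 3,000 bp/min, this gives a ratio of 1.25 for a 5 s dwell time but 0.25 for a 1 s dwell time, consistent with the point that only short promoter dwell times make the promoter appear depleted relative to the transcribed region.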

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      The work introduces a valuable new method for depleting the ribosomal RNA from bacterial single-cell RNA sequencing libraries and shows that this method is applicable to studying the heterogeneity in microbial biofilms. The evidence for a small subpopulation of cells at the bottom of the biofilm which upregulates PdeI expression is solid. However, more investigation into the unresolved functional relationship between PdeI and c-di-GMP levels with the help of other genes co-expressed in the same cluster would have made the conclusions more significant. 

      Many thanks for eLife’s assessment of our manuscript and the constructive feedback. We are encouraged by the recognition of our bacterial single-cell RNA-seq methodology as valuable and its efficacy in studying bacterial population heterogeneity. We appreciate the suggestion for additional investigation into the functional relationship between PdeI and c-di-GMP levels. We concur that such an exploration could substantially enhance the impact of our conclusions. To address this, we have implemented the following revisions: We have expanded our data analysis to identify and characterize genes co-expressed with PdeI within the same cellular cluster (Fig. 3F, G, Response Fig. 10); We conducted additional experiments to validate the functional relationships between PdeI and c-di-GMP, followed by detailed phenotypic analyses (Response Fig. 9B). Our analysis reveals that while other marker genes in this cluster are co-expressed, they do not significantly impact biofilm formation or directly relate to c-di-GMP or PdeI. We believe these revisions have substantially enhanced the comprehensiveness and context of our manuscript, thereby reinforcing the significance of our discoveries related to microbial biofilms. The expanded investigation provides a more thorough understanding of the PdeI-associated subpopulation and its role in biofilm formation, addressing the concerns raised in the initial assessment.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Yan and colleagues introduce a modification to the previously published PETRI-seq bacterial single-cell protocol to include a ribosomal depletion step based on a DNA probe set that selectively hybridizes with ribosome-derived (rRNA) cDNA fragments. They show that their modification of the PETRI-seq protocol increases the fraction of informative non-rRNA reads from ~4-10% to 54-92%. The authors apply their protocol to investigating heterogeneity in a biofilm model of E. coli, and convincingly show how their technology can detect minority subpopulations within a complex community. 
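      The practical effect of the depletion on sequencing cost follows from simple arithmetic: the total number of reads needed to obtain a given number of informative (non-rRNA) reads is that target divided by the informative fraction. The sketch below uses the fractions quoted above; the target of one million informative reads is an arbitrary example.

```python
def reads_needed(target_informative, informative_percent):
    """Total reads required to obtain target_informative non-rRNA reads.

    informative_percent: integer percentage of reads that are non-rRNA.
    Integer ceiling division avoids floating-point rounding.
    """
    return -(-target_informative * 100 // informative_percent)


for pct in (4, 10, 54, 92):
    print(pct, reads_needed(1_000_000, pct))
```

      At 4% informative reads, one million non-rRNA reads require 25 million total reads; at 92%, about 1.09 million total reads suffice.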

      Strengths: 

      The method the authors propose is a straightforward and inexpensive modification of an established split-pool single-cell RNA-seq protocol that greatly increases its utility, and should be of interest to a wide community working in the field of bacterial single-cell RNA-seq. 

      Weaknesses: 

      The manuscript is written in a very compressed style and many technical details of the evaluations conducted are unclear and processed data has not been made available for evaluation, limiting the ability of the reader to independently judge the merits of the method. 

      Thank you for your thoughtful and constructive review of our manuscript. We appreciate your recognition of the strengths of our work and the potential impact of our modified PETRI-seq protocol on the field of bacterial single-cell RNA-seq. We are grateful for the opportunity to address your concerns and improve the clarity and accessibility of our manuscript.

      We acknowledge your feedback regarding the compressed writing style and lack of technical details, which are constrained by the requirements of the Short Report format in eLife. We have addressed these issues in our revised manuscript as follows:

      (1) Expanded methodology section: We have provided a more comprehensive description of our experimental procedures, including detailed protocols for the ribosomal depletion step (lines 435-453) and data analysis pipeline (lines 471-528). This will enable readers to better understand and potentially replicate our methods.

      (2) Clarification of technical evaluations: We have elaborated on the specifics of our evaluations, including the criteria used for assessing the efficiency of ribosomal depletion (lines 99-120), and the methods employed for identifying and characterizing subpopulations (lines 155-159, 161-163 and 163-167).

      (3) Data availability: We apologize for the oversight in not making our processed data readily available. We have deposited all relevant datasets, including raw and source data, in appropriate public repositories (GEO: GSE260458) and provide clear instructions for accessing this data in the revised manuscript.

      (4) Supplementary information: To maintain the concise nature of the main text while providing necessary details, we have included additional supplementary information. This will cover extended methodology (lines 311-318, 321-323, 327-340, 450-453, 533, and 578-589), detailed statistical analyses (lines 492-493, 499-501 and 509-528), and comprehensive data tables to support our findings.

      We believe these changes significantly improved the clarity and reproducibility of our work, allowing readers to better evaluate the merits of our method.

      Reviewer #2 (Public Review): 

      Summary: 

      This work introduces a new method of depleting the ribosomal reads from the single-cell RNA sequencing library prepared with one of the prokaryotic scRNA-seq techniques, PETRI-seq. The advance is very useful since it allows broader access to the technology by lowering the cost of sequencing. It also allows more transcript recovery with fewer sequencing reads. The authors demonstrate the utility and performance of the method for three different model species and find a subpopulation of cells in the E.coli biofilm that express a protein, PdeI, which causes elevated c-di-GMP levels. These cells were shown to be in a state that promotes persister formation in response to ampicillin treatment. 

      Strengths: 

      The introduced rRNA depletion method is highly efficient, with the depletion for E.coli resulting in over 90% of reads containing mRNA. The method is ready to use with existing PETRI-seq libraries which is a large advantage, given that no other rRNA depletion methods were published for split-pool bacterial scRNA-seq methods. Therefore, the value of the method for the field is high. There is also evidence that a small number of cells at the bottom of a static biofilm express PdeI which is causing the elevated c-di-GMP levels that are associated with persister formation. Given that PdeI is a phosphodiesterase, which is supposed to promote hydrolysis of c-di-GMP, this finding is unexpected. 

      Weaknesses: 

      With the descriptions and writing of the manuscript, it is hard to place the findings about the PdeI into existing context (i.e. it is well known that c-di-GMP is involved in biofilm development and is heterogeneously distributed in several species' biofilms; it is also known that E.coli diesterases regulate this second messenger, i.e. https://journals.asm.org/doi/full/10.1128/jb.00604-15). 

      There is also no explanation for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels. Perhaps the examination of the rest of the genes in cluster 2 of the biofilm sample could be useful to explain the observed association. 

      Thank you for your thoughtful and constructive review of our manuscript. We are pleased that the reviewer recognizes the value and efficiency of our rRNA depletion method for PETRI-seq, as well as its potential impact on the field. We would like to address the points raised by the reviewer and provide additional context and clarification regarding the function of PdeI in c-di-GMP regulation.

      We acknowledge that c-di-GMP’s role in biofilm development and its heterogeneous distribution in bacterial biofilms are well studied. We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI is predicted to function as a phosphodiesterase involved in c-di-GMP degradation, based on sequence analysis demonstrating the presence of an intact EAL domain, which is known for this function. However, it is important to note that PdeI also harbors a divergent GGDEF domain, typically associated with c-di-GMP synthesis. This dual-domain structure indicates that PdeI may play complex regulatory roles. Previous studies have shown that knocking out the major phosphodiesterase PdeH in E. coli results in the accumulation of c-di-GMP. Moreover, introducing a point mutation (G412S) in PdeI's divergent GGDEF domain within this PdeH knockout background led to decreased c-di-GMP levels [2]. This finding implies that the wild-type GGDEF domain in PdeI contributes to maintaining or increasing cellular c-di-GMP levels.

      Importantly, our single-cell experiments demonstrated a positive correlation between PdeI expression levels and c-di-GMP levels (Figure 4D). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. Notably, our observations of this strain revealed that c-di-GMP levels remained constant despite an increase in BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Figure 4D). This experimental evidence, coupled with domain analyses, suggests that PdeI may also contribute to c-di-GMP synthesis, rebutting the notion that it acts solely as a phosphodiesterase. HPLC LC-MS/MS analysis further confirmed that overexpression of PdeI, induced by arabinose, resulted in increased c-di-GMP levels (Fig. 4E). These findings strongly suggest that PdeI plays a pivotal role in upregulating c-di-GMP levels.

      Our further analysis indicated that PdeI contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results showing that PdeI is a membrane-associated protein, we hypothesize that PdeI acts as a sensor, integrating environmental signals with c-di-GMP production under complex regulatory mechanisms.

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. Upon careful analysis, we have determined that the other marker genes in this cluster do not significantly impact biofilm formation, nor have we identified any direct relationship between these genes, c-di-GMP, or PdeI. Our focus on PdeI within this cluster is justified by its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While other genes in this cluster may be co-expressed, their functions appear unrelated to the PdeI-c-di-GMP pathway we are investigating. Therefore, we opted not to elaborate on these genes in our main discussion, as they do not contribute directly to our understanding of the PdeI-c-di-GMP association. However, we can include a brief mention of these genes in the manuscript, indicating their lack of relevance to the PdeI-c-di-GMP pathway. This addition will provide a more comprehensive view of the cluster's composition while maintaining our focus on the key findings related to PdeI and c-di-GMP.

      We have also included the aforementioned explanations and supporting experimental data within the manuscript to clarify this important point (lines 193-217). Thank you for highlighting this apparent contradiction, allowing us to provide a more detailed explanation of our findings.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I found the main text of the manuscript well written and easy to understand, though too compressed in parts to fully understand the details of the work presented, some examples are outlined below. The materials and methods appeared to be less carefully compiled and could use some careful proof-reading for spelling (e.g. repeated use of "minuts" for minutes, "datas" for data) and grammar and sentence fragments (e.g. "For exponential period E. coli data." Line 333). In general, the meaning is still clear enough to be understood. I also was unable to find figure captions for the supplementary figures, making these difficult to understand. 

      We appreciate your careful review, which has helped us improve the clarity and quality of our manuscript. We acknowledge that some parts of the main text may have been overly compressed due to the Short Report format in eLife. We have thoroughly reviewed the manuscript and expanded key areas to provide more comprehensive explanations. We have also carefully revised the Materials and Methods section: we corrected all spelling errors, including "minuts" to "minutes" and "datas" to "data", and fixed grammatical issues and sentence fragments throughout. We sincerely apologize for omitting the captions for the supplementary figures; we have now added detailed captions for all supplementary figures so that they are easily understandable. We believe these revisions address your concerns and improve the overall readability of our work.

      General comments: 

      (1) To evaluate the performance of RiboD-PETRI, it would be helpful to have more details in general, particularly to do with the development of the sequencing protocol and the statistics shown. Some examples: How many reads were sequenced in each experiment? Of these, how many are mapped to the bacterial genome? How many reads were recovered per cell? Have the authors performed some kind of subsampling analysis to determine if their sequencing has saturated the detection of expressed genes? The authors show e.g. correlations between classic PETRI-seq and RiboD-PETRI for E. coli in Figure 1, but also have similar data for C. crescentus and S. aureus - do these data behave similarly? These are just a few examples, but I'm sure the authors have asked themselves many similar questions while developing this project; more details, hard numbers, and comparisons would be very much appreciated. 

      Thank you for your valuable feedback. To address your concerns, we have added a table in the supplementary material that clarifies the details of sequencing.

      The correlation between PETRI-seq and RiboD-PETRI data in C. crescentus is relatively high. However, the correlation between PETRI-seq and RiboD-PETRI data in S. aureus (SA) is lower. The reason is that the RiboD-PETRI and PETRI-seq libraries were sequenced to different depths, so gene expression (UMI counts) is much higher in the RiboD-PETRI results than in the PETRI-seq results, and the calculated correlation coefficient is only about 0.47. This indicates a positive, but not particularly strong, correlation between the two datasets. Nevertheless, because this coefficient was calculated across a total of 2,763 genes, it still shows a degree of consistency between the two samples.

      Author response image 1.

      Assessment of the effect of rRNA depletion on transcriptional profiles of (A) C. crescentus (CC) and (B) S. aureus (SA). The Pearson correlation coefficient (r) of UMI counts per gene (log2 UMIs) between RiboD-PETRI and PETRI-seq was calculated for 4097 genes (A) and 2763 genes (B). The "ΔΔ" label represents the RiboD-PETRI protocol; the "Ctrl" label represents the classic PETRI-seq protocol we performed. Each point represents a gene.
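      The correlation in this caption can be sketched as follows. This is a minimal sketch, assuming per-gene UMI totals are summed over all cells of each library and log2-transformed with a pseudocount of 1 to avoid log2(0); the pseudocount, the helper names, and the toy counts are assumptions for illustration, not values from our analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def log2_umis(counts, pseudocount=1):
    """log2-transform per-gene UMI totals; the pseudocount (assumed) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]

# Hypothetical per-gene UMI totals, same gene order in both libraries.
ribod = [1200, 340, 15, 0, 880, 64]   # RiboD-PETRI ("ΔΔ")
petri = [150, 60, 2, 1, 90, 10]       # classic PETRI-seq ("Ctrl")

r = pearson_r(log2_umis(ribod), log2_umis(petri))
print(f"r = {r:.3f}")
```

      On the log2 scale, a uniform depth difference between libraries becomes a near-constant shift, to which Pearson r is largely insensitive; unequal depth mainly lowers r through sampling noise in low-count genes.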

      (2) Additionally, I think it is critical that the authors provide processed read counts per cell and gene in their supplementary information to allow others to investigate the performance of their method without going back to raw FASTQ files, as this can represent a significant hurdle for reanalysis. 

      Thank you for your suggestion. However, it is important to clarify that reads and UMIs (Unique Molecular Identifiers) are distinct concepts in single-cell RNA sequencing. Read counts are influenced by PCR amplification during library construction, which makes them less stable. In contrast, UMIs allow PCR duplicates to be collapsed and therefore serve as a more reliable indicator of the number of mRNA molecules originally captured. Throughout our study, we primarily used UMI counts for quantification. To address your concern about data accessibility, we have included the UMI counts per cell and gene in our supplementary materials (Tables S7-S15; some files are too large for the supplementary materials and are therefore deposited in GEO under accession GSE260458). This provides an accurate representation of gene expression levels and allows robust reanalysis without processing the raw FASTQ files.
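      The read-versus-UMI distinction can be illustrated with a simplified sketch. It assumes each aligned read has already been reduced to a (cell barcode, UMI, gene) tuple and collapses only exact UMI matches; production pipelines often also merge UMIs that differ by a single sequencing error. The function name and the toy reads are hypothetical.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse aligned reads into UMI counts per (cell barcode, gene).

    Each read is a (cell_barcode, umi, gene) tuple. PCR duplicates share all
    three fields, so counting distinct UMIs per cell and gene removes them.
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}

# Hypothetical reads: the first three are PCR duplicates of one molecule.
reads = [
    ("cellA", "AACGT", "pdeI"),
    ("cellA", "AACGT", "pdeI"),
    ("cellA", "AACGT", "pdeI"),
    ("cellA", "GGTCA", "pdeI"),
    ("cellB", "AACGT", "pdeI"),
]
counts = count_umis(reads)
print(counts)
```

      Five reads collapse to two molecules for cellA and one for cellB, which is why UMI counts, rather than read counts, are reported per cell and gene.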

      (3) Finally, the authors should also discuss other approaches to ribosomal depletion in bacterial scRNA-seq. One of the figures appears to contain such a comparison, but it is never mentioned in the text that I can find, and one could read this manuscript and come away believing this is the first attempt to deplete rRNA from bacterial scRNA-seq. 

      We have addressed this concern by including a comparison of different methods for depleting rRNA from bacterial scRNA-seq in Table S4, together with a short text comparison as follows: “Additionally, we compared our findings with other reported methods (Fig. 1B; Table S4). The original PETRI-seq protocol, which does not include an rRNA depletion step, exhibited an mRNA detection rate of approximately 5%. The MicroSPLiT-seq method, which utilizes Poly A Polymerase for mRNA enrichment, achieved a detection rate of 7%. Similarly, M3-seq and BacDrop-seq, which employ RNase H to digest rRNA post-DNA probe hybridization in cells, reported mRNA detection rates of 65% and 61%, respectively. MATQ-DASH, which utilizes Cas9-mediated targeted rRNA depletion, yielded a detection rate of 30%. Among these, RiboD-PETRI demonstrated superior performance in mRNA detection while requiring the least sequencing depth.” We have added this content to the main text (lines 110-120), specifically in relation to Figure 1B and Table S4. This addition provides context for our method and clarifies its position among existing techniques.

      Detailed comments: 

      Line 78: the authors describe the multiplet frequency, but it is not clear to me how this was determined, for which experiments, or where in the SI I should look to see this. Often this is done by mixing cultures of two distinct bacteria, but I see no evidence of this key experiment in the manuscript. 

      The multiplet frequency discussed in the manuscript was not determined experimentally by mixing distinct bacterial cultures. The PETRI-seq and microSPLiT studies both performed experiments mixing two libraries to determine the single-cell rate, and both obtained good results. Our technique is derived from these two methods (mainly PETRI-seq); the main difference lies in the subsequent RiboD step, so we did not repeat this experiment. The multiplet frequencies reported here are therefore theoretical predictions based on our sequencing results, calculated using a Poisson distribution. We have made this distinction clearer in our manuscript (lines 93-97). The method is described in the Materials and Methods section (lines 520-528), and the data are available in Table S2. To elaborate:

      To assess the efficiency of single-cell capture in RiboD-PETRI, we calculated the multiplet frequency using a Poisson distribution based on our sequencing results.

      (1) Definition: In our study, multiplet frequency is defined as the probability of a non-empty barcode corresponding to more than one cell.

      (2) Calculation Method: We use a Poisson distribution-based approach to calculate the predicted multiplet frequency. The process involves several steps:

      We first calculate the proportion of barcodes corresponding to zero cells: P(0) = e^(-λ). Then, we calculate the proportion corresponding to exactly one cell: P(1) = λ·e^(-λ). We derive the proportion for one or more cells: P(≥1) = 1 - P(0), and for two or more cells: P(≥2) = 1 - P(1) - P(0). Finally, the multiplet frequency is calculated as the fraction of non-empty barcodes that contain more than one cell: multiplet frequency = P(≥2) / P(≥1).

      (3) Parameter λ: This is the ratio of the number of cells to the total number of possible barcode combinations. For instance, when detecting 10,000 cells, λ = 10,000 / (total number of possible barcode combinations).
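      The Poisson calculation described in steps (1)-(3) can be sketched in a few lines. This is a minimal sketch; the barcode space of 96^3 combinations (three rounds of 96 barcodes) is a hypothetical value chosen only for illustration and is not taken from our protocol.

```python
import math

def multiplet_frequency(n_cells: int, n_barcodes: int) -> float:
    """Predicted fraction of non-empty barcodes that hold two or more cells,
    assuming cells are distributed over barcodes as a Poisson process."""
    lam = n_cells / n_barcodes        # λ: mean number of cells per barcode
    p0 = math.exp(-lam)               # P(0): barcode with no cell
    p1 = lam * math.exp(-lam)         # P(1): barcode with exactly one cell
    p_ge1 = 1.0 - p0                  # P(≥1): non-empty barcode
    p_ge2 = 1.0 - p1 - p0             # P(≥2): barcode holding a multiplet
    return p_ge2 / p_ge1

# Hypothetical barcode space: three rounds of 96 barcodes (assumption).
N_BARCODES = 96 ** 3
freq = multiplet_frequency(10_000, N_BARCODES)
print(f"lambda = {10_000 / N_BARCODES:.4f}, multiplet frequency = {freq:.3%}")
```

      For small λ the ratio is close to λ/2, so doubling the number of cells at a fixed barcode space roughly doubles the predicted multiplet frequency.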

      Line 94: the concept of "percentage of gene expression" is never clearly defined. Does this mean the authors detect 99.86% of genes expressed in some cells? How is "expressed" defined - is this just detecting a single UMI? 

      The term "percentage gene expression" refers to the proportion of genes in the bacterial strain that were detected as expressed in the sequenced cell population. Specifically, in this context, it means that 99.86% of all genes in the bacterial strain were detected as expressed in at least one cell in our sequencing results. To define "expressed" more clearly: a gene is considered expressed if at least one UMI (Unique Molecular Identifier) detected in a cell in the population. This definition allows for the detection of even low-level gene expression. To enhance clarity in the manuscript, we have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.
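      The coverage definition above (a gene counts as expressed if at least one UMI is detected in at least one cell) can be sketched as follows. The nested-dictionary layout of the UMI matrix, the function name, and the toy data are assumptions for illustration.

```python
def gene_coverage(umi_matrix, n_genes_in_genome):
    """Fraction of annotated genes detected (>= 1 UMI) in at least one cell.

    umi_matrix maps cell barcode -> {gene: UMI count}.
    """
    detected = {
        gene
        for counts in umi_matrix.values()
        for gene, n in counts.items()
        if n >= 1
    }
    return len(detected) / n_genes_in_genome

# Hypothetical toy data: 3 cells, 5 annotated genes in the genome.
umi_matrix = {
    "cell1": {"geneA": 2, "geneB": 0},
    "cell2": {"geneB": 1},
    "cell3": {"geneC": 4},
}
coverage = gene_coverage(umi_matrix, n_genes_in_genome=5)
print(f"transcriptome-wide gene coverage: {coverage:.0%}")
```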

      Line 98: The authors discuss the number of recovered UMIs throughout this paragraph, but there is no clear discussion of the number of detected expressed genes per cell. Could the authors include a discussion of this as well, as this is another important measure of sensitivity? 

      We appreciate your suggestion to include a discussion on the number of detected expressed genes per cell, as this is indeed another important measure of sensitivity. We would like to clarify that we have actually included statistics on the number of genes detected across all cells in the main text of our paper. This information is presented as percentages. However, we understand that you may be looking for a more detailed representation, similar to the UMI statistics we provided. To address this, we have now added a new analysis showing the number of genes detected per cell (lines 132-133, 138-139, 144-145 and 184-186, Fig. 2B, 3B and S2B). This additional result complements our existing UMI data and provides a more comprehensive view of the sensitivity of our method. We have included this new gene-per-cell statistical graph in the supplementary materials.
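      The per-cell statistic complements the population-level coverage: for each cell, it counts the distinct genes with at least one UMI. A minimal sketch, assuming the same hypothetical nested-dictionary UMI matrix (cell barcode to {gene: UMI count}) and toy data chosen for illustration:

```python
from statistics import median

def genes_per_cell(umi_matrix):
    """Number of distinct genes with >= 1 UMI in each cell."""
    return {
        cell: sum(1 for n in counts.values() if n >= 1)
        for cell, counts in umi_matrix.items()
    }

# Hypothetical toy data.
umi_matrix = {
    "cell1": {"geneA": 2, "geneB": 0, "geneD": 7},
    "cell2": {"geneB": 1},
    "cell3": {"geneA": 1, "geneC": 4, "geneE": 3},
}
per_cell = genes_per_cell(umi_matrix)
print(per_cell, "median genes per cell:", median(per_cell.values()))
```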

      Figure 1B: I presume ctrl and delta delta represent the classic PETRI-seq and RiboD protocols, respectively, but this is not specified. This should be clarified in the figure caption, or the names changed. 

      We appreciate you bringing this to our attention. We acknowledge that the labeling in the figure could have been clearer. We have now clarified this information in the figure caption. To provide more specificity: The "ΔΔ" label represents the RiboD-PETRI protocol; The "Ctrl" label represents the classic PETRI-seq protocol we performed. We have updated the figure caption to include these details, which should help readers better understand the protocols being compared in the figure.

      Line 104: the authors claim "This performance surpassed other reported bacterial scRNA-seq methods" with a long number of references to other methods. "Performance" is not clearly defined, and it is unclear what the exact claim being made is. The authors should clarify what they're claiming, and further discuss the other methods and comparisons they have made with them in a thorough and fair fashion. 

      We appreciate your request for clarification, and we acknowledge that our definition of "performance" should have been more explicit. We would like to clarify that in this context, we define performance primarily in terms of the proportion of mRNA captured. Our improved method demonstrates a significantly higher rate of rRNA removal compared to other bacterial single-cell library construction methods. This results in a higher proportion of mRNA in our sequencing data, which we consider a key performance metric for single-cell RNA sequencing in bacteria. Additionally, when compared to our previous method, PETRI-seq, our improved approach not only enhances rRNA removal but also reduces library construction costs. This dual improvement in both data quality and cost-effectiveness is what we intended to convey with our performance claim.

      We recognize that a more thorough and fair discussion of other methods would be beneficial. We have summarized the comparison in Table S4 and added a short discussion to the main text (lines 106-120). This addition provides context for our method and clarifies its position among existing techniques.
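      The performance metric discussed here, the proportion of mRNA among captured transcripts, can be sketched as below. This is a minimal sketch, assuming UMIs have already been assigned to annotated genes and each gene is labeled by RNA type; the function name and toy counts are hypothetical, and whether the reported rates were computed on reads or UMIs is not specified in this sketch.

```python
def mrna_fraction(umis_by_gene, gene_type):
    """Share of UMIs assigned to mRNA (protein-coding) genes among all UMIs."""
    total = sum(umis_by_gene.values())
    if total == 0:
        return 0.0
    mrna = sum(
        n for gene, n in umis_by_gene.items() if gene_type.get(gene) == "mRNA"
    )
    return mrna / total

# Hypothetical toy counts for one library.
umis_by_gene = {"rrsA": 50, "rrlA": 30, "pdeI": 5, "ompA": 15}
gene_type = {"rrsA": "rRNA", "rrlA": "rRNA", "pdeI": "mRNA", "ompA": "mRNA"}
frac = mrna_fraction(umis_by_gene, gene_type)
print(f"mRNA fraction: {frac:.0%}")
```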

      Figure 1D: Do the authors have any explanation for the relatively lower performance of their C. crescentus depletion? 

      We appreciate your attention to detail and the opportunity to address this point. The lower efficiency of rRNA removal in C. crescentus compared to other species can be attributed to inherent differences between species. It's important to note that a single method for rRNA depletion may not be universally effective across all bacterial species due to variations in their genetic makeup and rRNA structures. Different bacterial species can have unique rRNA sequences, secondary structures, or associated proteins that may affect the efficiency of our depletion method. This species-specific variation highlights the challenges in developing a one-size-fits-all approach for bacterial rRNA depletion. While our method has shown high efficiency across several species, the results with C. crescentus underscore the need for continued refinement and possibly species-specific optimizations in rRNA depletion techniques. We thank you for bringing attention to this point, as it provides valuable insight into the complexities of bacterial rRNA depletion and areas for future improvement in our method.

      Line 118: The authors claim RiboD-PETRI has a "consistent ability to unveil within-population heterogeneity", however the preceding paragraph shows it detects potential heterogeneity, but provides no evidence this inferred heterogeneity reflects the reality of gene expression in individual cells. 

      We appreciate your careful reading and the opportunity to clarify this point. We acknowledge that our wording may have been too assertive given the evidence presented. We acknowledge that the subpopulations of cells identified in other species have not undergone experimental verification. Our intention in presenting these results was to demonstrate RiboD-PETRI's capability to detect “potential” heterogeneity consistently across different bacterial species, showcasing the method's sensitivity and potential utility in exploring within-population diversity. However, we agree that without further experimental validation, we cannot definitively claim that these detected differences represent true biological heterogeneity in all cases. We have revised this section to reflect the current state of our findings more accurately, emphasizing that while RiboD-PETRI consistently detects potential heterogeneity across species, further experimental validation would be required to confirm the biological significance of the observations (lines 169-171).

      Figure 1 H&I: I'm not entirely sure what I am meant to see in these figures, presumably some evidence for heterogeneity in gene expression. Are there better visualizations that could be used to communicate this? 

      We appreciate your suggestion for improving the visualization of gene expression heterogeneity and have explored alternative visualization methods in the revised manuscript. Specifically, for the expression levels of the marker genes shown in Figure 1H (now Figure 2D), we have created violin plots (Supplementary Fig. 4). These plots offer a more comprehensive view of the distribution of expression levels across cell populations, making heterogeneity easier to discern. However, given the number of marker genes, these violin plots are extensive and would occupy a significant amount of space. Given the space constraints of the main figure, we propose to include them as Fig. S4, immediately following Figure 1 H&I (now Figure 2D&E). This arrangement gives readers access to more detailed information about these marker genes while keeping the main figure concise.

      Regarding the pathway enrichment figure (Figure 2E), we have also considered your suggestion for improvement. We attempted to use a dot plot to display the KEGG pathway enrichment of the genes. However, our analysis revealed that the genes were only enriched in a single pathway. As a result, the visual representation using a dot plot still did not produce a particularly aesthetically pleasing or informative figure.

      Line 124: The authors state no significant batch effect was observed, but in the methods on line 344 they specify batch effects were removed using Harmony. It's unclear what exactly S2 is showing without a figure caption, but the authors should clarify this discrepancy. 

      We apologize for any confusion caused by the lack of a clear figure caption for Figure S2 (now Figure S3D). To address your concern, in addition to adding captions for the supplementary figures, we would like to provide more context about the batch effect analysis. In Supplementary Fig. S3, Panel C shows the results without Harmony batch effect removal, while Panel D shows the results after applying Harmony. In both panels, the distributions of samples one and two do not differ substantially. Based on this observation, we concluded that there was no significant batch effect between the two samples. However, we acknowledge that even subtle batch effects could influence downstream analyses. Therefore, as a precaution and to ensure the quality of our results, we applied Harmony to remove any potential minor batch effects. This approach aligns with best practices in single-cell analysis, where even small technical variations are often corrected to improve the robustness of the results.

      To improve clarity, we have revised our manuscript to better explain this nuanced approach: 1. We have updated the statement to reflect that while no major batch effect was observed, we applied batch correction as a precautionary measure (lines 181-182). 2. We have added a detailed caption to Figure S3, explaining the comparison between non-corrected and batch-corrected data. 3. We have modified the methods section to clarify that Harmony was applied as a precautionary step, despite the absence of obvious batch effects (lines 492-493).

      Figure 2D: I found this panel fairly uninformative, is there a better way to communicate this finding? 

      Thank you for your feedback regarding Figure 2D. We have explored alternative ways to present this information, using a dot plot to display the enrichment pathways, as this is often an effective method for visualizing such data. Meanwhile, we also provided a more detailed textual description of the enrichment results in the main text, highlighting the most significant findings.

      Figure 2I: the figure itself and caption say GFP, but in the text and elsewhere the authors say this is a BFP fusion. 

      We appreciate your careful review of our manuscript and figures, and we apologize for any confusion this may have caused. To clarify: both GFP (green fluorescent protein) and BFP (blue fluorescent protein) were used in our experiments, but for different purposes: 1. GFP was used for imaging to observe the location of PdeI in bacteria and persister cell growth, as shown in Figure 4C and 4K. 2. BFP was used for cell sorting, imaging of localization in biofilms, and detecting the proportion of persister cells, as shown in Figure 4D and 4F-J. To address this inconsistency and improve clarity, we have made the following corrections: 1. We have reviewed the main text to ensure that references to GFP and BFP are accurate and consistent with their respective uses in our experiments. 2. We have added a note to the caption of Figure 4C explicitly stating that this image shows GFP fluorescence for PdeI localization. 3. In the Methods section, we have explained how both fluorescent proteins were used in different aspects of our study (lines 326-340).

      Line 156: The authors compare prices between RiboD and PETRI-seq. It would be helpful to provide a full cost breakdown, e.g. in supplementary information, as it is unclear exactly how the authors came to these numbers or where the major savings are (presumably in sequencing depth?) 

      We appreciate your suggestion to provide a more detailed cost breakdown, and we agree that this would enhance the transparency and reproducibility of our cost analysis. In response to your feedback, we have prepared a comprehensive cost breakdown that includes all materials and reagents used in the library preparation process. Additionally, we've factored in the sequencing depth (50G) and the unit price for sequencing (25¥/G). These calculations allow us to determine the cost per cell after sequencing. As you correctly surmised, a significant portion of the cost reduction is indeed related to sequencing depth. However, there are also savings in the library preparation steps that contribute to the overall cost-effectiveness of our method. We propose to include this detailed cost breakdown as a supplementary table (Table S6) in our paper. This table will provide a clear, itemized list of all expenses involved, including: 1. Reagents and materials for library preparation 2. Sequencing costs (depth and price per G) 3. Calculated cost per cell.
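      The per-cell cost combines library reagents and sequencing. A minimal sketch of the arithmetic, in which only the 50 G depth and the 25 ¥/G price come from the text above; the reagent total and the number of cells are hypothetical inputs (the itemized values belong in Table S6), so the example below isolates the sequencing contribution.

```python
def sequencing_cost(depth_gb: float, price_per_gb: float) -> float:
    """Total sequencing cost: depth (G) times unit price (¥/G)."""
    return depth_gb * price_per_gb

def cost_per_cell(reagent_cost: float, depth_gb: float,
                  price_per_gb: float, n_cells: int) -> float:
    """Library reagents plus sequencing, divided by the number of cells."""
    return (reagent_cost + sequencing_cost(depth_gb, price_per_gb)) / n_cells

seq = sequencing_cost(50, 25)          # 50 G at 25 ¥/G, as stated above
# Hypothetical: reagent cost set to 0 to isolate sequencing; 10,000 cells assumed.
per_cell_seq_only = cost_per_cell(0, 50, 25, 10_000)
print(f"sequencing: {seq} ¥; sequencing per cell: {per_cell_seq_only} ¥")
```

      Because depth enters linearly, reducing the depth needed per cell lowers the sequencing share of the per-cell cost proportionally, consistent with the observation that much of the saving comes from sequencing depth.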

      Line 291: The design and production of the depletion probes are not clearly explained. How did the authors design them? How were they synthesized? Also, it appears the authors have separate probe sets for E. coli, C. crescentus, and S. aureus - this should be clarified, possibly in the main text.

      Thank you for your important questions regarding the design and production of our depletion probes. We included detailed probe information in Supplementary Table S1; however, due to the constraints of the Short Report format in eLife, we did not describe it in the main text. We appreciate the opportunity to provide clarification.

      The core principle behind our probe design is that the probe sequences are reverse complementary to the r-cDNA sequences, which allows specific recognition of r-cDNA. The probes are then bound to magnetic beads, so that the r-cDNA-probe-bead complexes can be separated from the rest of the library. To address your specific questions: 1. Probe Design: We designed separate probe sets for E. coli, C. crescentus, and S. aureus, each reverse complementary to the r-cDNA sequences of its respective species. This species-specific approach ensures high efficiency and specificity of rRNA depletion for each organism. The hybridized DNA complexes were then removed with streptavidin magnetic beads. 2. Probe Synthesis: The probes were synthesized according to these design principles. 3. Species-Specific Probe Sets: You are correct that we used separate probe sets for each bacterial species; we have clarified this point in the main text so that readers understand the specificity of our approach. To further illustrate this process, we have added a schematic of the rRNA removal principle to Fig. 1A and described the design principle in its figure legend.
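      The design principle (probes reverse complementary to r-cDNA) can be sketched as follows. This is a minimal sketch, assuming non-overlapping tiling with a hypothetical probe length of 40 nt; the actual probe sequences are listed in Supplementary Table S1, and practical designs may use other lengths, overlaps, or additional filters.

```python
# Map each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def tile_probes(rcdna: str, probe_len: int = 40, step: int = 40) -> list:
    """Probes reverse complementary to consecutive windows of an r-cDNA sequence.

    A trailing window shorter than probe_len is skipped.
    """
    return [
        reverse_complement(rcdna[i:i + probe_len])
        for i in range(0, len(rcdna) - probe_len + 1, step)
    ]

# Toy example with a short hypothetical r-cDNA fragment and 4-nt probes.
probes = tile_probes("AAAACCCCGG", probe_len=4, step=4)
print(probes)
```

      Each probe is the reverse complement of its r-cDNA window, so it hybridizes to that stretch of r-cDNA; the last two bases are not covered because they are shorter than one probe.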

      Line 362: I didn't see a description of the construction of the PdeI-BFP strain, I assume this would be important for anyone interested in the specific work on PdeI. 

      Thank you for your astute observation regarding the construction of the PdeI-BFP strain. We appreciate the opportunity to provide this important information. The PdeI-BFP strain was constructed as follows: 1. We cloned the pdeI gene along with its native promoter region (250bp) into a pBAD vector. 2. The original promoter region of the pBAD vector was removed to avoid any potential interference. 3. This construction enables the expression of the PdeI-BFP fusion protein to be regulated by the native promoter of pdeI, thus maintaining its physiological control mechanisms. 4. The BFP coding sequence was fused to the pdeI gene to create the PdeI-BFP fusion construct. We have added a detailed description of the PdeI-BFP strain construction to our methods section (lines 327-334).

      Reviewer #2 (Recommendations For The Authors): 

      (1) General remarks: 

      Reconsider using 'advanced' in the title. It is highly generic and misleading. Perhaps 'cost-efficient' would be a more precise substitute. 

      Thank you for your valuable suggestion. After careful consideration, we have decided to use "improved" in the title. First, our method provides an efficient solution to a persistent challenge in bacterial single-cell RNA sequencing, namely the abundance of rRNA. Second, it enables precise exploration of bacterial population heterogeneity. Because our method offers more than cost-effectiveness, we believe "improved" is a more accurate term than "cost-efficient".

      Consider expanding the introduction. The introduction does not explain the setup of the biological question or basic details such as the organism(s) for which the technique has been developed, or which species biofilms were studied. 

      Thank you for your valuable feedback regarding our introduction. We acknowledge that our writing was compressed due to the constraints of the Short Report format in eLife. We appreciate the opportunity to expand this crucial section, which improves the clarity and impact of our manuscript.

      We revised our introduction (lines 53-80) according to the following principles:

      (1) Initial Biological Question: We explained the initial biological question that motivated our research—understanding the heterogeneity in E. coli biofilms—to provide essential context for our technological development.

      (2) Limitations of Existing Techniques: We briefly described the limitations of current single-cell sequencing techniques for bacteria, particularly regarding their application in biofilm studies.

      (3) Introduction of Improved Technique: We introduced our improved technique, initially developed for E. coli.

      (4) Research Evolution: We highlighted how our research has evolved, demonstrating that our technique is applicable not only to E. coli but also to Gram-positive bacteria and other Gram-negative species, showcasing the broad applicability of our method.

      (5) Specific Organisms Studied: We provided examples of the specific organisms we studied, encompassing both Gram-positive and Gram-negative bacteria.

      (6) Potential Implications: Finally, we outlined the potential implications of our technique for studying bacterial heterogeneity across various species and contexts, extending beyond biofilms.

      (2) Writing remarks: 

      43-45 Reword: "Thus, we address a persistent challenge in bacterial single-cell RNA-seq regarding rRNA abundance, exemplifying the utility of this method in exploring biofilm heterogeneity.". 

      Thank you for highlighting this sentence and requesting a rewording. We appreciate the opportunity to improve the clarity and impact of our statement. We have reworded the sentence as: "Our method effectively tackles a long-standing issue in bacterial single-cell RNA-seq: the overwhelming abundance of rRNA. This advancement significantly enhances our ability to investigate the intricate heterogeneity within biofilms at unprecedented resolution." (lines 47-50)

      49 "Biofilms, comprising approximately 80% of chronic and recurrent microbial infections in the human body..." - probably meant 'contribute to'. 

      Thank you for catching this imprecision in our statement. We have reworded the sentence as: "Biofilms contribute to approximately 80% of chronic and recurrent microbial infections in the human body..."

      54-55 Please expand on "this". 

      Thank you for your request to expand on the use of "this" in the sentence. You're right that more clarity would be beneficial here. We have revised and expanded this section in lines 54-69.

      81-84 Unclear why these species samples were either at exponential or stationary phases. The growth stage can influence the proportion of rRNA and other transcripts in the population. 

      Thank you for raising this important point about the growth phases of the bacterial samples used in our study. We appreciate the opportunity to clarify our experimental design. To evaluate the performance of RiboD-PETRI, we designed a comprehensive assessment of rRNA depletion efficiency under diverse physiological conditions, specifically contrasting exponential and stationary phases. This approach allows us to understand how these different growth states impact rRNA depletion efficacy. Additionally, we included a variety of bacterial species, encompassing both gram-negative and gram-positive organisms, to ensure that our findings are broadly applicable across different types of bacteria. By incorporating these variables, we aim to provide insights into the robustness and reliability of the RiboD-PETRI method in various biological contexts. We have included this rationale in our result section (lines 99-106), providing readers with a clear understanding of our experimental design choices.

      86 "compared TO PETRI-seq " (typo). 

      We have corrected this typo in our manuscript.

      94 "gene expression collectively" rephrase. Probably this means coverage of the entire gene set across all cells. Same for downstream usage of the phrase. 

      Thank you for pointing out this ambiguity in our phrasing. Your interpretation of our intended meaning is accurate. We have rephrased the sentence as “transcriptome-wide gene coverage across the cell population”.

      97 What were the median UMIs for the 30,000 cell library ≥15 UMIs? Same question for the other datasets. This would reflect a more comparable statistic with previous studies than the top 3% of the cells for example, since the distributions of the single-cell UMIs typically have a long tail. 

      Thank you for this insightful question and for pointing out the importance of providing more comparable statistics. We agree that median values offer a more robust measure of central tendency, especially for datasets with long-tailed distributions, which are common in single-cell studies. The suggestion to include median Unique Molecular Identifier (UMI) counts would indeed provide a more comparable statistic with previous studies. We have analyzed the median UMIs for our libraries as follows and revised our manuscript according to the analysis (lines 126-130, 133-136, 139-142 and 175-180).

      (1) Median UMI count in Exponential Phase E. coli:

      Total: 102 UMIs per cell

      Top 1,000 cells: 462 UMIs per cell

      Top 5,000 cells: 259 UMIs per cell

      Top 10,000 cells: 193 UMIs per cell

      (2) Median UMI count in Stationary Phase S. aureus:

      Total: 142 UMIs per cell

      Top 1,000 cells: 378 UMIs per cell

      Top 5,000 cells: 207 UMIs per cell

      Top 8,000 cells: 167 UMIs per cell

      (3) Median UMI count in Exponential Phase C. crescentus:

      Total: 182 UMIs per cell

      Top 1,000 cells: 2,190 UMIs per cell

      Top 5,000 cells: 662 UMIs per cell

      Top 10,000 cells: 225 UMIs per cell

      (4) Median UMI count in Static E. coli Biofilm:

      Total of Replicate 1: 34 UMIs per cell

      Total of Replicate 2: 52 UMIs per cell

      Top 1,621 cells of Replicate 1: 283 UMIs per cell

      Top 3,999 cells of Replicate 2: 239 UMIs per cell
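      To make the reviewer's point about heavy-tailed UMI distributions concrete, the following minimal sketch computes the median mRNA UMIs per cell among cells passing a ≥15 UMI filter, optionally restricted to the top N cells by UMI count. It is an illustration under stated assumptions, not the authors' pipeline; the function name median_umis and its input (a plain list of per-cell UMI totals) are hypothetical.

```python
import statistics


def median_umis(umi_counts, min_umis=15, top_n=None):
    """Median mRNA UMIs per cell among cells passing a UMI filter.

    umi_counts: per-cell mRNA UMI totals (hypothetical input format).
    min_umis: filter threshold; 15 mirrors the >=15 UMI cutoff discussed above.
    top_n: if given, keep only the top_n cells ranked by UMI count.
    """
    # Keep cells passing the filter, ranked from highest to lowest UMI count.
    passing = sorted((c for c in umi_counts if c >= min_umis), reverse=True)
    if top_n is not None:
        passing = passing[:top_n]
    if not passing:
        raise ValueError("no cells pass the filter")
    return statistics.median(passing)


counts = [3, 10, 15, 20, 40, 100, 500]
print(median_umis(counts))           # all passing cells
print(median_umis(counts, top_n=2))  # top cells only: inflated by the tail
```

      Restricting to the top cells inflates the median because the highest-UMI cells sit in the long tail, which is why the median over all passing cells is the more comparable statistic.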

      104-105 The performance metric should again be the median UMIs of the majority of the cells passing the filter (15 mRNA UMIs is reasonable). The top 3-5% are always much higher in resolution because of the heavy tail of the single-cell UMI distribution. It is unclear if the performance surpasses the other methods using the comparable metric. Recommend removing this line. 

      We appreciate your suggestion regarding the use of median UMIs as a more appropriate performance metric, and we agree that comparing the top 3-5% of cells can be misleading due to the heavy tail of the single-cell UMI distribution. We have removed the line in question (104-105) that compares our method's performance based on the top 3-5% of cells in the revised manuscript. Instead, we focused on presenting the median UMI counts for cells passing the filter (≥15 mRNA UMIs) as the primary performance metric. This provides a more representative and comparable measure of our method's performance. We have also revised the surrounding text to reflect this change, ensuring that our claims about performance are based on these more robust statistics (lines 126-130, 133-136, 139-142 and 175-180).

      106-108 The sequencing saturation of the libraries (in %), and downsampling analysis should be added to illustrate this point. 

      Thank you for this suggestion. Adding sequencing saturation and downsampling analyses will help better illustrate our point. Based on your feedback, we have revised our manuscript by adding the following content:

      To provide a thorough evaluation of our sequencing depth and library quality, we performed sequencing saturation analysis on our samples. The findings reveal that our sequencing saturation is 100% (Fig. 8A & B), indicating that our sequencing depth is sufficient to capture the diversity of most transcripts. To illustrate the effect of our filtering criteria on the datasets, we show the data distribution before and after filtering (Fig. S1B & C). These figures visualize the influence of the filtering process on data quality and distribution. Filtering yields a more refined dataset with reduced noise and fewer outliers, which enhances the reliability of our downstream analyses.
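      For readers unfamiliar with the metric, one common definition of sequencing saturation (used, for example, by 10x Genomics Cell Ranger) is 1 - (unique molecules / total reads), where a molecule is a unique combination of cell barcode, gene, and UMI. The minimal sketch below assumes that definition and a hypothetical list of per-read molecule keys; it is not necessarily the exact procedure used in the manuscript. The downsampling helper randomly subsamples reads and counts the unique molecules recovered at each depth, which is the basis of a saturation curve.

```python
import random


def sequencing_saturation(read_molecules):
    """1 - (unique molecules / total reads).

    read_molecules: one hashable key per read, e.g. (cell_barcode, gene, umi)
    (hypothetical input format).
    """
    n_reads = len(read_molecules)
    if n_reads == 0:
        raise ValueError("no reads")
    n_unique = len(set(read_molecules))
    return 1.0 - n_unique / n_reads


def downsample_curve(read_molecules, fractions, seed=0):
    """Unique molecules recovered when subsampling reads to each fraction."""
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    n_reads = len(read_molecules)
    curve = []
    for f in fractions:
        if not 0.0 <= f <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
        k = int(round(f * n_reads))
        subsample = rng.sample(read_molecules, k)  # reads drawn without replacement
        curve.append((f, len(set(subsample))))
    return curve


reads = [("c1", "g1", "AAAA")] * 3 + [("c1", "g2", "CCCC")]
print(sequencing_saturation(reads))
print(downsample_curve(reads, [0.5, 1.0]))
```

      A curve whose unique-molecule count plateaus as the read fraction approaches 1 indicates that additional sequencing would recover few new molecules.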

      We have also ensured that a detailed description of the sequencing saturation method is included in the manuscript to provide readers with a comprehensive understanding of our methodology. We appreciate your feedback and believe these additions significantly improve our work.

      122: Please provide more details about the biofilm setup, including the media used. I did not find them in the methods. 

      We appreciate your attention to detail, and we agree that this information is crucial for the reproducibility of our experiments. We propose to add the following information to our methods section (lines 311-318):

      "For the biofilm setup, bacterial cultures were grown overnight. The next day, the culture was diluted 1:100 in 2 ml of LB medium in a petri dish; for strains carrying a plasmid, the appropriate antibiotic was added to the LB. The petri dish was then incubated statically in a growth chamber for 24 hours. After incubation, the biofilms were imaged directly under the microscope. The petri dishes were glass-bottom dishes from Biosharp (catalog number BS-20-GJM), allowing direct microscopic imaging without cover slips or slides. This setup allowed us to grow and image the biofilms in situ, providing a more accurate representation of their natural structure and composition."

      125: "sequenced 1,563 reads" missing "with" 

      Thank you for correcting our grammar. We have revised the phrase as “sequenced with 1,563 reads”.

      126: "283/239 UMIs per cell" unclear. 283 and 239 UMIs per cell per replicate, respectively? 

      Thank you for correcting our grammar. We have revised the phrase as “283 and 239 UMIs per cell per replicate, respectively” (line 184).

      Figure 1D: Please indicate where the comparison datasets are from. 

      We appreciate your question regarding the source of the comparison datasets in Figure 1D. All data presented in Figure 1D are from our own sequencing experiments. We did not use data from other publications for this comparison. Specifically, we performed sequencing on E. coli cells in the exponential growth phase using three different library preparation methods: RiboD-PETRI, PETRI-seq, and RNA-seq. The data shown in Figure 1D represent a comparison of UMIs and/or reads correlations obtained from these three methods. All sequencing results have been uploaded to the Gene Expression Omnibus (GEO) database. The accession number is GSE260458. We have updated the figure legend for Figure 1D to clearly state that all datasets are from our own experiments, specifying the different methods used.

      Figure 1I, 2D: Unable to interpret the color block in the data. 

      We apologize for any confusion regarding the interpretation of the color blocks in Figures 1I and 2D (now Figures 2E and 3E). The color blocks in these figures represent the p-values of the data points. The color scale ranges from red to blue. Red indicates smaller p-values, suggesting higher statistical significance and more reliable results; blue indicates larger p-values, suggesting lower statistical significance and less reliable results. We have updated the figure legends for both Figure 2E and Figure 3E to include this explanation of the color scale. Additionally, we have added a color legend to each figure to make the interpretation more intuitive for readers.

      Figure1H and 2C: Gene names should be provided where possible. The locus tags are highly annotation-dependent and hard to interpret. Also, a larger size figure should be helpful. The clusters 2 and 3 in 2C are the most important, yet because they have few cells, very hard to see in this panel. 

      We appreciate your suggestions for improving the clarity and interpretability of Figures 1H and 2C (now Figures 2D and 3D). We have replaced the locus tags with gene names where possible in both figures and increased the size of both figures to improve visibility and readability. We have also made Clusters 2 and 3 in Figure 3D more prominent in the revised figure. Despite their smaller cell count, we recognize their importance and have adjusted the visualization to ensure they are clearly visible. We believe these modifications will significantly enhance the clarity and informativeness of Figures 2D and 3D.

      (3) Questions to consider further expanding on, by more analyses or experiments and in the discussion: 

      What are the explanations for the apparently contradictory upregulation of c-di-GMP in cells expressing higher PdeI levels? How could a phosphodiesterase lead to increased c-di-GMP levels? 

      We appreciate the reviewer's observation regarding the seemingly contradictory relationship between increased PdeI expression and elevated c-di-GMP levels. This is indeed an intriguing finding that warrants further explanation.

      PdeI was predicted to be a phosphodiesterase responsible for c-di-GMP degradation. This prediction is based on sequence analysis showing that PdeI contains an intact EAL domain, which is known to degrade c-di-GMP. However, PdeI also contains a divergent GGDEF domain, a domain typically associated with c-di-GMP synthesis (Fig. S8). This dual-domain architecture suggests that PdeI may play a complex regulatory role. Previous studies have shown that knocking out the major phosphodiesterase PdeH in E. coli leads to the accumulation of c-di-GMP. Furthermore, a point mutation in PdeI's divergent GGDEF domain (G412S) in this PdeH knockout strain decreased c-di-GMP levels (2), implying that the wild-type GGDEF domain in PdeI contributes to maintaining or increasing cellular c-di-GMP levels. Importantly, our single-cell experiments showed a positive correlation between PdeI expression levels and c-di-GMP levels (Response Fig. 9B). In this revision, we also constructed a PdeI(G412S)-BFP mutant strain. In this strain, c-di-GMP levels remained constant despite increasing BFP fluorescence, which serves as a proxy for PdeI(G412S) expression levels (Fig. 4D). This experimental evidence, together with the domain analysis, suggests that PdeI can contribute to c-di-GMP synthesis, arguing against the notion that it functions solely as a phosphodiesterase. HPLC-MS/MS analysis further confirmed that arabinose-induced PdeI overexpression upregulated c-di-GMP levels (Fig. 4E). These results strongly suggest that PdeI plays a significant role in upregulating c-di-GMP levels. Further analysis revealed that PdeI also contains a CHASE (cyclases/histidine kinase-associated sensory) domain. Combined with our experimental results demonstrating that PdeI is a membrane-associated protein, we hypothesize that PdeI functions as a sensor that integrates environmental signals with c-di-GMP production through complex regulatory mechanisms.

      We have also included this explanation (lines 193-217) and the supporting experimental data (Fig. 4D & 4J) in our manuscript to clarify this important point. Thank you for highlighting this apparent contradiction, as it has allowed us to provide a more comprehensive explanation of our findings.

      What about the rest of the genes in cluster 2 of the biofilm? They should be used to help interpret the association between PdeI and c-di-GMP. 

      We understand your interest in the other genes present in cluster 2 of the biofilm and their potential relationship to PdeI and c-di-GMP. After careful analysis, we have determined that the other marker genes in this cluster do not have a significant impact on biofilm formation. Furthermore, we have not found any direct relationship between these genes and c-di-GMP or PdeI. Our focus on PdeI in this cluster is due to its unique and significant role in c-di-GMP regulation and biofilm formation, as demonstrated by our experimental results. While the other genes in this cluster may be co-expressed, their functions appear to be unrelated to the PdeI and c-di-GMP pathway we are investigating. We chose not to elaborate on these genes in our main discussion as they do not contribute directly to our understanding of the PdeI and c-di-GMP association. Instead, we could include a brief mention of these genes in the manuscript, noting that they were found to be unrelated to the PdeI-c-di-GMP pathway. This would provide a more comprehensive view of the cluster composition while maintaining focus on the key findings related to PdeI and c-di-GMP.

      Author response image 2.

      Protein-protein interactions of marker genes in cluster 2 of 24-hour static biofilms of E coli data.

      A verification is needed that the protein fusion to PdeI functional/membrane localization is not due to protein interactions with fluorescent protein fusion. 

      We appreciate your concern regarding the potential impact of the fluorescent protein fusion on the functionality and membrane localization of PdeI. It is crucial to verify that the observed effects are attributable to PdeI itself and not an artifact of its fusion with the fluorescent protein. To address this matter, we have incorporated a control group expressing only the fluorescent protein BFP (without the PdeI fusion) under the same promoter. This experimental design allows us to differentiate between effects caused by PdeI and those potentially arising from the fluorescent protein alone.

      Our results revealed the following key observations:

      (1) Cellular Localization: The GFP alone exhibited a uniform distribution in the cytoplasm of bacterial cells, whereas the PdeI-GFP fusion protein was specifically localized to the membrane (Fig. 4C).

      (2) Localization in the Biofilm Matrix: BFP-positive cells were distributed throughout the entire biofilm community. In contrast, PdeI-BFP positive cells localized at the bottom of the biofilm, where cell-surface adhesion occurs (Fig 4F).

      (3) c-di-GMP Levels: Cells with high levels of BFP displayed no increase in c-di-GMP levels. Conversely, cells with high levels of PdeI-BFP exhibited a significant increase in c-di-GMP levels (Fig. 4D).

      (4) Persister Cell Ratio: Cells expressing high levels of BFP showed no increase in persister ratios, while cells with elevated levels of PdeI-BFP demonstrated a marked increase in persister ratios (Fig. 4J).

      These findings from the control experiments have been included in our manuscript (lines 193-244, Fig. 4C, 4D, 4F, 4G and 4J), providing robust validation of our results concerning the PdeI fusion protein. They confirm that the observed effects are indeed due to PdeI and not merely artifacts of the fluorescent protein fusion.

      (1) Vrabioiu, A. M. & Berg, H. C. Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proceedings of the National Academy of Sciences of the United States of America 119, doi:10.1073/pnas.2116830119 (2022). https://doi.org/10.1073/pnas.2116830119

      (2) Reinders, A. et al. Expression and Genetic Activation of Cyclic Di-GMP-Specific Phosphodiesterases in Escherichia coli. J Bacteriol 198, 448-462 (2016). https://doi.org/10.1128/JB.00604-15

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The main goal of the paper was to identify signals that activate FLP-1 release from AIY neurons in response to H2O2, previously shown by the authors to be an important oxidative stress response in the worm. 

      Strengths: 

      This study builds upon the authors' previous work (Jia and Sieburth 2021) by further elucidating the gut-derived signaling mechanisms that coordinate the organism-wide antioxidant stress response in C. elegans. 

      By detailing how environmental cues like oxidative stress are transduced into gut-derived peptidergic signals, this study represents a valuable advancement in understanding the integrated physiological responses governed by the gut-brain axis. 

      This work provides valuable mechanistic insights into the gut-specific regulation of the FLP-2 peptide signal. 

      Weaknesses: 

      Although the authors identify intestinal FLP-2 as the endocrine signal important for regulating the secretion of the neuronal antioxidant neuropeptide, FLP-1, there is no effort made to identify how FLP-2 levels regulate FLP-1 secretion or identify whether this regulation is occurring directly through the AIY neuron or indirectly. This is brought up in the discussion, but identifying a target for FLP-2 in this pathway seems like a crucial missing piece of information in characterizing this pathway. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study. We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY and a new interpretation of these results has been added to the discussion.

      Reviewer #2 (Public Review): 

      Summary: 

      The core findings demonstrate that the neuropeptide-like protein FLP-2, released from the intestine of C. elegans, is essential for activating the intestinal oxidative stress response. This process is mediated by endogenous hydrogen peroxide (H2O2), which is produced in the mitochondrial matrix by superoxide dismutases SOD-1 and SOD-3. H2O2 facilitates FLP-2 secretion through the activation of protein kinase C family member pkc-2 and the SNAP25 family member aex-4. The study further elucidates that FLP-2 signaling potentiates the release of the antioxidant FLP-1 neuropeptide from neurons, highlighting a bidirectional signaling mechanism between the intestine and the nervous system. 

      Strengths: 

      This study presents a significant contribution to the understanding of the gut-brain axis and its role in oxidative stress response and significantly advances our understanding of the intricate mechanisms underlying the gut-brain axis's role in oxidative stress response. By elucidating the role of FLP-2 and its regulation by H2O2, the study provides insights into the molecular basis of inter-tissue communication and antioxidant defense in C. elegans. These findings could have broader implications for understanding similar pathways in more complex organisms, potentially offering new targets for therapeutic intervention in diseases related to oxidative stress and aging. 

      Weaknesses: 

      (1) The experimental techniques employed in the study were somewhat simple and could benefit from the incorporation of more advanced methodologies. 

      Thank you for your comment.

      (2) The weak identification of the key receptors mediating the interaction between FLP-2 and AIY neurons, as well as the receptors in the gut that respond to FLP-1. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study.

      (3) The study could be improved by incorporating a sensor for the direct measurement of hydrogen peroxide levels. 

      We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY using the genetically encoded peroxide sensor HyPer7. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY, and a new interpretation of these results has been added to the discussion. In addition, we have used HyPer7 to measure peroxide levels in the intestinal mitochondrial matrix and outer membrane (Figs 3, 4, 5, 6).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The major missing link in the study is how FLP-2 affects FLP-1 release from AIY: is the effect direct, and does it require the previously described FLP-2 receptor FRPR-18? This possibility is discussed extensively (L511-528), so it is odd that the effect of an frpr-18 mutation was not tested (or, if it was tested, that the results were not reported). If the authors haven't done this experiment (despite doing many less critical experiments), it would be good to know why. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study. We have added a new panel (Fig 1E) addressing the requirements for flp-2 signaling on peroxide production in AIY. These results provide new mechanistic insight into how flp-2 impacts signaling in AIY and a new interpretation of these results has been added to the discussion.

      Results:

      “To address how flp-2 signaling regulates FLP-1 secretion from AIY, we examined H2O2 levels in AIY using a mitochondrially targeted pH-stable H2O2 sensor HyPer7 (mito-HyPer7, Pak et al. 2020). Mito-HyPer7 adopted a punctate pattern of fluorescence in AIY axons, and the average fluorescence intensity of axonal mito-HyPer7 puncta increased about two-fold following 10-minute juglone treatment (Fig 1E), in agreement with our previous studies using HyPer (Jia and Sieburth 2021), confirming that juglone rapidly increases mitochondrial AIY H2O2 levels. flp-2 mutations had no significant effects on the localization or the average intensity of mito-HyPer7 puncta in AIY axons either in the absence of juglone or in the presence of juglone (Fig 1E), suggesting that flp-2 signaling promotes FLP-1 secretion by a mechanism that does not increase H2O2 levels in AIY. Consistent with this, intestinal overexpression of flp-2 had no effect on FLP-1::Venus secretion in the absence of juglone, but significantly enhanced the ability of juglone to increase FLP-1 secretion (Fig. 1D). We conclude that both elevated mitochondrial H2O2 levels and intact flp-2 signaling from the intestine are necessary to increase FLP-1 secretion from AIY.”

      More minor comments/suggestions: 

      Line 172: No justification is given as to why the authors chose to focus on flp-2 over the other potential candidates identified in their RNAi screen. 

      We are currently examining the other neuropeptide hits from the screen, but we have no additional phenotypes to report.

      Line 189: An explanation for the use of gDNA as opposed to cDNA should be given. 

      We have changed the text in the Results section as follows:

      “Expressing a flp-2 genomic DNA (gDNA) fragment (containing both the flp-2a and flp-2b isoforms, which arise by alternative splicing) specifically in the nervous system failed to rescue the FLP-1::Venus defects of flp-2 mutants, whereas expressing flp-2 selectively in the intestine fully restored juglone-induced FLP-1::Venus secretion to flp-2 mutants (Fig. 1D).”

      Line 249-253: nlp-40 and nlp-27 were not implicated in contributing to juglone toxicity in the RNAi screen performed previously by the authors, so it is unclear why both of these peptides are investigated beyond simply being released from the intestine. Confusingly, while Figure S2D shows no overlap between NLP-40 and FLP2, NLP-27 is omitted from the analysis. 

      We have clarified that these peptides are not implicated in stress responses, providing a clearer rationale for why they serve as controls for specificity.

      “Third, nlp-40 and nlp-27 encode neuropeptide-like proteins that are released from the intestine but are not implicated in stress responses (Liu et al. 2023; Taylor et al. 2021; Wang et al. 2013). Juglone treatment had no detectable effects on coelomocyte fluorescence in animals expressing intestinal NLP-40::Venus or NLP-27::Venus fusion proteins (Fig. S2B and C), and NLP-40::mTur2 puncta did not overlap with FLP-2::Venus puncta in the intestine (Fig. S2D).”

      Line 262: A more detailed description of juglone's mechanism of action would be welcome here. Is juglone expected to act only in intestinal cells, or is its function more pervasive? 

      We have added more detail:

      “Juglone generates superoxide anion radicals (Ahmad and Suzuki 2019; Paulsen and Ljungman 2005) and juglone treatment of C. elegans increases ROS levels (de Castro, Hegi de Castro, and Johnson 2004) likely by promoting the global production of mitochondrial superoxide. Superoxide can then be rapidly converted into H2O2 by superoxide dismutase.”

      Line 414: Justification for why expulsion frequency is used here to quantify NLP-40 secretion is required, particularly because NLP-40::Venus was already used to quantify NLP-40 secretion via the coelomocyte fluorescence method in the experiments contributing to Figure S2. 

      We used expulsion frequency here because (1) it is an easier assay than the coelomocyte assay and (2) it is a functional assay. Defective NLP-40 exocytosis manifests as reduced expulsion frequency; therefore, if NLP-40 secretion is defective in pkc-2 mutants, pkc-2 mutants should exhibit defects in expulsion frequency.

      We have clarified this point:

      “To determine whether pkc-2 can regulate the intestinal secretion of other peptides that are not associated with oxidative stress, we examined expulsion frequency, which is a measure of NLP-40 secretion (Mahoney et al. 2008; Wang et al. 2013).”

      Line 478: The discussion of neuronally-secreted kisspeptin in this context does not seem relevant as this paper has focused on intestinal peptide secretion. 

      We have removed this sentence:

      In mammals, release of the RF-amide neuropeptide kisspeptin from the anteroventral periventricular nucleus (AVPV) regulates reproduction by inducing the release of gonadotropins via its stimulatory action on GnRH neurons (Han et al. 2005).

      Line 526: DMSR-18 seems to be a typo. Possibly meant FRPR-8, as this is another FLP-2-activated GPCR identified in the screen (though notably, FRPR-8 is only activated by one of the two FLP-2 peptide products) On that note, DMSR-1 has two isoforms, and only one of them is activated by FLP-2 (and only one of the two FLP-2 peptides). This seems relevant to discuss. 

      We have corrected the text and we have added to the discussion the number of FLP-2 peptides:

      “In addition, certain FLP-2-derived peptides (of which there are at least three) can bind to the GPCRs DMSR-1, or FRPR-8 in transfected cells (Beets et al. 2023). Identifying the relevant FLP-2 peptide(s), the FLP-2 receptor and its site of action will help to define the circuit used by intestinal flp-2 to promote FLP-1 release from AIY.” 

      Line 534: An explanation or speculation into why this integration might be necessary would be welcome here. 

      We have edited this paragraph:

      “FLP-1 release from AIY is positively regulated by H2O2 generated from mitochondria (Jia and Sieburth 2021). Here we showed that H2O2-induced FLP-1 release requires intestinal flp-2 signaling. However, flp-2 does not appear to promote FLP-1 secretion by increasing H2O2 levels in AIY (Fig 1E), and flp-2 signaling is not sufficient to promote FLP-1 secretion in the absence of H2O2 (Fig. 1D). These results point to a model whereby at least two conditions must be met in order for AIY to increase FLP-1 secretion: an increase in H2O2 levels in AIY itself, and an increase in flp-2 signaling from the intestine. Thus AIY integrates stress signals from both the nervous system and the intestine to activate the intestinal antioxidant response through FLP-1 secretion. The requirement of signals from multiple tissues for FLP-1 secretion may function to limit the activation of SKN-1, since unregulated SKN-1 activation can be detrimental to organismal health (Turner, Ramos, and Curran 2024).”

      Line 569: Should specify what these candidates are. 

      There are 11 proteins with thioredoxin fold domains. We modified the sentence to list one of them.

      “There are several thioredoxin-domain containing proteins in addition to trx-3 in the C. elegans genome that could be candidates for this role (e.g. trx-5 and others).”

      Line 660: Details about whether the M9 control had an equivalent amount of DMSO as the juglone+M9 condition is required. 

      We have performed toxicity and neuropeptide release assays comparing M9, DMSO, and juglone treatment, and we have included these new data in Fig. S1C, S1D and S2E. Methods: 

      “A stock solution of 50mM juglone in DMSO was freshly made on the same day as the liquid toxicity assay. A 120μM working solution of juglone in M9 buffer was prepared from the stock solution before treatment. Around 60-80 synchronized adult animals were transferred into a 1.5mL Eppendorf tube with fresh M9 buffer and washed three times, and a final wash was done with either the juglone working solution or M9. DMSO at the concentrations present in juglone-treated animals does not contribute to toxicity, since DMSO treatment alone caused no significant change in survival compared to M9-treated controls (Fig. S1C).

      For coelomocyte imaging, L4 stage animals were transferred into fresh M9 buffer on a cover slide and washed six times with M9 before being exposed to 300μM juglone in M9 buffer (diluted from freshly made 50mM stock solution), 1mM H2O2 in M9 buffer, or M9 buffer. DMSO at the concentrations present in juglone-treated animals does not alter neuropeptide secretion, since DMSO treatment alone caused no significant change in FLP-1::Venus or FLP-2::Venus coelomocyte fluorescence compared to M9-treated controls (Fig. S1D and S2E).”

      Line 1191: Should be FLP-1::Venus in AIY, not the intestine  

      Corrected.

      In general, the significance of reporting in the figures is very unclear. "a, b, c" to report statistical analysis is confusing in the figure legends, and also unnecessary when they denote non-significance. There are some cases where it is reported that a symbol (eg. ***) denotes statistical significance, but there is no indication of what level of statistical significance the symbol represents (for example, in Figures 2C and 2D) 

      Levels of significance are summarized at the end of each figure legend unless indicated for specific symbols (for example, Fig. 1C). We have edited this figure legend: 

      “E Representative images and quantification of fluorescence of matrix-targeted HyPer7 in the axon of AIY following M9 or juglone treatment for 10min. Arrowheads denote puncta marked by MLS::HyPer7 fusion proteins (Excitation: 500 and 400nm; emission: 520nm). The ratio of images taken with 500nm (GFP) and 400nm (CFP) excitation was used to measure H2O2 levels. Unlined *** and ns denote statistical analysis compared to “wild type”. n = 25, 25, 25, 25 independent animals. Scale bar: 10μm.

      F Representative images and quantification of average fluorescence in the posterior region of transgenic animals expressing Pgst-4::gfp after 4h vehicle M9 or juglone exposure. Asterisks mark the intestinal region used for quantification. Pgst-4::gfp expression in the body wall muscles, which appears as fluorescence on the edges of animals in some images, was not quantified. Unlined *** and ns denote statistical analysis compared to “wild type”; unlined ## and ### denote statistical analysis compared to “wild type+juglone”. n = 25, 26, 25, 25, 25, 25, 25, 25 independent animals. Scale bar: 10μm.”
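      The excitation-ratio readout described in these legends can be sketched as follows: within a region of interest (for example, the pixels of one punctum), the background-subtracted intensity under 500nm excitation is divided by that under 400nm excitation. This is a minimal illustration under stated assumptions (images as nested lists, a binary mask, constant background values, and a ratio of region-summed intensities rather than a mean of per-pixel ratios); the function hyper7_ratio is hypothetical and is not the authors' analysis code.

```python
def hyper7_ratio(img_500, img_400, mask, bg_500=0.0, bg_400=0.0):
    """Return the F500/F400 excitation ratio over the masked pixels.

    img_500, img_400: 2D intensity images (lists of rows) acquired with
    500 nm and 400 nm excitation (hypothetical inputs).
    mask: 2D list of truthy/falsy values selecting the region of interest.
    bg_500, bg_400: background intensities subtracted from each pixel.
    """
    sum_500 = 0.0
    sum_400 = 0.0
    n_pixels = 0
    for row_500, row_400, row_mask in zip(img_500, img_400, mask):
        for f500, f400, inside in zip(row_500, row_400, row_mask):
            if inside:
                sum_500 += f500 - bg_500
                sum_400 += f400 - bg_400
                n_pixels += 1
    if n_pixels == 0:
        raise ValueError("mask selects no pixels")
    if sum_400 <= 0:
        raise ValueError("non-positive 400 nm signal after background subtraction")
    # The ratio of summed intensities equals the ratio of mean intensities
    # because both sums cover the same number of pixels.
    return sum_500 / sum_400


ratio = hyper7_ratio([[10, 20], [0, 0]], [[5, 10], [0, 0]], [[1, 1], [0, 0]])
print(ratio)
```

      Because HyPer7 oxidation raises the 500nm-excited signal relative to the 400nm-excited signal, a higher ratio is read as higher local H2O2; using a ratio also cancels differences in sensor expression level between animals.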

      Figure 2C: It is unclear which conditions have H2O2 treatment (as described in the legend). There is also no mention of what ### indicates. 

      Levels of significance for ### are summarized at the end of the legend. No H2O2 treatment was performed in this assay. We have edited this figure legend: 

      “C. Representative images and quantification of average coelomocyte fluorescence of the indicated mutants expressing FLP-2::Venus fusion proteins in the intestine following M9 or juglone treatment for 10min. Unlined *** and ns denote statistical analysis compared to “wild type”. n = 29, 25, 24, 30, 23, 30, 25, 25, 25 independent animals. Scale bar: 5μm.”

      Figure 2D: It was not previously mentioned that the M9 condition contains DMSO, as implied by the legend.

      We have edited this figure legend:

      “D. Quantification of average coelomocyte fluorescence of transgenic animals expressing FLP-2::Venus fusion proteins in the intestine following treatment with fresh M9 buffer or the indicated stressors for 10 min. Unlined *** denotes statistical analysis compared to “M9”. n = 23, 25, 25 independent animals.”

      Figure 3J: The y-axis label should more clearly describe the ratio being measured. 

      We have updated the panel and this figure legend: 

      “J. Schematic, representative images and quantification of fluorescence in the posterior region of the indicated transgenic animals co-expressing mitochondrial matrix-targeted HyPer7 (matrix-HyPer7) or mitochondrial outer membrane-targeted HyPer7 (OMM-HyPer7) with TOMM-20::mCherry following M9, juglone, or H2O2 treatment. The ratio of images taken with 500 nm (GFP) and 400 nm (CFP) excitation and 520 nm emission was used to measure H2O2 levels. Unlined *** and ns denote statistical analysis compared to “wild type”; unlined ## denotes statistical analysis compared to “wild type+juglone”. (top) n = 20, 20, 18, 20, 19, 19, 20, 20 independent animals.

      (bottom) n = 20, 20, 19, 20, 20, 20, 20, 20 independent animals. Scale bar: 5 μm.”

      Figure S3A: *** is mislabelled. It should be a comparison to wild type.

      We have edited this figure legend: 

      “A. Quantification of average coelomocyte fluorescence of the indicated mutants expressing FLP-2::Venus fusion proteins in the intestine following M9 or juglone treatment for 10 min. Unlined *** denotes statistical analysis compared to “wild type”; ### and ns denote statistical analysis compared to “wild type+juglone”. n = 29, 27, 29, 27, 25, 26, 24 independent animals.”

      Reviewer #2 (Recommendations For The Authors): 

      (1) The localization experiments could benefit from the application of ultra-high-resolution fluorescence microscopy. This would allow for a more detailed analysis of the spatial distribution of SOD-1/3::GFP in relation to mitochondria-targeted TOMM-20::mCherry fusion proteins in the posterior intestinal region of transgenic animals. 

      We agree that high-resolution microscopy would be a great way to more precisely localize SOD proteins relative to the mitochondria, and this would enhance understanding of the source of peroxide in this system. We do not conduct this type of microscopy in the lab, so this approach would require a collaboration with a lab that is set up for this. Thus, we feel that this is beyond the scope of the current study.

      (2) The paper may note the challenge of directly measuring mitochondrial H2O2 concentrations. However, advancements in chemical or fluorescent sensors for H2O2 detection within mitochondria could provide more direct evidence of its role in FLP-2 secretion. 

      We have considered using chemical sensors, but many are either not efficiently taken up by worms (the skin is largely impermeable to all but the most hydrophobic molecules), or they would label peroxide indiscriminately in all tissues making detection specifically in the intestine challenging. We have had good luck with genetically encoded peroxide sensors since they provide tissue specificity and good spatial resolution depending on where we target them. We have added imaging results for HyPer7 in the AIY neuron to Figure 1E. 

      Results:

      “To address how flp-2 signaling regulates FLP-1 secretion from AIY, we examined H2O2 levels in AIY using the mitochondrially targeted, pH-stable H2O2 sensor HyPer7 (mito-HyPer7; Pak et al. 2020). Mito-HyPer7 adopted a punctate pattern of fluorescence in AIY axons, and the average fluorescence intensity of axonal mito-HyPer7 puncta increased about two-fold following 10-minute juglone treatment (Fig 1E), in agreement with our previous studies using HyPer (Jia and Sieburth 2021), confirming that juglone rapidly increases mitochondrial H2O2 levels in AIY. flp-2 mutations had no significant effects on the localization or the average intensity of mito-HyPer7 puncta in AIY axons either in the absence or in the presence of juglone (Fig 1E), suggesting that flp-2 signaling promotes FLP-1 secretion by a mechanism that does not increase H2O2 levels in AIY. Consistent with this, intestinal overexpression of flp-2 had no effect on FLP-1::Venus secretion in the absence of juglone, but significantly enhanced the ability of juglone to increase FLP-1 secretion (Fig. 1D). We conclude that both elevated mitochondrial H2O2 levels and intact flp-2 signaling from the intestine are necessary to increase FLP-1 secretion from AIY.”

      (3) To confirm the activation of AIY neurons by FLP-2, measuring calcium activity in these neurons may be a robust approach. It would be beneficial to determine if synthetic FLP-2 can activate AIY neurons and subsequently induce an intestinal antioxidant response. 

      This is a great idea. We have begun to examine GCaMP fluorescence in AIY and we see responses to oxidative stressors. We think that these data are too preliminary at the moment to include here.

      (4) The identification of the key receptors mediating the interaction between FLP-2 and AIY neurons, as well as the receptors in the gut that respond to FLP-1, would complete the signaling pathway and strengthen the study's conclusions. 

      We agree that this is an important question. Specifically, identifying the FLP-2 receptor and its site of action is a major priority. Since there are at least four different receptors that have been functionally or physically linked to FLP-2 and there are at least three FLP-2 peptides, unraveling the components acting directly downstream of FLP-2 will require further investigation that we feel is beyond the scope of this current study.  

      (5) Investigating whether direct manipulation of AIY neurons, through methods such as optogenetic activation or inhibition, can trigger the gut's antioxidant response would provide insight into the functional relevance of this neuronal activity. 

      Also an excellent idea. We previously published that Channelrhodopsin activation specifically in AIY indeed increases FLP-1 secretion, but we have not yet examined its effects on antioxidant responses in the intestine. This may require a more sustained activation of AIY than Channelrhodopsin can provide.

      (6) For the analysis of intestinal Pges-1::GFP fluorescence, specifying the region of interest would enhance the precision of the data and the reproducibility of the results. 

      We analyzed the fluorescence intensity of a 16-pixel-diameter circle in the posterior intestine (indicated by the asterisks) and have added this to the Methods. We edited this paragraph:

      “For transcriptional reporter imaging, young adult animals of the indicated genotype were transferred into a 1.5 mL Eppendorf tube with M9 buffer, washed three times, and incubated in M9 buffer or a 60 μM working solution of juglone for 1 h in the dark on a rotating mixer before recovering on fresh NGM plates with OP50 for 3 h in the dark at 20°C. The posterior end of the intestine was imaged with the 60x objective, and the average fluorescence intensity of a 16-pixel-diameter circle in the posterior intestine was quantified using Metamorph.”
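      The circular-ROI quantification described above can be sketched as follows. This is a minimal illustration, not the authors' Metamorph workflow: it assumes each image is available as a 2D NumPy array, that the circle is centred on a user-chosen pixel, and that a pixel counts as inside the ROI when its centre lies within half the diameter of that point. The helper name `circular_roi_mean` is hypothetical.

```python
import numpy as np


def circular_roi_mean(image, center_row, center_col, diameter=16):
    """Return the mean intensity inside a circle of `diameter` pixels.

    Hypothetical helper: a pixel is included when its centre lies within
    diameter / 2 of (center_row, center_col).
    """
    image = np.asarray(image, dtype=float)
    radius = diameter / 2.0
    # Open grids of row and column indices, broadcast to the image shape.
    rows, cols = np.ogrid[:image.shape[0], :image.shape[1]]
    mask = (rows - center_row) ** 2 + (cols - center_col) ** 2 <= radius ** 2
    if not mask.any():
        raise ValueError("ROI lies outside the image")
    # Average only the pixels inside the circular mask.
    return image[mask].mean()
```

      For ratiometric HyPer7 images, the same mask could be applied to the 500 nm and 400 nm excitation images and the two means divided; whether a ratio of means or a mean of pixel-wise ratios was used is not stated here, so that choice is an assumption.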

      (7) Assessing the potential for pharmacological modulation of FLP-2 or H2O2 levels could provide valuable insights into therapeutic strategies aimed at enhancing the oxidative stress response. 

      Agreed.

      (8) For improved clarity, it is suggested that the schematic currently presented in Figure S1A be integrated into Figure 2C, as this would facilitate the reader's comprehension of the experimental design and findings. 

      Moved.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Choi and co-authors presents "P3 editing", which leverages dual-component guide RNAs (gRNA) to induce protein-protein proximity. They explore three strategies for leveraging prime-editing gRNA (pegRNA) as a dimerization module to create a molecular proximity sensor that drives genome editing: splitting a pegRNA into two parts (sgRNA and petRNA), inserting self-splicing ribozymes within pegRNA, and dividing pegRNA at the crRNA junction. Among these, splitting at the crRNA junction proved the most promising, achieving significant editing efficiency. They further demonstrated the ability to control genome editing via protein-protein interactions and small molecule inducers by designing RNA-based systems that form active gRNA complexes. This approach was also adaptable to other genome editing methods like base editing and ADAR-based RNA editing.

      Strengths:

      The study demonstrates significant advancements in leveraging guide RNA (gRNA) as a dimerization module for genome editing, showcasing its high specificity and versatility. By investigating three distinct strategies (splitting pegRNA into sgRNA and petRNA, inserting self-splicing ribozymes within the pegRNA, and dividing the pegRNA at the repeat junction), the researchers present a comprehensive approach to achieving molecular proximity and reconstituting function. Among these methods, splitting the pegRNA at the repeat junction emerged as the most promising, achieving editing efficiencies up to 76% of the control, highlighting its potential for further development in CRISPR-Cas9 systems. Additionally, the study extends genome editing control by linking protein-protein interactions to RNA-mediated editing, using specific protein-RNA interaction pairs to regulate editing through engineered protein proximity. This innovative approach expands the toolkit for precision genome editing, demonstrating the feasibility of controlling genome editing with enhanced specificity and efficiency.

      Weaknesses:

      The initial experiments with splitting the pegRNA into sgRNA and petRNA showed low editing efficiency, less than 2%. Similarly, inserting self-splicing ribozymes within pegRNA was inefficient, achieving under 2% editing efficiency in all constructs tested, possibly hindered by the prime editing enzyme. The editing efficiency of the crRNA and petracrRNA split at the repeat junction varied, with the most promising configurations only reaching 76% of the control efficiency. The RNA-RNA duplex formation's inefficiency might be due to the lack of additional protein binding, leading to potential degradation outside the Cas9-gRNA complex. Extending the approach to control genome editing via protein-protein interactions introduced complexity, with a significant trade-off between efficiency and specificity, necessitating further optimization. The strategy combining RADARS and P3 editing to control genome editing with specific RNA expression events exhibited high background levels of non-specific editing, indicating the need for improved specificity and reduced leaky expression. Moreover, P3 editing efficiencies are exclusively quantified after transfecting DNA into HEK cells, a strategy that has resulted in past reproducibility concerns for other technologies. Overall, the various methods and combinations require further optimization to enhance efficiency and specificity, especially when integrating multiple synthetic modules.

      Thank you for this accurate summary and assessment of the strengths and weaknesses of the P3 editing as it stands. Looking ahead, we agree that further optimizations will be important, as will characterizing the performance of P3 editing in additional cellular contexts. The revised Discussion (see below) now makes these points more clearly.

      Reviewer #2 (Public Review):

      Choi et al. describe a new approach for enabling input-specific CRISPR-based genome editing in cultured cells. While CRISPR-Cas9 is a broadly applied system across all of biology, one limitation is the difficulty in inducing genome editing based on cellular events. A prior study, from the same group, developed ENGRAM - which relies on activity-dependent transcription of a prime editing guide RNA, which records a specific cellular event as a given edit in a target DNA "tape". However, this approach is limited to the detection of induced transcription and does not enable the detection of broader molecular events including protein-protein interactions or exposure to small molecules. As an alternative, this study envisioned engineering the reconstitution of a split prime editing guide RNA (pegRNA) in a protein-protein interaction (PPI)-dependent manner. This would enable location- and content-specific genome editing in a controlled setting.

      The authors explored three different design possibilities for engineering a PPI-dependent split pegRNA. First, they tried splitting pegRNA into a functional sgRNA and corresponding prime editing transRNA, incorporating reverse-complementary dimerization sequences on each guide half. This approach, however, resulted in low editing efficiency across 7 different designs with various complementary annealing template lengths (<2% efficiency). They also tried inserting a self-splicing ribozyme within the pegRNA, which produces a functional pegRNA post-transcriptionally. The incorporation of a split-ribozyme, dependent on a PPI, could have been used to reconstitute the split pegRNA in an event-controlled manner. However again, only modest levels of editing were observed with the self-splicing ribozyme design (<2%). Finally, they tried splitting the pegRNA at the repeat:anti-repeat junction that was used to join the original dual-guide system comprised of a crRNA and tracrRNA, into a single-guide RNA. They incorporated the prime editing features into the tracrRNA half, to create petracrRNA. Dimerization was initially induced by different complementary RNA annealing sequences. Using this design, they were able to induce an editing efficiency of ~28% (compared to 37% efficiency using a positive control epegRNA guide).

      Having identified a suitable split pegRNA system, they next sought to induce the reconstitution of the two halves in a PPI-dependent manner. They replaced the complementary RNA annealing sequences with two different RNA aptamers (MS2 and BoxB). MS2 detects the MCP protein, while BoxB detects the LambdaN protein. Close proximity between MCP and LambdaN would thus bring together the two split pegRNA halves, creating a functional pegRNA that would enable prime editing at a specific target site. They demonstrated that they could induce MCP-BoxB proximity by fusing them to different dimerizing protein partners: 1) constitutive epitope-nanobody/antibody pairs such as scFv/GCN4 or NbALFA/ALFA-Tag; 2) split-GFP; or 3) chemically-induced protein pairs such as FKBP/FRB or ABI/PYL. For all of these approaches, they could achieve between ~20-60% normalized editing efficiency (relative to positive control editing levels with epegRNA). Additional mutation of the linkers between the RNA and aptamers could increase editing efficiency but also increase non-specific background editing even in the absence of an induced PPI.

      Additional applications of this overall strategy included incorporating the design with different DNA base editors, with the most promising examples shown with the base editors CBE4max and ABE8. It should be noted that these specific examples used a non-physiological LambdaN-MCP direct fusion protein as the "bait" that induced reconstitution of the two halves of the guideRNA, rather than relying on a true induced PPI. They also demonstrated that the recently reported RADARS strategy could be incorporated into their system. In this example, they used an ADAR-guide-RNA to drive the expression of a LambdaN-PCP fusion protein in the presence of a specific target RNA molecule, IL6. This induced LambdaN-PCP protein could then reconstitute the split peg-RNAs to drive prime editing. To enable this last application, they replaced the MS2 aptamer in their pegRNA with the PP7 aptamer that binds the PCP protein (this was to avoid crosstalk with RADARS, which also uses MS2/MCP interaction). Using this strategy, they observed a normalized editing efficiency of around 12% (but observed non-specific editing of around 8% in the absence of the target RNA).

      Strengths:

      The strengths of this paper include an interesting concept for engineering guide RNAs to enable activity-dependent genome editing in living cells in the future, based on discrete protein-protein interactions (either constitutively, spatially, or chemically induced). Important groundwork is laid down to engineer and improve these guide RNAs in the future (especially the work describing altering the linkers in Supplementary Figure 3, which provides a path forward).

      Weaknesses:

      In its current state, the editing efficiency appears too low to be applied in physiological settings. Much of the latter work in the paper relies on a LambdaN-MCP direct fusion protein, rather than two interacting protein pairs. Further characterizations in the future, especially varying the transfection amounts/durations/etc. of the various components of the system, would be beneficial to improve the system. It will also be important to demonstrate editing at additional sites; to characterize how long the PPI must be active to enable efficient prime editing; and to determine how reversible the reconstitution of the split pegRNA is.

      Thank you for this assessment of the strengths and weaknesses of the P3 editing as it stands. Looking ahead, we agree that further optimizations will be important, including along the lines suggested by the reviewer, as will further characterization of the system with respect to dependencies, reversibility, etc. The revised Discussion (see below) now makes these points more clearly.

      Recommendations for the authors:

      Reviewing Editor comments:

      It would be helpful to better describe the nature of improvements (on-targeting and/or off-targeting) that would be needed to effectively use this approach in in vitro and in vivo applications.

      We agree, and have accordingly revised the last paragraph of our discussion to better describe what improvements are needed for in vitro and in vivo applications:

      “In our view, there are four outstanding challenges for P3 editing to be broadly useful: evaluating additional cellular contexts, improving the method’s efficiency and specificity, understanding the limit of detectable protein-protein interactions, and developing sensors compatible with multiplex P3 editing within the same cell. First, we have thus far only conducted P3 editing in HEK293T cells, and the method obviously needs to be tested in additional cell types. Second, both the efficiency and specificity of P3 editing need to be improved before it can be used as a selective editing tool in model systems. We have explored how modifying the crRNA and petracrRNA pair sequences can tune the efficiency-vs-specificity tradeoff, but alternative avenues to improvement (e.g., better docking of RNA aptamers such as MS2, BoxB, or PP7 by testing more linker sequences that position crRNA and petracrRNA for duplex formation) may be more fruitful in terms of achieving high efficiency and specificity at once (e.g., >50% editing in the setting of a specific protein-protein interaction, and <1% editing without it). Third, it is not clear whether weak and transient interactions among proteins can be used to trigger P3 editing. Assuming the genome editing complex formation is reversible, improving P3 editing efficiency may make it possible to capture different strengths of protein-protein interactions, although some interactions may be too transient to promote functional guide RNA formation. Finally, the current P3 editing design uses a pair of RNA aptamers and their corresponding protein binders, limiting the multiplex detection of protein-protein pairs. More orthogonal protein-RNA pairs need to be identified (e.g., using a massively parallel platform (Buenrostro et al., 2014) and/or computational prediction (Baek et al., 2023)) to allow for large numbers of P3 sensors for different protein-protein interactions to be deployed within the same cell.
Overcoming these four challenges is necessary for P3 editing to be broadly useful for gating genome editing on physiological levels of specific protein-protein interactions in a multiplex fashion.”

      Reviewer #2 (Recommendations For The Authors):

      It does not appear that all plasmids necessary to reproduce the results of this paper have been deposited to Addgene, but only a small subset. The authors might state that these plasmids are available upon request, if not uploaded to a public repository.

      We have added a statement that additional plasmids are available upon request. Our Data Availability Statement reads (with the added sentence underlined):

      “Raw sequencing data have been uploaded to the Sequence Read Archive (SRA) with the associated BioProject ID PRJNA1004865. The following plasmids have been deposited to Addgene: pU6-crRNA-MS2, pU6-BoxB-petracrRNA, pCMV-LambdaN-MCP, pCMV-LambdaN-NbALFA, and pCMV-ALFA-MCP (Addgene ID 207624 - 207628). The rest of the plasmids used in this study are available upon request.”

      It could be useful to include somewhere why, specifically, editing the guide RNAs as opposed to the Cas9 itself is advantageous. Light-inducible split Cas9s have been engineered, and I imagine other PPI-inducible split Cas9s have also been engineered. A specific mention of the advantages of using engineered split pegRNAs could put the significance of this work in a better context.

      Thanks for raising this, and we agree. We have revised the first paragraph of the Results section to highlight why we think splitting the guide RNAs as opposed to Cas9 might be advantageous:

      “In the split architecture, the “dimerization module” is a key sensor component. Although strategies that split the protein component of the genome editing complex have been described (e.g., split-Cas9 (Yu et al., 2020)), we reasoned that having the guide RNA serve as the dimerization module rather than the protein, i.e., by splitting it into two parts and making the restoration of its function dependent on a molecular proximity event, would afford even more control. For example, if multiple split gRNAs were present within the same cell, they could be independently controlled, whereas a split Cas9 would only allow a single control point. In our initial experiments, we focused on splitting the pegRNA used in prime editing.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors test the "OHC-fluid-pump" hypothesis by assaying the rates of kainic acid dispersal both in quiet and in cochleae stimulated by sounds of different levels and spectral content. The main result is that sound (and thus, presumably, OHC contractions and expansions) results in faster transport along the duct. OHC involvement is corroborated using salicylate, which yielded results similar to silence. Especially interesting is the fact that some stimuli (e.g. tones) seem to provide better/faster pumping than others (e.g. noise), ostensibly due to the phase profile of the resulting cochlear traveling-wave response.

      Strengths:

      The experiments appear well controlled and the results are novel and interesting. Some elegant cochlear modeling that includes coupling between the organ of Corti and the surrounding fluid as well as advective flow supports the proposed mechanism.

      Weaknesses:

      It's not clear whether the effect size (e.g., the speed of sound-induced pumping relative to silence) is large enough to have important practical applications (e.g., for drug delivery). The authors should comment on the practical requirements and limitations.

      With our current data, what we can conclude is that modest sound levels (e.g., 75 dB SPL noise or an 80 dB SPL tone) facilitate cochlear drug delivery. We added a paragraph to the Discussion stating some future considerations for application to drug delivery in the human cochlea.

      Although helpful so far as it goes, the modeling could be taken much further to help understand some of the more interesting aspects of the data and to obtain testable predictions. In particular, the authors should systematically explore the level effects they find experimentally and determine whether the model can replicate the finding that different sounds produce different results (e.g. noise vs tone).

      The model should also be used to relate the model's flow rates more quantitatively to the properties of the traveling wave (e.g., its phase profile).

      The present study is focused on explaining the principle of mass transport in the cochlea. The quantification of the relationship between flow rate and traveling wave is an important open question and will be the topic of future studies. Our previous modeling study (Shokrian et al. 2020) showed a clear relation between the traveling wave characteristics (e.g., amplitude and phase velocity) and the mass transport in the Corti fluid. As the reviewer correctly pointed out, the current paper is focused on designing controlled experiments to provide proof of concept along with computational simulations to support our major claim (that outer hair cells stir cochlear fluid). 

      Finally, the model should be used to investigate differences between active and passive OHCs (e.g., simulating the salicylate experiment by disabling the model's OHCs).

      What the reviewer asks for has been demonstrated in previous theoretical studies (Lighthill, 1992; Edom, Obrist, Kleiser, 2014; Sumner, Reichenbach, 2021). In some of these studies, it was called steady streaming. These studies are excellent examples because they simulated the sensitive cochlea (similar level of basilar membrane vibrations) but did not incorporate the Corti fluid peristalsis. Even without the peristaltic motion of the Corti tube, the basilar membrane-scala fluid interaction generated steady streaming (creeping fluid flow). However, the streaming velocity in cochlear models without active peristalsis along the Corti tube is about three orders of magnitude smaller than in the active cochlea at a comparable level of basilar membrane vibrations. For example, the peak streaming speed was < 0.1 μm/s at 80 dB SPL, and it took > 4 hours for particles to travel 1 mm. This speed is much slower than the particle transport speed due to pure diffusion (Sumner, Reichenbach, 2021).

      The manuscript would be stronger if the authors discussed ways to test their hypothesis that OHC motility serves a protective effect by pumping fluid. For example, do animals held in quiet after noise exposure (TTS) take longer to recover?

      We agree with the reviewer. The following statements were added to the Discussion section. “Our results have implications for cochlear fluid homeostasis. For example, future studies can test the hypothesis that an acoustically rich environment would be beneficial in maintaining healthy hearing as well as in recovering from transient hearing loss.”

      Reviewer #2 (Public review):

      Summary:

      Recent cochlear micromechanical measurements in living animals demonstrated outer hair cell-driven broadband vibration of the reticular lamina that contradicts frequency-selective cochlear amplification. The authors hypothesized that motile outer hair cells can drive cochlear fluid circulation. This hypothesis was tested by observing the effects of acoustic stimuli and salicylate, an outer hair cell motility blocker, on kainic acid-induced changes in cochlear nucleus activity. It was found that acoustic stimuli can reduce the latency of the kainic acid effect, and a low-frequency tone is more effective than broadband noise. Salicylate reduced the effect of acoustic stimuli on kainic acid-induced changes. The authors also developed a computational model to provide the physical basis for interpreting experimental results. It was concluded that experimental data and simulations coherently indicate that broadband outer hair cell action is for cochlear fluid circulation.

      Strengths:

      The major strengths of this study include its high significance and the combination of electrophysiological recording of the cochlear nucleus responses with computational modeling. Cochlear outer hair cells have been believed to be responsible for the exceptional sensitivity, sharp tuning, and huge dynamic range of mammalian hearing. Recent observation of the broadband reticular lamina vibration contradicts frequency-specific cochlear amplification. Moreover, there is no effective noninvasive approach to deliver the drugs or genes to the cochlea for treating sensorineural hearing loss, one of the most common auditory disorders. These important questions were addressed in this study by observing outer hair cells' roles in the cochlear transport of kainic acid. The well-established electrophysiological method for recording cochlear nucleus responses produced valuable new data, and the purposely developed computational model significantly enhanced the interpretation of the data.

      The authors successfully tested their hypothesis, and both the experimental and modeling results support the conclusion that active outer hair cells can drive cochlear fluid circulation in the living cochlea.

      Findings from this study will help auditory scientists understand how the outer hair cells contribute to cochlear amplification and normal hearing.

      We thank the reviewer for acknowledging our effort.

      Weaknesses:

      While the statement "The present study provides new insights into the nonselective outer hair cell action (in the second paragraph of Discussion)" is well supported by the results, the authors should consider providing a prediction or speculation of how this hair cell action enhances cochlear sensitivity. Such discussion would help the readers better understand the significance of the current work.

      We added a potential implication to the Discussion, that an acoustically rich environment could be beneficial in maintaining healthy hearing as well as recovering from damaged hearing.

      Reviewer #3 (Public review):

      Summary:

      This study reveals that sound exposure enhances drug delivery to the cochlea through the nonselective action of outer hair cells. The efficiency of sound-facilitated drug delivery is reduced when outer hair cell motility is inhibited. Additionally, low-frequency tones were found to be more effective than broadband noise for targeting substances to the cochlear apex. Computational model simulations support these findings.

      Strengths:

      The study provides compelling evidence that the broad action of outer hair cells is crucial for cochlear fluid circulation, offering a novel perspective on their function beyond frequency-selective amplification. Furthermore, these results could offer potential strategies for targeting and optimizing drug delivery throughout the cochlear spiral.

      Weaknesses:

      The primary weakness of this paper lies in the surgical procedure used for drug administration through the round window. Opening the cochlea can alter intracochlear pressure and disrupt the traveling wave from sound, a key factor influencing outer hair cell activity. However, the authors do not provide sufficient details on how they managed this issue during surgery. Additionally, the introduction section needs further development to better explain the background and emphasize the significance of the work.

      Although we wrote that the inner ear was left intact, this might not have been sufficiently clear. Our surgical approach leaves the inner ear intact, including the round-window membrane. The round window in the gerbil is concave, like a bowl. We applied 4 µL of kainic acid solution in the round-window niche without perforating the round-window membrane.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations for the authors):

      The authors' choice to frame their findings by hinting that they have discovered the "real" reason for the evolution of broadband OHC electromotility (e.g., the first and last sentences of the abstract and parts of the Discussion), although clearly intended to boost the perceived significance of the work, does them no favors and will probably lead to distracting criticisms they could easily have avoided. The manuscript would be significantly improved by removing or downplaying these rather speculative and unsupported claims; the work stands on its own without them.

      We agree that the first line of the Abstract might distract the readers. Meanwhile, in the Discussion, we believe the readers will appreciate our speculation of how this study is relevant to recent debates on hearing mechanics. Following the reviewer’s advice, we have revised the Abstract.

      Reviewer #3 (Recommendations for the authors):

      Please review the detailed comments below. I hope they contribute to enhancing the paper:

      We thank the reviewer for this detailed advice. All of these comments make good sense and were very helpful in improving this paper or in planning future studies. 

      Many of the comments concerned the computer model, and they share one common basis that we have not yet achieved, i.e., simulating the level dependence.

      I. Introduction

      (1) Please clarify and improve this sentence. Effective and safe strategies for delivering treatments to the inner ear have been reported: 'Consequently, intervening in hearing health by delivering substances to the inner-ear fluid is challenging'.

      The preceding statement concerns the blood-labyrinthine barrier (BLB), which is comparable to the blood-brain barrier (BBB). We revised the statement: “Consequently, intervening in hearing health by delivering substances to the inner-ear fluid through systemic circulation is challenging.”

      (2) Please expand on how the secretion and absorption of ions and molecules maintain the unique ionic compositions of the two intracochlear fluids. Include details on the role of the stria vascularis and the specific functions of the three types of strial cells in this process.

      In response to this request, we added a paragraph discussing cochlear fluid homeostasis. Our study differs from existing homeostasis studies in three respects. First, the site: existing studies are centered on the stria vascularis, while this study concerns the Corti fluid. Second, the mechanism: existing studies address metabolic transport, while our scope is transport due to fluid flow. Third, the range: existing studies considered local electrochemical equilibrium within a radial section, while this study concerns global (longitudinal) mass transport. To address this comment, the following was added to the Discussion.

      “Our study complements existing studies regarding cochlear fluid homeostasis and differs from previous studies in several ways. The intrastrial fluids (extracellular fluids in the stria vascularis) have been more thoroughly investigated because the three layers in the stria vascularis (marginal, intermediate, and basal cells) maintain the endocochlear potential (Wangemann 2006).

      Equilibrium in the Corti fluid has been sparsely investigated because its electrochemical gradient is modest compared to that of the intrastrial fluids (Johnstone, Patuzzi et al. 1989; Zidanic and Brownell 1990). Local electrochemical balance in the cochlear fluids has been considered within a radial section (Quraishi and Raphael 2008; Patuzzi 2011; Nin, Hibino et al. 2012). Our study is focused on the longitudinal (global) equilibrium along the cochlear coil and did not consider the equilibrium across the stria vascularis cell layers. To examine whether the longitudinal fluid flow driven by outer hair cells is strong enough to affect cochlear fluid homeostasis, future studies should measure the K+ equilibrium and recycling along the length of the Corti fluid under sound and silence conditions.”

      (3) Please provide a more detailed explanation and definition of a longitudinal electrochemical gradient, including how it functions and its relevance in physiological processes.

      The most researched electrochemical gradient of the cochlea is arguably the endocochlear potential, which varies along the cochlear length. The endocochlear potential at any location is determined by the equilibrium between the source and the sink. From the perspective of the Corti fluid, the source is the potassium current out of the hair cells and the sink is the resorption of potassium by supporting cells. The effect of a longitudinal electrochemical gradient on hearing physiology is beyond the scope of this study, because addressing it would require incorporating detailed K+ equilibrium dynamics. This is certainly one of our future directions.

      (4) Please include the necessary references to support these three sentences: "Diffusion is an effective mechanism for a substance to travel along submicrometer distances. For instance, it takes microseconds for neurotransmitters to diffuse across a 20-nm synaptic gap. In contrast, diffusion is inefficient for travel on the centimeter scale. It takes days for a drug applied at the round window to travel 30 mm to the apical end of the human cochlea. In practice, the substance would not reach the apex because it would be resorbed before traveling the distance".   

      A reference was added (Berg, 1993). Our description of diffusion is based on the fundamental physics of Fick’s laws.
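      The order-of-magnitude argument above follows from the one-dimensional mean-square displacement of Fickian diffusion, <x²> = 2Dt, so the time to diffuse a distance L scales as t ≈ L²/(2D). The short Python sketch below evaluates this relation. The diffusion coefficient (D ≈ 1e-9 m²/s, typical of a small molecule in water) is an assumption not taken from this study; the effective value in a synaptic cleft or for a larger drug molecule may be several-fold smaller, which lengthens both times proportionally.

```python
# Characteristic 1-D diffusion time from <x^2> = 2 D t, i.e. t ~ L^2 / (2 D).
# The diffusion coefficient is an illustrative assumption, not a measured value.

def diffusion_time_s(distance_m: float, diff_coeff_m2_per_s: float) -> float:
    """Time (s) for the root-mean-square 1-D displacement to reach distance_m."""
    return distance_m ** 2 / (2.0 * diff_coeff_m2_per_s)


# Assumed small-molecule diffusion coefficient in aqueous fluid (m^2/s).
D_ASSUMED = 1e-9

synapse_s = diffusion_time_s(20e-9, D_ASSUMED)   # 20-nm synaptic gap
cochlea_s = diffusion_time_s(30e-3, D_ASSUMED)   # 30 mm, round window to apex

print(f"synaptic gap:  {synapse_s * 1e6:.2f} us")
print(f"human cochlea: {cochlea_s / 86400:.1f} days")
```

      With this assumed D, the 20-nm gap is crossed in roughly 0.2 µs (about 2 µs if D were ten-fold smaller), whereas the 30-mm path takes about 5 days. The quadratic dependence on distance, rather than the exact value of D, is what makes diffusion impractical for delivery to the apex.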

      (5) In paragraph 3, the author only discussed a portion of the previous approaches. There are numerous methods for inner ear delivery, including external, middle ear, and direct inner ear delivery via the round window or semicircular canal. Each method has its pros and cons, which the authors should carefully address. For example, the semicircular canal approach doesn't require two perforations in the inner ear and distributes the injection evenly throughout the cochlea.  

      A recent review paper on inner-ear drug delivery was added as a reference (Szeto, Chiang et al. 2020). Drug delivery is a means to demonstrate the OHC’s role in longitudinal mass transport. We are concerned that comparing different drug delivery modalities in detail would distract the readers from the main point of this study. We mentioned ‘one remedy’ with two perforations, for which abundant case studies are found in the literature. An exhaustive discussion of existing approaches is better suited to review papers.

      (6) The following sentence is inaccurate and should be carefully rephrased. Previous reports chose higher volumes than the actual fluid volume to maximize the drug (or gene) effect, but this was not a requirement of the delivery methods: 'Such an invasive approach requires the injection of a substantial fluid volume, larger than the entire perilymph in the inner ear'.

      We revised the statement to relax the wording ‘require’: ‘Such an invasive approach is often associated with the injection of a substantial fluid volume, larger than the entire perilymph in the inner ear (Szeto, Chiang et al. 2020)'. This statement might be acceptable because we found few invasive delivery papers that used < 1 µL. Moreover, the physics basis of the injection method is to replace the fluid in a labyrinth compartment with a new fluid (a good example where this fluid physics was tested with quantitative data is the Lichtenhan et al. 2016 paper).

      (7) Please provide the necessary references. Also, clarify what is meant by 'actuator cells'. Are you referring to hair cells?: 'The tube-shaped organ of Corti (OoC) is lined with actuator cells and the cells are activated systematically with a large phase velocity (> a few m/s) toward the apex'.

      Yes, we meant OHCs as the actuator cells. This point has been clarified. A reference for the phase velocity has been added (Olson, Duifhuis, Steele, 2012).

      II. Results

      (1) Is there a specific reason you use 60 or 75 dB SPL for broadband sounds, but opt for louder sounds (80 dB SPL) for pure tones?

      It is not straightforward to compare the SPL between broadband noise and a pure tone, and we did not attempt to ‘equate’ them in any way. 

      (2) Please provide specific details about the sound generation protocol, including the duration, start time, end time, and any other relevant parameters. Here is an example of a vague sentence. Do you play the sounds continuously during these time periods, or only at specific intervals?: 'In two example cases, the effect time at low-CF locations (CFs near 2 kHz) was 15 minutes for the case of the 0.5 kHz tone (Fig. 3A)'

      This is described in the Measurement protocol part of the Methods section (quoted below). In the example case and all other cases, the sounds were played continually (not continuously).

      For the “Sound” protocol, 1.1-s noise pips (60 or 75 dB SPL, 0.1-12 kHz bandwidth, 0.8-s duration including 0.15-s onset/offset ramps) were presented continually. After 48 noise pips, one 1.1-s silent pause and three CF tone pips followed (a total of 51 pips and a pause make a 57.2-s sequence). The CF tone pips were presented at the level of 35 dB SPL to monitor neural responses. The silence pause was to monitor spontaneous neural responses. The sequence was repeated until neural signals at the lowest CF site were completely abolished. The neural responses presented in this study are the ‘driven responses’ obtained by subtracting the spontaneous responses from the responses to the 35 dB CF tones. For the “Silence” or “Pure-tone” protocol, the noise pips of the Sound protocol were replaced with either silence pauses or a pure tone at 80 dB SPL.
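      To make the timing arithmetic explicit, the hypothetical Python sketch below lays out one repetition of the stimulus sequence described above, assuming that every pip and the pause each occupy one 1.1-s slot. The function and label names are illustrative only; this is not the stimulus code used in the experiments.

```python
# Hypothetical layout of one stimulus sequence: 48 conditioning slots,
# one silent pause, then three CF tone pips, each occupying a 1.1-s slot.

SLOT_S = 1.1  # duration of every pip or pause slot, in seconds


def build_sequence(condition: str = "sound") -> list[tuple[str, float]]:
    """Return (event label, onset time in s) pairs for one repetition."""
    # The first 48 slots depend on the protocol condition.
    conditioning = {
        "sound": "noise pip (60 or 75 dB SPL, 0.1-12 kHz)",
        "silence": "silent pause",
        "pure-tone": "pure tone (80 dB SPL)",
    }[condition]
    events = [conditioning] * 48
    events.append("silent pause (spontaneous activity)")
    events += ["CF tone pip (35 dB SPL)"] * 3
    return [(label, round(i * SLOT_S, 1)) for i, label in enumerate(events)]


seq = build_sequence("sound")
pips = sum(1 for label, _ in seq if "pip" in label)
duration_s = len(seq) * SLOT_S
print(pips, round(duration_s, 1))
```

      Forty-eight conditioning slots, one pause, and three CF tone pips give 52 slots of 1.1 s, i.e., the 57.2-s sequence stated above; changing the condition argument reproduces the Silence and Pure-tone variants.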

      (3) Providing a schematic timeline of your experiments indicating sound generation, kainic acid (and salicylate) application, as well as DPOAE and AVCN recordings would greatly help in understanding and following your results.

      We have revised Figure 2.

      (4) How did you control the opening(s) for the injection? The openings could alter intracochlear pressure and affect the traveling wave from the sound, which is the major factor influencing outer hair cell activity.

      We did not open the inner ear. The round window remained intact. Opening the bulla does not affect the intracochlear pressure. We have clarified this issue, beginning with the first sentence of the Abstract. Thanks for raising this important question.

      (5) Is there any reason why the author generated only low and mid-frequencies? If so, please address what the limitations were in testing high frequency.

      There are no limitations to testing high frequencies. High frequencies would not affect drug delivery to the apex of the cochlea because the traveling waves stop right after the CF location. We are interested in delivering drugs deeper into the apex. Our presented results support this reasoning: mid-frequency stimulation was less effective for delivery to the low CF location.

      (6) I suggest combining Figures 3E and 3F to facilitate a direct comparison between the Silence and Noise conditions, as the MF and LF plots are overlapping in these panels.

      We considered this change but realized that it might introduce confusion and difficulty in parsing the results. Moreover, the two panels have their respective messages. 

      (7) In Figure 3E, why does the LF tone affect both Low and Mid CFs, while the MF tone only affects Mid CF?

      The cochlear traveling wave stops right after the CF location. Peristaltic action takes place in the broad tail region of the traveling waves (see Fig. 5C).

      III. Materials and Methods

      (1) Please provide details about your injection protocol. Did you create additional perforations? How did you target the round window? What was the injection rate? How did you seal the round window, and so on?

      The inner ear including the round window was left intact. Only the bulla was open.

      (2) Please include details about your surgical procedure for the AVCN recording, including probe insertion.

      AVCN recording is a well-established technique. Instead of reintroducing the method, we added a classical reference with a more accessible description (Frisina, Chamberlain, et al., 1982).

      IV. Minor points

      (1) Please include the full terms for the abbreviations 'CF', 'DPOAEs', 'PT', 'IP', and 'RW' for readers who are not in the hearing research field.

      We have checked that these abbreviations were defined.

      (2) Are 'GXXX's in figures animal identifiers? Please clarify what they represent.

      Yes, they are animal identifiers. We have clarified this point in Fig. 1 caption.

    1. Author response:

      In response to your comments, we will revise our manuscript to address the limitations raised, including our ability to rigorously test how observed changes in gene expression in shrews are adaptive. The phylogenetic ANOVA we use (EVE) tests for a separate RNA expression optimum specific to the shrew lineage for each gene, and is consistent with expectations for adaptive evolution of gene expression. However, as you noted, while this analysis highlights many candidate genes potentially under positive selection, further functional validation is required to confirm if and how these genes contribute to Dehnel’s phenomenon. We will emphasize in our discussion that the inferred adaptive expression of these genes is putative, and outline that future studies are needed to test the function of the proposed adaptations. For example, our cell-line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illustrates this limitation. We will also refine our discussion to focus more on pathway-level analyses rather than on individual genes.

      We recognize that our methodological choices may not have been fully transparent, such as our selection of gene expression clusters for the pathway enrichment analysis and our focus on BCL2L1 for functional validation in cell lines. We will expand on these decisions in the methods section to provide greater clarity for our readers.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio when possible. EVE models handle the effect of sex on gene expression as intraspecific variation, reflective of plasticity. In shrews, however, we used males exclusively. Females were only found among juvenile individuals, and including them would have introduced developmental variation with larger, negative impacts on these results. For the seasonal data, we will now include sex as a covariate in differential expression analyses; however, our design is imbalanced with respect to sex. We will account for this limitation and discuss it further in the revised manuscript.

    1. Author response:

      We sincerely thank you for your constructive and insightful feedback on our manuscript, including the assessment of its strengths and suggestions for improvements. This will allow us to enhance the clarity and impact of our work. In our revised manuscript, we will address your recommendations as follows:

      (1) Disambiguating whether the joystick eccentricity reflects the subject’s confidence or simply the perceived stimulus strength or coherence

      We agree that this is a pivotal issue for the interpretation of our results. We are confident that the joystick “eccentricity” (i.e., radial joystick deviation from the center) does not simply correlate with the moment-to-moment fluctuations of stimulus coherence. The observations that the radial joystick response varied considerably more than the stimulus fluctuations within each subject and each coherence level, and the analysis of metacognitive sensitivity, suggest that subjects indeed incorporated confidence judgements into their continuous reports. As proposed, we will further explore the established signatures of metacognitive confidence reports, and we will quantify the motion energy fluctuations within time intervals where the nominal stimulus parameters remained constant, to examine whether accuracy and confidence levels vary in response to these fluctuations. This approach will provide deeper insights into continuous dynamics within our paradigm.

      (2) Rationale for Social Investigation

      We will clarify the rationale and methodology of the social aspects in our experiments to better contextualize our approach and findings and their relationship to the field of collective decision-making. In particular, we will further emphasize that while our paradigm indeed did not impose integrating the information from the partner and did not involve incentives for collectively solving the task, the participants could (and did) incorporate the social information into their judgements and mostly improved their earnings. In this way, our approach complements the studies that required joint decisions.

      (3) Streamlining and Terminology

      We will streamline the text and figure legends to present our main arguments more concisely and improve the overall flow of the manuscript. Additionally, we will include a glossary in the main text to clarify terminology, enhancing accessibility and ensuring consistent understanding of key terms throughout the paper.

      To clarify two of the points upfront: we indeed used the term “eccentricity” not in the visual-science sense but as the measure of radial joystick deviation from the center and the corresponding angular width of the response arc; we now realize that this is confusing in the context of a visual psychophysics paper and will use another word. The term “dyadic” was meant to describe the experimental condition in which two participants worked on the task, and the associated measures of performance in this condition. The “dyadic score”, defined as the average score across the two participants in the dyadic condition, will be renamed “combined score”.

      (4) Incorporation of Additional Literature

      We acknowledge and appreciate the recommendations for additional relevant literature, which we will incorporate into our discussion. This will allow us to contextualize our findings more thoroughly within the existing body of research and highlight the broader implications of our work.

    1. Author response:

      eLife Assessment

      This valuable study uses consensus-independent component analysis to highlight transcriptional components (TC) in high-grade serous ovarian cancers (HGSOC). The study presents a convincing preliminary finding by identifying a TC linked to synaptic signaling that is associated with shorter overall survival in HGSOC patients, highlighting the potential role of neuronal interactions in the tumor microenvironment. This finding is corroborated by comparing spatially resolved transcriptomics in a small-scale study; a weakness is in being descriptive, non-mechanistic, and requiring experimental validation.

      We sincerely thank the editors for the valuable and constructive feedback. We appreciate the recognition of our findings and the significance of identifying transcriptional components in high-grade serous ovarian cancers. We acknowledge the insightful point on our study's descriptive nature and limited mechanistic depth. While further experimental validation would indeed enhance our conclusions, such work extends beyond the current scope of this manuscript. However, we would like to highlight that mechanistic studies demonstrating the impact of tumor-infiltrating nerves on disease progression are emerging (Zahalka et al., 2017; Allen et al., 2018; Balood et al., 2022; Jin et al., 2022; Globig et al., 2023; Restaino et al., 2023; Darragh et al., 2024). Importantly, members of our group have contributed to these findings. These studies, including in vitro and in vivo work in head and neck squamous cell carcinoma as well as high-grade serous ovarian carcinoma, demonstrate that substance P released from tumor-infiltrating nociceptors potentiates MAP kinase signaling in cancer cells, thereby influencing disease progression. This effect can be mitigated in vivo by blocking the substance P receptor (Restaino et al., 2023). Our present work identifies a transcriptional component that aligns with the presence of functional nerves within malignancies. These published mechanistic studies support our findings and suggest that this transcriptional component could serve as a potential screening tool to identify innervated tumors. Such information is clinically relevant, as patients with innervated tumors may benefit from more aggressive therapy.

      Reviewer #1 (Public review):

      This manuscript explores the transcriptional landscape of high-grade serous ovarian cancer (HGSOC) using consensus-independent component analysis (c-ICA) to identify transcriptional components (TCs) associated with patient outcomes. The study analyzes 678 HGSOC transcriptomes, supplemented with 447 transcriptomes from other ovarian cancer types and noncancerous tissues. By identifying 374 TCs, the authors aim to uncover subtle transcriptional patterns that could serve as novel drug targets. Notably, a transcriptional component linked to synaptic signaling was associated with shorter overall survival (OS) in patients, suggesting a potential role for neuronal interactions in the tumor microenvironment. Given notable weaknesses like lack of validation cohort or validation using another platform (other than the 11 samples with ST), the data is considered highly descriptive and preliminary.

      Strengths:

      (1) Innovative Methodology:

      The use of c-ICA to dissect bulk transcriptomes into independent components is a novel approach that allows for the identification of subtle transcriptional patterns that may be overshadowed in traditional analyses.

      We sincerely thank the reviewer for recognizing the strengths and novelty of our study. We appreciate the positive feedback on our use of consensus-independent component analysis (c-ICA) to decompose bulk transcriptomes, which we believe allowed us to detect subtle transcriptional signals often overlooked in traditional analyses.

      (2) Comprehensive Data Integration:

      The study integrates a large dataset from multiple public repositories, enhancing the robustness of the findings. The inclusion of spatially resolved transcriptomes adds a valuable dimension to the analysis.

      Thank you for recognizing the robustness of our study through comprehensive data integration. We appreciate the acknowledgment of our efforts to leverage a large, multi-source dataset, as well as the additional insights gained from spatially resolved transcriptomes. We believe this integrative approach enhances the depth of our analysis and contributes to a more nuanced understanding of the tumor microenvironment.

      (3) Clinical Relevance:

      The identification of a synaptic signaling-related TC associated with poor prognosis highlights a potential new avenue for therapeutic intervention, emphasizing the role of the tumor microenvironment in cancer progression.

      We appreciate the reviewer’s recognition of the clinical implications of our findings. The identification of a synaptic signaling-related transcriptional component associated with poor prognosis underscores the potential for novel therapeutic targets within the tumor microenvironment. We agree that this insight could open new avenues for intervention and further highlights the role of neuronal interactions in cancer progression.

      Weaknesses:

      (1) Mechanistic Insights:

      While the study identifies TCs associated with survival, it provides limited mechanistic insights into how these components influence cancer progression. Further experimental validation is necessary to elucidate the underlying biological processes.

      We appreciate the reviewer’s point regarding the limited mechanistic insights provided in our study. We agree that further experimental validation would enhance our understanding of how the biology captured by these transcriptional components influences cancer progression. However, we respectfully note that such validation is beyond the current scope of this article. Our current analyses are based on publicly available expression array and spatial transcriptomic array datasets. For future studies, we therefore intend to combine spatial transcriptomic data with immunohistochemical analysis of the same tumors for validation purposes. We have started setting up in vitro cocultures of neurons and ovarian cancer cells to obtain mechanistic insight into how genes with a large weight in TC121 regulate synaptic signaling and how this affects ovarian cancer cells.

      (2) Generalizability:

      The findings are primarily based on transcriptomic data from HGSOC. It remains unclear how these results apply to other subtypes of ovarian cancer or different cancer types.

      In Figure 5, we present the activity of TC121 across various cancer types, demonstrating broader applicability. However, due to limited treatment response data, we were unable to assess associations between TC activity scores and patient response. Additionally, transcriptomic and survival data specific to other ovarian cancer subtypes beyond HGSOC are currently not available, limiting our ability to generalize these findings to those groups. We intend to leverage survival data from TCGA to explore associations between TC activity scores and overall survival of patients with other cancer types. Nonetheless, we recognize limitations with TCGA survival data, as outlined in this article: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8726696/.

      (3) Innovative Methodology:

      Requires more validation using different platforms (IHC) to validate the performance of this bulk-derived data. Also, the lack of control over data quality is a concern.

      We acknowledge the reviewer’s suggestion to validate our results with alternative platforms, such as IHC; however, we regret that such validation is beyond the scope of this article. Regarding data quality control, we implemented a series of checks:

      • Bulk Transcriptional Profiles: We applied principal component analysis (PCA) to the sample Pearson product-moment correlation matrix, focusing on the first principal component (PCqc), which accounted for approximately 80-90% of the variance, primarily reflecting technical rather than biological variability (Bhattacharya et al., 2020). Samples with a correlation below 0.8 with PCqc were removed as outliers. Additionally, we generated an MD5 hash for each CEL file to identify and exclude duplicate samples. Per gene, expression values were standardized to a mean of zero and a variance of one across the GEO, CCLE, GDSC, and TCGA datasets to minimize probeset- or gene-specific variability.

      • Spatial Transcriptional Profiles: We used PCA for quality control here as well, and retained a spatial transcriptomic sample only if the loading factors for its first principal component had consistent signs across all of its profiles (i.e., all profiles had either positive or negative loading factors for the first PC). Samples that did not meet this criterion were excluded from analyses.
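      The quality-control steps listed above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The function names are hypothetical. PCqc is interpreted here as the first eigenvector of the sample-by-sample Pearson correlation matrix projected back onto gene space, and the "correlation with PCqc" is taken to be the Pearson correlation between each sample's profile and that projected profile. The demo uses synthetic data.

```python
import hashlib

import numpy as np


def unique_by_md5(blobs: dict[str, bytes]) -> list[str]:
    """Keep the first file of each group with identical MD5 digests (exact duplicates)."""
    seen: set[str] = set()
    keep: list[str] = []
    for name, data in blobs.items():
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            keep.append(name)
    return keep


def first_pc_of_sample_correlation(expr: np.ndarray) -> np.ndarray:
    """First eigenvector of the samples x samples Pearson correlation matrix."""
    corr = np.corrcoef(expr, rowvar=False)   # columns are samples (or spots)
    _, eigvecs = np.linalg.eigh(corr)        # eigenvalues in ascending order
    return eigvecs[:, -1]


def pcqc_keep_mask(expr: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """expr: genes x samples. True for samples whose correlation with PCqc >= threshold."""
    v1 = first_pc_of_sample_correlation(expr)
    # Standardize each sample so the projection matches the correlation-based PCA.
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    pcqc = z @ v1                            # assumed gene-level PCqc profile
    r = np.array([np.corrcoef(expr[:, j], pcqc)[0, 1] for j in range(expr.shape[1])])
    if r.mean() < 0:                         # eigenvector sign is arbitrary
        r = -r
    return r >= threshold


def consistent_pc1_sign(spots: np.ndarray) -> bool:
    """spots: genes x spots of one spatial sample. True if all PC1 loadings share a sign."""
    v1 = first_pc_of_sample_correlation(spots)
    return bool(np.all(v1 > 0) or np.all(v1 < 0))


def standardize_genes(expr: np.ndarray) -> np.ndarray:
    """Z-score each gene (row) to mean 0 and variance 1 across samples."""
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    return (expr - mu) / np.where(sd == 0, 1.0, sd)


# Synthetic demo: nine samples share a common profile, the tenth is unrelated noise.
rng = np.random.default_rng(0)
base = 3.0 * rng.normal(size=(500, 1))
expr = base + rng.normal(scale=0.5, size=(500, 10))
expr[:, 9] = rng.normal(size=500)

keep = pcqc_keep_mask(expr)
mixed = np.column_stack([expr[:, :5], -expr[:, 0]])  # one anti-correlated spot
print(keep, consistent_pc1_sign(expr[:, :9]), consistent_pc1_sign(mixed))
print(unique_by_md5({"a.CEL": b"x", "b.CEL": b"y", "c.CEL": b"x"}))
```

      In the synthetic demo, the nine samples sharing a common profile pass the 0.8 threshold while the unrelated sample is flagged, and negating one spot flips the sign of its PC1 loading, so the spatial check fails. Because eigenvector signs are arbitrary, the sketch orients the correlations so that most samples are positive before thresholding.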

      (4) Clinical Application:

      Although the study suggests potential drug targets, the translation of these findings into clinical practice is not addressed. Probably given the lack of some QA/QC procedures it'll be hard to translate these results. Future studies should focus on validating these targets in clinical settings.

      While this study is exploratory in nature, we agree that future studies should focus on validating these potential drug targets in clinical settings. We note that QA/QC procedures were integral to our analyses: we applied rigorous quality control, including PCA-based checks and duplicate removal across datasets, to ensure data integrity (detailed in our previous response).

      In terms of clinical application, which we partially discussed in the manuscript, we will discuss additional strategies to prevent synaptic signaling and neurotransmitter release in the tumor microenvironment (TME). Drugs such as ifenprodil and lamotrigine are used in treating neuronal disorders to block glutamate release responsible for subsequent synaptic signaling, whereas the vesicular monoamine transporter (VMAT) inhibitor reserpine can block the formation of synaptic vesicles (Reid et al., 2013; Williams et al., 2001). Previous in vitro studies with HGSOC cell lines showed a significant effect of ifenprodil alone on cancer cell proliferation, whereas reserpine seemed to trigger apoptosis in cancer cells (North et al., 2015; Ramamoorthy et al., 2019). Such strategies could potentially be used to inhibit synaptic neurotransmission in the TME.

      Reviewer #2 (Public review):

      Summary:

      Consensus-independent component analysis and closely related methods have previously been used to reveal components of transcriptomic data that are not captured by principal component or gene-gene coexpression analyses.

      Here, the authors asked whether applying consensus-independent component analysis (c-ICA) to published high-grade serous ovarian cancer (HGSOC) microarray-based transcriptomes would reveal subtle transcriptional patterns that are not captured by existing molecular omics classifications of HGSOC.

      Statistical associations of these (hitherto masked) transcriptional components with prognostic outcomes in HGSOC could lead to additional insights into underlying mechanisms and, coupled with corroborating evidence from spatial transcriptomics, are proposed for further investigation.

      This approach is complementary to existing transcriptomics classifications of HGSOC.

      The authors have previously applied the same approach in colorectal carcinoma (Knapen et al. (2024) Commun. Med).

      Strengths:

      (1) Overall, this study describes a solid data-driven description of c-ICA-derived transcriptional components that the authors identified in HGSOC microarray transcriptomics data, supported by detailed methods and supplementary documentation.

      We thank the reviewer for acknowledging the strength of our data-driven approach and the use of consensus-independent component analysis (c-ICA) to identify transcriptional components within HGSOC microarray data. We aimed to provide comprehensive methodological detail and supplementary documentation to support the reproducibility and robustness of our findings. We believe this approach allows for the identification of subtle transcriptional signals that might be overlooked by traditional analysis methods.

      (2) The biological interpretation of transcriptional components is convincing based on (data-driven) permutation analysis and a suite of analyses of association with copy-number, gene sets, and prognostic outcomes.

      We appreciate the reviewer’s positive feedback on the biological interpretation of our transcriptional components. We are pleased that our approach, which includes data-driven permutation testing and analyses of associations with copy-number alterations, gene sets, and prognostic outcomes, was found convincing. These analyses were integral to enhancing the robustness and biological relevance of our findings.

      (3) The resulting annotated transcriptional components have been made available in a searchable online format.

      Thank you for acknowledging the availability of our annotated transcriptional components in a searchable online format.

      (4) For the highlighted transcriptional component which has been annotated as related to synaptic signalling, the detection of the transcriptional component among 11 published spatial transcriptomics samples from ovarian cancers appears to support this preliminary finding and requires further mechanistic follow-up.

      Thank you for acknowledging the accessibility of our annotated transcriptional components. We prioritized making these data available in a searchable online format to facilitate further research and enable the community to explore and validate our findings.

      Weaknesses:

      (1) This study has not explicitly compared the c-ICA transcriptional components to the existing reported transcriptional landscape and classifications for ovarian cancers (e.g. Smith et al Nat Comms 2023; TCGA Nature 2011; Engqvist et al Sci Rep 2020) which would enable a further assessment of the additional contribution of c-ICA - whether the c-ICA approach captured entirely complementary components, or whether some components are correlated with the existing reported ovarian transcriptomic classifications.

      We appreciate the reviewer’s insightful suggestion to compare our c-ICA-derived transcriptional components with previously reported ovarian cancer classifications, such as those from Smith et al. (2023), TCGA (2011), and Engqvist et al. (2020). To address this, we will incorporate analyses comparing the activity scores of our transcriptional components with these published landscapes and classifications, particularly focusing on any associations with overall survival. Additionally, we plan to evaluate correlations between gene signatures from these studies and our identified TCs, enhancing our understanding of the unique contributions of the c-ICA approach.
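      One simple way to quantify how a published gene signature relates to a TC is to correlate the TC's gene weights with signature membership. The NumPy sketch below assumes a genes-by-TCs weight matrix and a set of signature gene symbols; the function name is hypothetical, and a point-biserial correlation (Pearson correlation with a 0/1 indicator) is only one of several reasonable choices, a rank-based enrichment test being another.

```python
import numpy as np


def signature_tc_correlation(tc_weights, genes, signature_genes):
    """Pearson correlation of each TC's gene weights with 0/1 signature membership.

    tc_weights: array of shape (n_genes, n_tcs); genes: gene symbols in row order.
    """
    member = np.array([g in signature_genes for g in genes], dtype=float)
    return np.array(
        [np.corrcoef(tc_weights[:, k], member)[0, 1] for k in range(tc_weights.shape[1])]
    )
```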

      (2) Here, the authors primarily interpret the c-ICA transcriptional components as a deconvolution of bulk transcriptomics due to the presence of cells from tumour cells and the tumour microenvironment. However, c-ICA is not explicitly a deconvolution method with respect to cell types: the transcriptional components do not necessarily correspond to distinct cell types, and may reflect differential dysregulation within a cell type. This application of c-ICA for the purpose of data-driven deconvolution of cell populations is distinct from other deconvolution methods that explicitly use a prior cell signature matrix.

      Thank you for highlighting this nuanced aspect of c-ICA interpretation. We acknowledge that c-ICA, unlike traditional deconvolution methods, is not specifically designed for cell-type deconvolution and does not rely on a predefined cell signature matrix. While we explored the transcriptional components in the context of tumor and microenvironmental interactions, we agree that these components may not correspond directly to distinct cell types but rather reflect complex patterns of dysregulation, potentially within individual cell populations.

      Our goal with c-ICA was to uncover hidden transcriptional patterns possibly influenced by cellular heterogeneity. However, we recognize that these patterns may also arise from regulatory processes within a single cell type. To investigate this further, we plan to use single-cell transcriptomic data (~60,000 profiles with cell-type annotations from GSE158722), remove the first principal component to minimize background effects, and project our transcriptional components onto these profiles to obtain activity scores. This will allow us to assess each TC’s behavior across diverse cellular contexts.
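      The planned projection of TCs onto single-cell profiles could look roughly like the sketch below. Both steps are assumptions about one plausible implementation rather than the authors' procedure: PC1 is removed by subtracting its rank-1 SVD reconstruction from the gene-centred matrix, and activity scores are obtained by least-squares regression of each profile on the TC gene-weight matrix. The function name `tc_activity_scores` is hypothetical.

```python
import numpy as np


def tc_activity_scores(expr, tc_weights, remove_pc1=True):
    """Project expression profiles onto TC gene weights.

    expr: array (n_genes, n_profiles); tc_weights: array (n_genes, n_tcs).
    Returns an array (n_tcs, n_profiles) of activity scores.
    """
    X = expr - expr.mean(axis=1, keepdims=True)  # centre each gene across profiles
    if remove_pc1:
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = X - s[0] * np.outer(U[:, 0], Vt[0])  # subtract the rank-1 PC1 reconstruction
    # least squares: find scores minimizing ||X - tc_weights @ scores||
    scores, *_ = np.linalg.lstsq(tc_weights, X, rcond=None)
    return scores
```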

      References

      Allen JK, Armaiz-Pena GN, Nagaraja AS, Sadaoui NC, Ortiz T, Dood R, Ozcan M, Herder DM, Haemerrle M, Gharpure KM, Rupaimoole R, Previs R, Wu SY, Pradeep S, Xu X, Han HD, Zand B, Dalton HJ, Taylor M, Hu W, Bottsford-Miller J, Moreno-Smith M, Kang Y, Mangala LS, Rodriguez-Aguayo C, Sehgal V, Spaeth EL, Ram PT, Wong ST, Marini FC, Lopez-Berestein G, Cole SW, Lutgendorf SK, diBiasi M, Sood AK. 2018. Sustained adrenergic signaling promotes intratumoral innervation through BDNF induction. Cancer Res 78:canres.1701.2016.

      Balood M, Ahmadi M, Eichwald T, Ahmadi A, Majdoubi A, Roversi Karine, Roversi Katiane, Lucido CT, Restaino AC, Huang S, Ji L, Huang K-C, Semerena E, Thomas SC, Trevino AE, Merrison H, Parrin A, Doyle B, Vermeer DW, Spanos WC, Williamson CS, Seehus CR, Foster SL, Dai H, Shu CJ, Rangachari M, Thibodeau J, Rincon SVD, Drapkin R, Rafei M, Ghasemlou N, Vermeer PD, Woolf CJ, Talbot S. 2022. Nociceptor neurons affect cancer immunosurveillance. Nature 611:405–412.

      Bhattacharya A, Bense RD, Urzúa-Traslaviña CG, Vries EGE de, Vugt MATM van, Fehrmann RSN. 2020. Transcriptional effects of copy number alterations in a large set of human cancers. Nat Commun 11:715.

      Darragh LB, Nguyen A, Pham TT, Idlett-Ali S, Knitz MW, Gadwa J, Bukkapatnam S, Corbo S, Olimpo NA, Nguyen D, Court BV, Neupert B, Yu J, Ross RB, Corbisiero M, Abdelazeem KNM, Maroney SP, Galindo DC, Mukdad L, Saviola A, Joshi M, White R, Alhiyari Y, Samedi V, Bokhoven AV, John MSt, Karam SD. 2024. Sensory nerve release of CGRP increases tumor growth in HNSCC by suppressing TILs. Med 5:254-270.e8.

      Globig A-M, Zhao S, Roginsky J, Maltez VI, Guiza J, Avina-Ochoa N, Heeg M, Hoffmann FA, Chaudhary O, Wang J, Senturk G, Chen D, O’Connor C, Pfaff S, Germain RN, Schalper KA, Emu B, Kaech SM. 2023. The β1-adrenergic receptor links sympathetic nerves to T cell exhaustion. Nature 622:383–392.

      Jin M, Wang Y, Zhou T, Li W, Wen Q. 2022. Norepinephrine/β2-adrenergic receptor pathway promotes the cell proliferation and nerve growth factor production in triple-negative breast cancer. J Breast Cancer 26:268–285.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the present study, Chen et al. investigate the role of Endophilin A1 in regulating GABAergic synapse formation and function. To this end, the authors use constitutive or conditional knockout of Endophilin A1 (EEN1) to assess the consequences on GABAergic synapse composition and function, as well as the outcome for PTZ-induced seizure susceptibility. The authors show that EEN1 KO mice show a higher susceptibility to PTZ-induced seizures, accompanied by a reduction in the GABAergic synaptic scaffolding protein gephyrin as well as specific GABAAR subunits and eIPSCs. The authors then investigate the underlying mechanisms, demonstrating that Endophilin A1 binds directly to gephyrin and GABAAR subunits, and identifying the subdomains of Endophilin A1 that contribute to this effect. Overall, the authors state that their study places Endophilin A1 as a new regulator of GABAergic synapse function.

      Strengths:

      Overall, the topic of this manuscript is very timely, since there has been substantial recent interest in describing the mechanisms governing inhibitory synaptic transmission at GABAergic synapses. The study will therefore be of interest to a wide audience of neuroscientists studying synaptic transmission and its role in disease. The manuscript is well-written and contains a substantial quantity of data.

      Weaknesses:

      A number of questions remain to be answered in order to be able to fully evaluate the quality and conclusions of the study. In particular, a key concern throughout the manuscript regards the way that the number of samples for statistical analysis is defined, which may affect the validity of the data analysed. Addressing this weakness will be essential to providing conclusive results that support the authors' claims.

      We thank the reviewer for their appreciation of the value of our study and for the careful critique that will help us improve the manuscript. As suggested, we will correct the way the number of samples for statistical analysis is defined throughout the manuscript and update the figures, figure legends, and Materials and Methods accordingly. For example, we will average the values for all dendritic segments from one neuron, so that each data point in the graphs represents one neuron.
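      The per-neuron averaging described above can be sketched in a few lines of Python, so that the sample size for statistics becomes the number of neurons rather than the number of segments. The input format of (neuron ID, value) pairs and the function name are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def per_neuron_means(measurements):
    """Collapse segment-level values to one mean per neuron.

    measurements: iterable of (neuron_id, value) pairs, one per dendritic segment.
    """
    by_neuron = defaultdict(list)
    for neuron_id, value in measurements:
        by_neuron[neuron_id].append(value)
    return {neuron_id: mean(values) for neuron_id, values in by_neuron.items()}
```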

      Reviewer #2 (Public review):

      Summary:

      The function of neural circuits relies heavily on the balance of excitatory and inhibitory inputs. Particularly, inhibitory inputs are understudied when compared to their excitatory counterparts due to the diversity of inhibitory neurons, their synaptic molecular heterogeneity, and their elusive signature. Thus, insights into these aspects of inhibitory inputs can inform us largely on the functions of neural circuits and the brain.

      Endophilin A1, an endocytic protein highly expressed in neurons, has been implicated in numerous pre- and postsynaptic functions, albeit largely at excitatory synapses. Thus, whether this crucial protein plays any role at inhibitory synapses, and whether it regulates functions at the synaptic, circuit, or brain level, remains to be determined.

      New Findings:

      (1) Endophilin A1 interacts with the postsynaptic scaffolding protein gephyrin at inhibitory postsynaptic densities within excitatory neurons.

      (2) Endophilin A1 promotes the organization of the inhibitory postsynaptic density and the subsequent recruitment/stabilization of GABAA receptors via Endophilin A1's membrane binding and actin polymerization activities.

      (3) Loss of Endophilin A1 in CA1 mouse hippocampal pyramidal neurons weakens inhibitory input and leads to susceptibility to epilepsy.

      (4) Thus the authors propose that via its role as a component of the inhibitory postsynaptic density within excitatory neurons, Endophilin A1 supports the organization, stability, and efficacy of inhibitory input to maintain the excitatory/inhibitory balance critical for brain function.

      (5) The conclusion of the manuscript is well supported by the data but will be strengthened by addressing our list of concerns and experiment suggestions.

      We thank the reviewer for their favorable impression of the manuscript. We also appreciate the excellent experimental suggestions, which will help us improve it.

      Weaknesses:

      Technical concerns:

      (1) Figure 1F and Figure 1H, Figures 7H,J:

      Can the authors justify using a paired-pulse interval of 50 ms for eEPSCs and an interval of 200 ms for eIPSCs? Otherwise, experiments should be repeated using the same paired pulse interval.

      We apologize for the confusion. As illustrated by the schematic current traces, eEPSCs and eIPSCs in hippocampal CA1 neurons have different decay time constants. eEPSCs exhibit a faster channel closing rate, corresponding to a smaller time constant (τ), so a shorter inter-stimulus interval (50 ms) was chosen for paired-pulse ratio (PPR) recordings. In contrast, eIPSCs display a slower channel closing rate, with a larger τ than eEPSCs, so a longer inter-stimulus interval (200 ms) was used for PPR. This protocol is well established and has been adopted in previous studies (please see below for examples).

      Contractor, A., Swanson, G. & Heinemann, S. F. Kainate receptors are involved in short- and long-term plasticity at mossy fiber synapses in the hippocampus. Neuron 29, 209-216, doi:10.1016/s0896-6273(01)00191-x (2001).

      Babiec, W. E., Jami, S. A., Guglietta, R., Chen, P. B. & O'Dell, T. J. Differential Regulation of NMDA Receptor-Mediated Transmission by SK Channels Underlies Dorsal-Ventral Differences in Dynamics of Schaffer Collateral Synaptic Function. Journal of Neuroscience 37, 1950-1964, doi:10.1523/JNEUROSCI.3196-16.2017 (2017).
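      The rationale for scaling the inter-stimulus interval to the decay kinetics can be made concrete with a single-exponential approximation: the fraction of the first response still present when the second stimulus arrives is exp(-Δt/τ). The time constants below (10 ms for eEPSCs, 40 ms for eIPSCs) are hypothetical round numbers chosen for illustration, not values reported in the manuscript.

```python
import math


def residual_fraction(interval_ms, tau_ms):
    """Fraction of the first response left at the second stimulus (single-exponential decay)."""
    return math.exp(-interval_ms / tau_ms)


def paired_pulse_ratio(amp1, amp2):
    """Conventional PPR: second response amplitude divided by the first."""
    return amp2 / amp1


TAU_EPSC_MS = 10.0  # hypothetical eEPSC decay time constant
TAU_IPSC_MS = 40.0  # hypothetical eIPSC decay time constant

print(residual_fraction(50, TAU_EPSC_MS))   # 50 ms interval for eEPSCs
print(residual_fraction(200, TAU_IPSC_MS))  # 200 ms interval for eIPSCs
print(residual_fraction(50, TAU_IPSC_MS))   # 50 ms interval applied to eIPSCs
```

      Under these assumed time constants, both protocols leave about 0.7% of the first response (exp(-5) ≈ 0.0067), whereas a 50 ms interval would leave roughly 29% of an eIPSC (exp(-1.25) ≈ 0.29), so the second amplitude would ride on an undecayed tail.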

      (2) Figures 3G,H,I:

      While 3D representations of proteins of interest bolster claims made by superresolution microscopy, SIM resolution is unreliable when deciphering the localization of proteins at the subsynaptic level given the small size of these structures (<1 micrometer). In order to determine the actual location of Endophilin A1, especially given the known presynaptic localization of this protein, the authors should complete SIM experiments with a presynaptic marker, perhaps an active zone protein, so that the relative localization of Endophilin A1 can be gleaned. Currently, overlapping signals could stem from the presynapse given the poor resolution of SIM in this context.

      Thank you for your suggestions. It is certainly preferable to investigate the relative localization of endophilin A1 using both presynaptic and postsynaptic markers. For SIM imaging in Figure 3G-I, we immunostained GFP as a cell fill to visualize neuronal morphology, leaving two other channels for detecting the immunofluorescent signals of endophilin A1 and one additional protein. We will try co-immunostaining of endophilin A1, the active zone protein bassoon (a presynaptic marker) and gephyrin without morphology labeling. Alternatively, we will co-stain endophilin A1 and bassoon in GFP-expressing neurons. We agree that overlapping signals, or proximal localization of presynaptic endophilin A1 with gephyrin or GABAAR γ2, cannot be ruled out. Note that as image resolution improves with more advanced imaging systems, the apparent overlap between two proteins becomes smaller or may disappear. With the ~110 nm lateral resolution of SIM, the degree of overlap between the two proteins of interest is much lower than with confocal microscopy. Given the presynaptic localization of endophilin, we will most likely observe a small overlap (presynaptic) or proximal localization (postsynaptic) of endophilin A1 with bassoon. Nevertheless, we will complete the SIM experiments as suggested to improve the manuscript.

      Manuscript consistency:

      (1) Figure 2:

      The authors looked at VGAT and noticed a reduction of signals in hippocampal regions in their P21 slices, indicating that the proposed postsynaptic organization/stabilization functions of Endophilin A1 extend to the inhibitory presynapse, perhaps via Neuroligin 2-Neurexin. Simultaneously, hippocampal regions in P21 slices showed a reduction in PSD-95 signals, indicating that excitatory synapses are also affected. It would be crucial to also look at excitatory presynapses, via VGLUT staining, to assess whether EndoA1 -/- also affects presynapses. Given the extensive roles of Endophilin A1 in presynapses, especially in excitatory presynapses, this should be investigated.

      Thank you for the thoughtful comments. Given that both VGAT and PSD95 signals are reduced in hippocampal regions in P21 slices, it is conceivable that the proposed postsynaptic organization/stabilization functions of endophilin A1 extend to the inhibitory presynapse via Neuroligin-2-Neurexin, and to the excitatory presynapse as well, during development. Of note, endophilin A1 knockout did not impair the distribution of Neuroligin-2 in inhibitory postsynapses (immunoisolated with anti-GABAAR α1) in mature mice (Figure 3K), and endophilin A1 did not bind to Neuroligin-2 (Figure 4D), suggesting that endophilin A1 might act through other mechanisms. Nevertheless, as the functions of endophilin A family members at the presynaptic site are well established, the reduction of presynaptic signals in developing hippocampal regions of EndoA1-/- mice might result from the depletion of presynaptic endophilin A1. These presynaptic deficits may be compensated for by other mechanisms as neurons mature. We will perform VGLUT staining of EndoA1-/- brain slices, as suggested, to assess the role of endophilin A1 in excitatory presynapses in vivo.

      (2) Figure 7C:

      The authors do not assess whether p140Cap overexpression rescues the GABAAR loss exhibited in Endophilin A1 KO neurons, as they did for gephyrin. This would be an important data point to show, as p140Cap may somehow rescue receptor loss through another pathway. In fact, the text states that this experiment was done ("Consistently, neither p140Cap nor the endophilin A1 loss-of-function mutants could rescue the GABAAR clustering phenotype in EEN1 KO neurons (Figure 7C, D)"), yet the data for p140Cap overexpression seem to be missing. This should be remedied.

      Thank you for the thoughtful comment. In the revised manuscript, we will determine whether p140Cap overexpression also rescues the GABAAR clustering phenotype in EndoA1-/- neurons by surface GABAAR γ2 staining.

      Reviewer #3 (Public review):

      Summary:

      Chen et al. identify endophilin A1 as a novel component of the inhibitory postsynaptic scaffold. Their data show impaired evoked inhibitory synaptic transmission in CA1 neurons of mice lacking endophilin A1, and an increased susceptibility to seizures. Endophilin can interact with the postsynaptic scaffold protein gephyrin and promote assembly of the inhibitory postsynaptic element. Endophilin A1 is known to play a role in presynaptic terminals and in dendritic spines, but a role for endophilin A1 at inhibitory postsynaptic densities has not yet been described.

      Strengths:

      The authors used a broad array of experimental approaches to investigate this, including tests of seizure susceptibility, electrophysiology, biochemistry, neuronal culture, and image analysis.

      Weaknesses:

      Many results are difficult to interpret, and the data quality is not always convincing, unfortunately. The basic premise of the study, that gephyrin and endophilin A1 interact, requires a more robust analysis to be convincing.

      We greatly appreciate the positive comments on our study and the valuable feedback for improving the manuscript. We will conduct additional experiments to improve data quality and strengthen our evidence in line with these constructive suggestions. To obtain strong evidence for the interaction between endophilin A1 and gephyrin, we will perform an in vitro pull-down assay with recombinant proteins expressed in a bacterial system.

    1. Author response:

      Public Reviews:

      Summary:

      We sincerely thank the reviewers for their insightful and thorough feedback. Their comments cover both technical and conceptual aspects of our project, which we have attempted to address in our provisional responses.

      First, we would like to clarify that any current lack of documentation or technical issues (such as local installation challenges) reflect the software's early stage. These aspects are receiving our full attention and are not intended to remain in their current state. As suggested, we plan to enhance the toolbox’s structure by separating it into a standalone library and a web application, alongside developing smaller satellite apps for SWC and MOD file management. We will also expand our documentation, provide a more detailed user guide, and add video tutorials for the GUI.

      Second, we have clarified the rationale behind specific implementation choices in our software, explaining why certain features of the toolbox were designed and implemented in particular ways. Our goal is to maintain a strong focus on single-cell level modeling, addressing its various aspects in great detail. We are also working on new features, such as automated parameter optimization and support for multiple output formats, to further enrich the toolbox’s functionality.

      Reviewer #1 (Public review):

      Summary:

      DendroTweaks provides its users with a solid tool to implement, visualize, tune, validate, understand, and reduce single-neuron models that incorporate complex dendritic arbors with differential distributions of biophysical mechanisms. The visualization of dendritic segments and the biophysical mechanisms therein provides users with an intuitive way to understand and appreciate dendritic physiology.

      Strengths:

      (1) The visualization tools are simplified, elegant, and intuitive.

      (2) The ability to build single-neuron models using simple and intuitive interfaces.

      (3) The ability to validate models with different measurements.

      (4) The ability to systematically and progressively reduce morphologically-realistic neuronal models.

      We thank the reviewer for their positive comments.

      Weaknesses:

      (1) Inability to account for neuron-to-neuron variability in structural, biophysical, and physiological properties in the model-building and validation processes.

      We agree with the reviewer that it is important to account for neuron-to-neuron variability. The core approach and distinctive feature of DendroTweaks is the interactive exploration of how morpho-electric parameters affect neuronal activity. In light of this, variability can be explored by interactively updating the model parameters with widgets. In a sense, by adjusting a widget (e.g., channel distribution or kinetics), a user ends up with a new instance of a cell in the parameter space and receives near-real-time feedback on how this change affects neuronal activity. Implementing complex algorithms to account for neuron-to-neuron variability during the validation process would detract from the interactivity of the GUI. That being said, we acknowledge the importance of this issue and will explore options to address it more comprehensively in our revised manuscript.

      (2) Inability to account for the many-to-many mapping between ion channels and physiological outcomes. Reliance on hand-tuning provides a single biased model that does not respect pronounced neuron-to-neuron variability observed in electrophysiological measurements.

      We acknowledge the challenge of accounting for degeneracy in the relation between ion channels and physiological outcomes and the importance of capturing neuron-to-neuron variability. One possible way to address this, as we mention in the Discussion, is to integrate automated parameter optimization algorithms alongside the existing interactive hand-tuning with widgets. We are currently exploring the possibility of integrating Jaxley (Deistler et al., 2024) into DendroTweaks in addition to NEURON. This would allow for automated and fast gradient-based parameter optimization, including optimization of heterogeneous channel distributions.
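      Degeneracy of this kind is often explored by random sampling of parameters with acceptance criteria (a "population of models" approach). The toy sketch below, which does not use DendroTweaks or NEURON, treats a single passive compartment whose resting conductance is the sum of a leak and an h-type conductance; the parameter ranges, the 2e-4 cm² membrane area and the 50 ± 5 MΩ target are illustrative assumptions. Many different (g_leak, g_h) pairs pass the same input-resistance criterion, which is the many-to-many mapping the reviewer describes.

```python
import random


def input_resistance_mohm(g_leak, g_h, area_cm2):
    """Input resistance (MOhm) of one passive compartment; conductances in S/cm^2.

    Treating g_h as a purely passive resting conductance is a simplification.
    """
    total_siemens = (g_leak + g_h) * area_cm2
    return 1.0 / total_siemens / 1e6  # ohm -> megaohm


def sample_population(n, target_mohm, tol_mohm, area_cm2, seed=0):
    """Draw random (g_leak, g_h) pairs and keep those matching the target within tolerance."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n):
        g_leak = rng.uniform(1e-5, 1e-4)
        g_h = rng.uniform(0.0, 1e-4)
        if abs(input_resistance_mohm(g_leak, g_h, area_cm2) - target_mohm) <= tol_mohm:
            accepted.append((g_leak, g_h))
    return accepted


models = sample_population(5000, target_mohm=50.0, tol_mohm=5.0, area_cm2=2e-4)
```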

      (3) Lack of a demonstration on how to connect reduced models into a network within the toolbox.

      Building a network of reduced models is a promising direction, although it goes beyond the scope of this manuscript. We do not plan to add support for network models to the toolbox itself. In DendroTweaks, we focus on single-cell modeling, aiming to cover its various aspects in great detail. Of course, such refined single-cell models, both detailed and reduced, are likely to be integrated into networks, but this will not take place within the DendroTweaks toolbox. To support the integration of DendroTweaks-produced model neurons into networks, we will focus on better compatibility with existing formats and standards and improve exporting capabilities. It is already possible to export reduced morphologies as SWC files, standardized ion channel models as MOD files and channel distributions as JSON files. Nevertheless, as a proof of concept, we plan to generate a simple network of exported reduced models outside the toolbox and include it as a separate Jupyter notebook.

      (4) Lack of a set of tutorials, which is common across many "Tools and Resources" papers, that would be helpful in users getting acquainted with the toolbox.

      This is a valid concern that we aim to address promptly. Currently, an online user guide is available at https://dendrotweaks.dendrites.gr/guide.html. This guide introduces users to the GUI elements and covers basic use cases. We are working on video tutorials and detailed documentation, which will be available soon (as part of the revised manuscript). The toolbox will be split into two parts: a Bokeh app and a standalone library. The library will offer the core functionality, such as reducing morphology and standardizing channels, without the GUI, enabling bulk processing. It will be installable through PyPI and integrated into the app code as an external library. We will provide thorough documentation for all classes and functions in the library.

      Reviewer #2 (Public review):

      The paper by Makarov et al. describes the software tool called DendroTweaks, intended for the examination of multi-compartmental biophysically detailed neuron models. It offers extensive capabilities for working with very complex distributed biophysical neuronal models and should be a useful addition to the growing ecosystem of tools for neuronal modeling.

      Strengths

      (1) This Python-based tool allows for visualization of a neuronal model's compartments.

      (2) The tool works with morphology reconstructions in the widely used .swc and .asc formats.

      (3) It can support many neuronal models using the NMODL language, which is widely used for neuronal modeling.

      (4) It permits one to plot the properties of linear and non-linear conductances in every compartment of a neuronal model, facilitating examination of the model's details.

      (5) DendroTweaks supports manipulation of the model parameters and morphological details, which is important for the exploration of the relations of the model composition and parameters with its electrophysiological activity.

      (6) The paper is very well written - everything is clear, and the capabilities of the tool are described and illustrated with great attention to detail.

      We thank the reviewer for their positive comments.

      Weaknesses

      (1) Not a really big weakness, but it would be really helpful if the authors showed how the performance of their tool scales. This can be done for an increasing number of compartments - how long does it take to carry out typical procedures in DendroTweaks, on a given hardware, for a cell model with 100 compartments, 200, 300, and so on? This information will be quite useful to understand the applicability of the software.

      DendroTweaks functions as a layer on top of a simulation engine. As a result, its performance currently scales with that of NEURON. Note that the GUI displays the time taken to run a given simulation in NEURON at the bottom of the Simulation tab in the left menu. While GUI-related processing and rendering also consume time, this is less straightforward to measure. Nonetheless, we will explore options to provide the suggested benchmarking in the revised manuscript.
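      A benchmark of the kind the reviewer suggests could be organised as a small timing harness. In the sketch below, `simulate_passive_cable` is a hypothetical pure-Python stand-in for a simulation call (a forward-Euler passive cable in arbitrary units); in practice the timed call would be the NEURON run that DendroTweaks triggers. Taking the best of several repeats with `time.perf_counter` reduces the influence of background load.

```python
import time


def simulate_passive_cable(n_compartments, n_steps=500, dt=0.025):
    """Hypothetical stand-in: forward-Euler passive cable, arbitrary units."""
    g_axial, g_leak, i_inj = 0.5, 0.1, 1.0
    v = [0.0] * n_compartments
    for _ in range(n_steps):
        new_v = v[:]
        for k in range(n_compartments):
            left = v[k - 1] if k > 0 else v[k]  # sealed end at the first compartment
            right = v[k + 1] if k < n_compartments - 1 else v[k]  # sealed end at the tip
            i_axial = g_axial * (left - 2.0 * v[k] + right)
            i_stim = i_inj if k == 0 else 0.0  # current injected into the first compartment
            new_v[k] = v[k] + dt * (i_axial - g_leak * v[k] + i_stim)
        v = new_v
    return v


def benchmark(sizes, repeats=3):
    """Best-of-N wall-clock time (s) for each model size."""
    timings = {}
    for n in sizes:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            simulate_passive_cable(n)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


for n, seconds in benchmark([100, 200, 300]).items():
    print(f"{n:4d} compartments: {seconds:.3f} s")
```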

      (2) Let me also add here a few suggestions (not weaknesses, but something that can be useful, and if the authors can easily add some of these for publication, that would strongly increase the value of the paper).

      (3) It would be very helpful to add functionality to read major formats in the field, such as NeuroML and SONATA.

      We agree with the reviewer that support for major formats will substantially improve and ensure reproducibility and reusability of the models. As mentioned in the Discussion, we plan to add support for NeuroML. Regarding SONATA, it is indeed possible to view our models as a network with a single morphologically-detailed biophysical node receiving inputs from multiple populations of virtual nodes. In future editions of the tool we plan to expand its support for additional file formats.

      (4) Visualization is available as a static 2D projection of the cell's morphology. It would be nice to implement 3D interactive visualization.

      We offer an option to rotate a cell around the vertical axis using a slider under the plot. This is a workaround, as implementing a true 3D visualization in Bokeh would require custom Bokeh elements, along with external JavaScript libraries. Despite these implementation difficulties, we advocate for a different approach than the one used in most of the morphology viewers mentioned in the Discussion. The core idea of DendroTweaks' morphology exploration is that each section is "clickable" allowing its geometric properties to be examined in a 2D Section view. Furthermore, we believe the Graph view presents the overall cell topology more clearly than a 3D visualization.
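      The slider-based rotation described here amounts to rotating every 3D point about the vertical axis and then dropping the depth coordinate. The sketch below assumes that y is the vertical axis of the reconstruction (SWC files do not enforce an orientation) and uses a hypothetical function name; it is not DendroTweaks' code.

```python
import math


def rotate_about_vertical(points, angle_deg):
    """Rotate (x, y, z) points about the vertical y axis and project onto the x-y plane."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    projected = []
    for x, y, z in points:
        x_rot = c * x + s * z  # standard rotation about y; y itself is unchanged
        projected.append((x_rot, y))  # the rotated depth (-s * x + c * z) is discarded
    return projected
```

      Recomputing the projected coordinates whenever the slider value changes gives the rotation effect without a true 3D renderer.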

      (5) It is nice that DendroTweaks can modify the models, such as revising the radii of the morphological segments or ionic conductances. It would be really useful then to have the functionality for writing the resulting models into files for subsequent reuse.

      This functionality is already available. Users can export JSON files with channel distributions and SWC files after morphology reduction through the GUI. In the standalone version, users can modify and export SWC files, as well as export MOD files after standardization. Please note that in the online demo version export and import functionality is currently limited, but we plan to fully enable it when submitting our revisions. We are considering separating file managers as satellite apps—one for SWC and one for MOD files. It is worth mentioning that the MOD file manager along with parsing the files and generating Python classes for visualization purposes is already capable of producing Jaxley-compatible Python channel classes.

      (6) If I didn't miss something, it seems that DendroTweaks supports the allocation of groups of synapses, where all synapses in a group receive the same type of Poisson spike train. It would be very useful to provide more flexibility. One option is to leverage the SONATA format, which has ample functionality for specifying such diverse inputs.

      Currently, each group shares the same set of parameters for both biophysical properties of synapses (e.g., reversal potential, time constants) and presynaptic "population" activity (e.g., rate, onset). The parameter that controls an incoming Poisson spike train is the rate, which is indeed shared across all synapses in a group. The suggestion to allow for variability in input properties within a group is interesting and is worth implementing. We will explore this in the revised manuscript.
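      Within-group variability could, for example, be introduced by drawing each synapse's rate from a distribution centred on the group rate before generating its Poisson train. The sketch below assumes a Gaussian spread with a given coefficient of variation, truncated at zero; the function names and the choice of distribution are illustrative and do not describe how DendroTweaks currently generates inputs. Spike times are produced from exponentially distributed inter-spike intervals, which yields a homogeneous Poisson process for each synapse.

```python
import random


def poisson_spike_train(rate_hz, duration_ms, rng):
    """Homogeneous Poisson spike times (ms) built from exponential inter-spike intervals."""
    times, t = [], 0.0
    if rate_hz <= 0:
        return times
    mean_isi_ms = 1000.0 / rate_hz
    while True:
        t += rng.expovariate(1.0 / mean_isi_ms)  # expovariate takes the rate (1 / mean)
        if t >= duration_ms:
            return times
        times.append(t)


def group_inputs(n_synapses, mean_rate_hz, cv, duration_ms, seed=0):
    """One spike train per synapse, each with its own rate drawn around the group mean."""
    rng = random.Random(seed)
    trains = []
    for _ in range(n_synapses):
        rate = max(0.0, rng.gauss(mean_rate_hz, cv * mean_rate_hz))
        trains.append(poisson_spike_train(rate, duration_ms, rng))
    return trains
```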

      (7) "Each session can be saved as a .json file and reuploaded when needed" - do these files contain the whole history of the session or the exact snapshot of what is visualized when the file is saved? If the latter, which variables are saved, and which are not? Please clarify.

      These files capture the exact snapshot of the model's latest state. They include model parameters such as channel distributions, equilibrium potentials, and temperature. Currently, stimuli (current clamps and synapses) are not saved. However, we plan to add an option to export stimuli parameters in the same JSON file. This will also be available as part of the revised manuscript.
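      A snapshot that also carries stimuli could be as simple as one JSON document with separate sections for model parameters and stimuli. The schema below (all key names, channel names and values) is hypothetical and only illustrates the idea; it is not the DendroTweaks file format.

```python
import json

# Hypothetical snapshot layout: model state plus the stimuli that are not saved today
snapshot = {
    "model": {
        "temperature_C": 37.0,
        "equilibrium_potentials_mV": {"na": 50.0, "k": -77.0},
        "channel_distributions": {
            "Na": {"function": "uniform", "parameters": {"value": 0.03}},
        },
    },
    "stimuli": {
        "iclamps": [{"section": "soma", "loc": 0.5, "amp_nA": 0.2, "delay_ms": 100.0, "dur_ms": 500.0}],
        "synapse_groups": [{"name": "AMPA_apical", "n": 50, "rate_Hz": 5.0, "onset_ms": 0.0}],
    },
}

text = json.dumps(snapshot, indent=2)  # this string would be written to the session .json file
restored = json.loads(text)           # reading it back recovers the same structure
```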

      References

      Michael Deistler, Kyra L. Kadhim, Matthijs Pals, Jonas Beck, Ziwei Huang, Manuel Gloeckler, Janne K. Lappalainen, Cornelius Schröder, Philipp Berens, Pedro J. Gonçalves, Jakob H. Macke. Differentiable simulation enables large-scale training of detailed biophysical models of neural dynamics. bioRxiv 2024.08.21.608979; doi:10.1101/2024.08.21.608979

    1. Author response:

      To reviewer #1:

      We appreciate your advice on providing more conceptual motivation for comparing Bayesian and RL-like belief updating models. In short, the two model families are complementary in capturing asymmetrical and symmetrical updating. Both assume that the magnitude of updating is weighted by two separate learning rates, one for positive and one for negative belief-disconfirming evidence. If these two learning rates differ, updating is asymmetrical; if they are equal, updating is symmetrical.

      However, the model families’ assumptions about the underlying updating process differ. In the RL-like family, updating is assumed to be driven by the difference between the base rate and the initial belief, known as the prediction error (PE), weighted by the learning rates. By contrast, the Bayesian updating model assumes that updating (i.e., the posterior belief) is driven by combining the base rate (i.e., the prior evidence) and how often the initial belief is represented in the estimated base rate (i.e., the likelihood ratio relative to all other alternative hypotheses, or beliefs). Moreover, the two components of the posterior belief can differ in their respective contributions (i.e., precision or confidence), which might be more adaptive to real-life conditions characterized by high uncertainty about the future.

      For the revised manuscript, we will elaborate more on the conceptual and psychological meaning of these two proposed belief updating processes. At this stage, it is important to note that we do not have direct proof of humans reasoning in an RL-like or Bayesian way when updating their beliefs about the future. We, therefore, focus on the complementarity of both models to capture latent processes and variables in belief updating that can be leveraged to understand the sources of inter-individual differences and the impact of external contexts, such as experiencing an actual adverse life event, on human psychology.

      To reviewer #2:

      Thank you for recommending the exploration of potential differences between optimism biases in initial belief estimations (self versus other) during and outside the pandemic. We will also provide more details on the belief updating task and design.

      To both reviewers: 

      We agree on the limitations arising from the lack of physiological and self-reported measures of stress. We collected some self-reports on risk perception, adoption of protective measures, need for social interactions, and mood, but only in participants tested during the pandemic-related lockdowns (reported in SI Table 1). For the revised manuscript, we propose exploring the correlational links between belief-updating biases and self-reports in this sample. Such correlational analyses may identify variables to target with interventions in future studies of human belief updating in real-world contexts. We will also add a section to the Discussion on this limitation, which prevents us from inferring plausible psychological causes of the differences observed in belief updating during and outside the pandemic.

      Importantly, we will follow your recommendations to improve the computational modeling analyses. We will (1) add the confusion matrices from model recovery analyses to assess model specificity, (2) provide evidence that the best-fitting model reproduces the observed behavior shown in Figure 1, and (3) conduct model comparisons on the combined groups to justify the focus on the RL-like updating model. In a few weeks, we plan to submit a revised manuscript alongside a point-by-point response to your concerns and recommendations.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      First, the authors confirm the up-regulation of the main genes involved in the three branches of the Unfolded Protein Response (UPR) system in diet-induced obese mice in AT, observations that have been extensively reported before. Not surprisingly, IRE1a inhibition with STF led to an amelioration of the obesity and insulin resistance of the animals. Moreover, non-alcoholic fatty liver disease was also improved by the treatment. More novel are their results in terms of thermogenesis and energy expenditure, where IRE1a seems to act via activation of brown AT. Finally, mice treated with STF exhibited significantly fewer metabolically active and M1-like macrophages in the AT compared to those under vehicle conditions. Overall, the authors conclude that targeting IRE1a has therapeutical potential for treating obesity and insulin resistance.

      The study has some strengths, such as the detailed characterization of the effect of STF in different fat depots and a thorough analysis of macrophage populations. However, the lack of novelty in the findings somewhat limits the study's impact on the field.

      We thank the reviewer for the appreciation of our findings and the comments about novelty. Regarding novelty, we would emphasize several new findings presented in this manuscript. First, as the reviewer correctly pointed out, we discovered that IRE1 inhibition by STF activates brown AT and promotes thermogenesis. We also showed, for the first time, that IRE1 inhibition not only significantly attenuated the newly discovered CD9+ ATMs and the “M1-like” CD11c+ ATMs but also diminished the M2 ATMs. These discoveries are important and novel. In obesity, it was originally proposed that ATMs undergo M1/M2 polarization, shifting from an anti-inflammatory M2 state to a classical pro-inflammatory M1 state. It was further reported that IRE1 deletion improves thermogenesis by boosting M2 macrophages, which then synthesize and secrete catecholamines to promote thermogenesis. It is now known that M2 macrophages do not synthesize catecholamines or promote thermogenesis. In this study, we discovered that IRE1 inhibition does not increase (but instead decreases) the M2 population, and that IRE1 inhibition likely promotes thermogenesis by suppressing pro-inflammatory macrophage populations, including the M1-like ATMs and, most importantly, the newly identified metabolically active macrophages, given that ATM inflammation has been reported to suppress thermogenesis. Second, this study presents the first characterization of the relationship between the more classical M1-like ATMs and the newly discovered metabolically active ATMs, showing that the CD11c+ M1-like ATMs largely overlap with, but are not identical to, CD9+ ATMs in the eWAT under HFD. Third, although upregulation of ER stress response genes in the adipose tissues of diet-induced obese mice has been extensively reported, this does not necessarily mean that targeting IRE1a or ER stress can reverse existing insulin resistance and obesity. It is not uncommon for a therapy to fail to yield the expected effect. For instance, amyloid plaques are a hallmark of Alzheimer's disease (AD), and interventions that prevent or reverse beta-amyloid deposition were expected to prevent progression or even reverse cognitive impairment in AD patients. However, clinical trials of such therapies have been disappointing. In essence, experimental demonstration of the effectiveness or feasibility of any potential therapeutic target is a first step toward future clinical implementation.

      Reviewer #2 (Public review):

      The manuscript by Wu et al demonstrated that IRE1a inhibition mitigated insulin resistance and other comorbidities through increased energy expenditure in DIO mice. In this reviewer's opinion, this timely study has high significance in the field of metabolism research for the following reasons.

      (1) The authors' findings are significant and may offer a new therapeutic target to treat metabolic diseases, including diabetes, obesity, NAFLD, etc.

      (2) The authors carefully profiled the ATMs and examined the changes in gene expression after STF treatment.

      (3) The authors presented evidence collected from both systemic indirect calorimetry and individual tissue gene expression to support the notion of increased energy expenditure.

      Overall, the authors have presented sufficient background in a clear and logically organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      We thank the reviewer for the appreciation of our work.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Wu D. et al. explores an innovative approach to immunometabolism and obesity by investigating the potential of targeting macrophage Inositol-requiring enzyme 1α (IRE1α) in cases of overnutrition. Their findings suggest that pharmacological inhibition of IRE1α could influence key aspects such as adipose tissue inflammation, insulin resistance, and thermogenesis. Notable discoveries include the identification of High-Fat Diet (HFD)-induced CD9+ Trem2+ macrophages and the reversal of metabolically active macrophages' activity with IRE1α inhibition using STF. These insights could significantly impact future obesity treatments.

      Strengths:

      The study's key strengths lie in its identification of specific macrophage subsets and the demonstration that inhibiting IRE1α can reverse the activity of these macrophages. This provides a potential new avenue for developing obesity treatments and contributes valuable knowledge to the field.

      Weaknesses:

      The research lacks an in-depth exploration of the broader metabolic mechanisms involved in controlling diet-induced obesity (DIO). Addressing this gap would strengthen the understanding of how targeting IRE1α might fit into the larger metabolic landscape.

      Impact and Utility:

      The findings have the potential to advance the field of obesity treatment by offering a novel target for intervention. However, further research is needed to fully elucidate the metabolic pathways involved and to confirm the long-term efficacy and safety of this approach. The methods and data presented are useful, but additional context and exploration are required for broader application and understanding.

      We thank the reviewer for the appreciation of the strengths of our manuscript. In particular, we appreciate the reviewer's recommendation to explore the broader metabolic landscape, such as the effect of IRE1 inhibition on non-adipose tissue macrophages and metabolism. We agree that such work would extend the therapeutic potential of IRE1 inhibition to a wider range of metabolic disorders, and we will pursue these explorations in future studies.

    1. Author response:

      We thank the reviewers for their constructive feedback here, which will both improve the present manuscript, and help us update our approach as we continue to examine interregional interactions in the motor system. Below we address the concerns raised in the Public Reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. These results put into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons that are largely silenced, and the endogenous activity that is affected in a downstream region. To clarify, the effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that connect RFA and CFA. 

      Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that the silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns.

      This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it.

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortical regions at a particular point during motor behavior. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences vis-à-vis movement execution, as disturbance to processes on which execution depends can impede execution itself. 

      That said, we would agree that the form of the causal interaction between RFA and CFA remains largely unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as transcription factor gene knockouts do not expose how the effect on transcription occurs. To show evidence for specific interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3 in this case, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment, including those in rodents.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). Thus there is not much room for the effects on projection neurons in RFA to be much larger. As part of other work currently in review, we have verified that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of past lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing results from analysis of activity in premotor and primary motor cortex have produced conflicting results on the strength of interaction between these regions during preparation. Compare for example Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach – a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between connected regions will be strongest. Similarity in alignment across lags might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
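      For readers unfamiliar with the lag-shift analysis discussed above, the following is a minimal Python sketch of what a peaked lag profile looks like: it computes the first canonical correlation between two populations while shifting one relative to the other. It is not the authors' pipeline (preprocessing, dimensionality reduction, and lag range are assumptions), and the data are synthetic, with one "region" receiving a 5-sample-delayed copy of a shared white-noise latent signal. The helper names `first_canonical_corr` and `lagged_cca` are hypothetical. With white-noise latents the peak is sharp; smooth, trial-averaged firing rates would give a broader profile.

```python
import numpy as np


def first_canonical_corr(x: np.ndarray, y: np.ndarray) -> float:
    """Largest canonical correlation between x (T x p) and y (T x q).

    QR/SVD formulation; assumes both matrices have full column rank.
    """
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    return float(np.linalg.svd(qx.T @ qy, compute_uv=False)[0])


def lagged_cca(x: np.ndarray, y: np.ndarray, max_lag: int) -> dict:
    """First canonical correlation of x[t] vs y[t + lag] for each lag.

    A positive lag means x leads y.
    """
    n = len(x)
    profile = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        profile[lag] = first_canonical_corr(xs, ys)
    return profile


# Synthetic example: by construction, x leads y by 5 samples.
rng = np.random.default_rng(0)
T, k, delay = 2000, 3, 5
latent = rng.standard_normal((T + delay, k))
x = latent[delay:] @ rng.standard_normal((k, 10)) + 0.1 * rng.standard_normal((T, 10))
y = latent[:-delay] @ rng.standard_normal((k, 12)) + 0.1 * rng.standard_normal((T, 12))

profile = lagged_cca(x, y, max_lag=10)
best_lag = max(profile, key=profile.get)
print(best_lag, round(profile[best_lag], 3))
```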

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach that mirrors those applied in much of the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. 

      It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation related to the differences in what is calculated in DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single-trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may be involved in distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an offsetting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights to increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger.

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could be illuminating. Since it remains a challenge to rigorously identify a subset of activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the motor cortical activity that decoders use for predicting muscle activity matches the motor cortical activity that actually drives muscle activity in situ. 

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other,

      (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and

      (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details.

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry matching that of functional influence in cases where overall activity exhibited symmetry, our results imply that the activity not related to movement would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses used over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components are included.

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict this hypothesis. Here we used modeling as it is often used: to capture experimental results and generate hypotheses about potential explanations. We feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The experiments needed to test such a hypothesis would constitute another substantial study requiring numerous controls, a whole other paper in itself.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study investigates how ant group demographics influence nest structures and group behaviors of Camponotus fellah ants, a ground-dwelling carpenter ant species (found locally in Israel) that build subterranean nest structures. Using a quasi-2D cell filled with artificial sand, the authors perform two complementary sets of experiments to try to link group behavior and nest structure: first, the authors place a mated queen and several pupae into their cell and observe the structures that emerge both before and after the pupae eclose (i.e., "colony maturation" experiments); second, the authors create small groups (of 5,10, or 15 ants, each including a queen) within a narrow age range (i.e., "fixed demographic" experiments) to explore the dependence of age on construction. Some of the fixed demographic instantiations included a manually induced catastrophic collapse event; the authors then compared emergency repair behavior to natural nest creation. Finally, the authors introduce a modified logistic growth model to describe the time-dependent nest area. The modification introduces parameters that allow for age-dependent behavior, and the authors use their fixed demographic experiments to set these parameters, and then apply the model to interpret the behavior of the colony maturation experiments. The main results of this paper are that for natural nest construction, nest areas, and morphologies depend on the age demographics of ants in the experiments: younger ants create larger nests and angled tunnels, while older ants tend to dig less and build predominantly vertical tunnels; in contrast, emergency response seems to elicit digging in ants of all ages to repair the nest.

      We sincerely thank Reviewer #1 for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we will incorporate them into the next version to improve the manuscript.

      Reviewer #2 (Public review):

      I enjoyed this paper and its approach to examining the accepted wisdom that ants determine overall density by employing age polyethism, which would reduce the computational complexity required to match nest size with population (although I have some questions about the requirement that growth is infinite in such a solution). Moreover, the paper highlights that models of collective behaviour may be inappropriate in many systems in which agents (or individuals) differ in the behavioural rules they employ according to age, location, or information state. This is especially important in a system like social insects, typically held up as a classic example of the individual as subservient to the whole, and therefore most likely to employ universal rules of behaviour. The current paper demonstrates a potentially continuous age-related change in target behaviour (excavation), and suggests an elegant and minimal solution to the requirement for building according to need in ants, avoiding the invocation of potentially complex cognitive mechanisms, or information states that all individuals must have access to in order to have an adaptive excavation output.

      We sincerely thank reviewer #2 for the time and effort dedicated to our manuscript's detailed review and assessment. The insightful feedback provided by the reviewer will be incorporated into the successive revisions.

      The only real reservation I have is in the question of how this relationship could hold in properly mature colonies in which there is (presumably) a balance between the birth and death of older workers. Would the prediction be that the young ants still dig, or would there be a cessation of digging by young ants because the area is already sufficient? Another way of asking this is to ask whether the innate amount of digging that young ants do is in any way affected by the overall spatial size of the colony. If it is, then we are back to a problem of perfect information - how do the young ants know how big the overall colony is? Perhaps using density as a proxy? Alternatively, if the young ants do not modify their digging, wouldn't the colony become continuously larger? As a non-expert in social insects, I may be misunderstanding and it may be already addressed in the citations used.

      We thank the reviewer for this interesting question. We find that nest excavation is predominantly performed by the younger ants in the nest and that the increase in nest area is followed by an increase in the population. However, if the young ants dig unrestricted, this could result in unnecessary nest growth, as suggested by reviewer #2. Therefore, we believe that the innate digging behavior of ants could potentially be regulated by various cues, such as:

      (a) Density-based: If the colony becomes less dense as its area expands, this could serve as a feedback signal for young ants to reduce or stop digging, as described in references (25, 29, 30).

      (b) Pheromone depositions: If the colony reaches a certain population density, pheromone signals could inhibit further digging by young ants (references 25, 29); space usage could similarly serve as a proxy for the nest area.

      Thus, rather than perfect information, decentralized control and local, digging-based cues probably regulate the level of age-dependent digging, without the ants needing to estimate the overall colony size or nest area.

      In any case, this is an excellent paper. The modelling approach is excellent and compelling, also allowing extrapolation to other group sizes and even other species. This to me is the main strength of the paper, as the answer to the question of whether it is younger or older ants that primarily excavate nests could have been answered by an individual tracking approach (albeit there are practical limitations to this, especially in the observation nest setup, as the authors point out). The analysis of the tunnel structure is also an important piece of the puzzle, and I really like the overall study.

      We thank the reviewer for the comments. We completely agree that individual tracking of ants within our experimental setup would have been the ideal approach, but we were constrained by the technical and practical limitations of the setup, as pointed out by the reviewer, such as:

      (a) Continuous tracking of ants in our nests would have required a camera to be positioned at all times in front of the nest, which in turn would have required the nest to be illuminated. Since Camponotus fellah ants are subterranean, we aimed to allow them to perform nest excavation in conditions as close to their natural dark environment as possible. Additionally, implementing such a system in front of each nest would have reduced the sample sizes for our treatments.

      (b) The colony maturation and fixed demographics experiments extended for up to six months (unprecedented durations for these kinds of measurements). These durations naturally limited our ability to conduct individual tracking while maintaining the identity of each ant with the current design.

      Reviewer #3 (Public review):

      Summary:

      In this study, Harikrishnan Rajendran, Roi Weinberger, Ehud Fonio, and Ofer Feinerman measured the digging behaviours of queens and workers for the first 6 months of colony development, as well as groups of young or old ants. They also provide a quantitative model describing the digging behaviours and allowing predictions. They found that young ants dig more slanted tunnels, while older ants dig more vertically (straight down). This finding is important, as it describes a new form of age polyethism (a division of labour based on age). Age polyethism is usually described as a "yes or no" mechanism, where individuals either perform a task or not according to their age (usually young individuals perform in-nest tasks, and older ones forage). Here, the way of performing the task is modified, not only the propensity to carry it out. These data therefore add in an interesting way to the field of collective behaviours and division of labour.

      The conclusions of the paper are well supported by the data. Measurements of the same individuals over time would have strengthened the claims.

      We sincerely thank reviewer #3 for the time and effort dedicated to our manuscript's detailed review and assessment. We completely agree with the reviewer’s comments on the measurements of the same individuals over time, however, we were limited by the technical and experimental limitations as described above and pointed out by reviewer #2.

      Strengths:

      I find that the measure of behaviour through development is of great value, as those studies are usually done at a specific time point with mature colonies. The description of a behaviour that is modified with age is a notable finding in the world of social insects. The sample sizes are adequate and all the information is clearly provided either in the methods or supplementary.

      We thank reviewer #3 for this assessment.

      Weaknesses:

      I think the paper is failing to take into consideration or at least discuss the role of inter-individual variabilities. Tasks have been known to be undertaken by only a few hyper-active individuals for example. Comments on the choice to use averages and the potential roles of variations between individuals are in my opinion lacking. Throughout the paper wording should be modified to refer to the group and not the individuals, as it was the collective digging that was measured. Another issue I had was the use of "mature colony" for colonies with very few individuals and only 6 months of age. Comments on the low number of workers used compared to natural mature colonies would be welcome.

      Regarding main comment 1

      We completely agree with the reviewer’s comment on considering inter-individual variability based on activity levels. We have discussed how individual morphological variability could influence digging behavior (references: 28, 31), and we will elaborate further on this aspect in future revisions.

      Regarding main comment 2:

      We agree with the reviewer’s comments regarding the wording. The term “mature colony” will be revised in future versions. In practice, we could not continue the experiments beyond 6 months, predominantly because of the limited stability of nests made with a sand-soil mix. We also acknowledge that the colony sizes attained in our maturation experiments may be smaller than those of naturally matured colonies. This trend is generally observed in lab-reared colonies and could be attributed to differences in microclimatic conditions, foraging opportunities, space availability, and other factors. We will address these aspects in more detail in future revisions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper describes the covalent interactions of small molecule inhibitors of carbonic anhydrase IX, utilizing a precursor molecule capable of undergoing beta-elimination to form the vinyl sulfone and covalent warhead.

      Strengths:

      The use of a novel covalent precursor molecule that undergoes beta-elimination to form the vinyl sulfone in situ. Sufficient structure-activity relationships across a number of leaving groups, as well as binding moieties that impact binding and dissociation constants.

      Overall, the paper is clearly written and provides sufficient data to support the hypothesis and observations. The findings and outcomes are significant for covalent drug discovery applications and could have long-term impacts on related covalent targeting approaches.

      Weaknesses:

      No major weaknesses were noted by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      The authors utilized a "ligand-first" targeted covalent inhibition approach to design potent inhibitors of carbonic anhydrase IX (CAIX) based on a known non-covalent primary sulfonamide scaffold. The novelty of their approach lies in their use of a protected pre(pro?)-vinylsulfone as a precursor to the common vinylsulfone covalent warhead to target a nonstandard His residue in the active site of CAIX. In addition to a biochemical assessment of their inhibitors, they showed that their compounds compete with a known probe on the surface of HeLa cells.

      Strengths:

      The authors use a protected warhead for what would typically be considered an "especially hot" or even "undevelopable" vinylsulfone electrophile. This would be the first report of doing so, making it a novel targeted covalent inhibition approach specifically with vinylsulfones.

      The authors used a number of orthogonal biochemical and biophysical methods including intact MS, 2D NMR, x-ray crystallography, and an enzymatic stopped-flow setup to confirm the covalency of their compounds and even demonstrate that this novel pre-vinylsulfone is activated in the presence of CAIX. In addition, they included a number of compelling analogs of their inhibitors as negative controls that address hypotheses specific to the mechanism of activation and inhibition.

      The authors employed an assay that allows them to assess target engagement of their compounds with the target on the surface of cells and a fluorescent probe which is generally a critical tool to be used in tandem with phenotypic cellular assays.

      Weaknesses:

      While the authors show that the pre-vinyl moiety is shown biochemically to be transformed into the vinylsulfone, they do not show what the fate of this -SO2CH2CH2OCOR group is in a cellular context. Does the pre-vinylsulfone in fact need to be in the active site of CAIX on the surface of the cell to be activated or is the vinylsulfone revealed prior to target engagement?

      I appreciate the authors acknowledging the limitations of using an assay such as thermal shift to derive an apparent binding affinity, however, it is not entirely convincing and leaves a gap in our understanding of what is happening biochemically with these inhibitors, especially given the two-step inhibitory mechanism. It is very difficult to properly understand the activity of these inhibitors without a more comprehensive evaluation of kinact and Ki parameters. This can then bring into question how selective these compounds actually are for CAIX over other carbonic anhydrases.
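      For readers less familiar with these parameters, the standard two-step scheme for irreversible covalent inhibition is shown below as a reference point. It assumes reversible formation of a non-covalent complex (dissociation constant K_I) followed by first-order bond formation (rate constant k_inact), with inhibitor in excess over enzyme; for a pre-vinylsulfone, the elimination that unmasks the warhead would add a further step not captured by this simple form.

```latex
E + I \;\overset{K_I}{\rightleftharpoons}\; E\cdot I \;\xrightarrow{k_{\mathrm{inact}}}\; E\text{--}I,
\qquad
k_{\mathrm{obs}} = \frac{k_{\mathrm{inact}}\,[I]}{K_I + [I]}
```

      Fitting k_obs measured at several inhibitor concentrations to this hyperbola gives k_inact and K_I separately; their ratio k_inact/K_I is the usual second-order measure of covalent efficiency used to compare selectivity across isozymes.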

      The authors did not provide any cellular data beyond target engagement with a previously characterized competitive fluorescent probe. It would be critical to know the cytotoxicity profile of these compounds or even how they affect the biology of interest regarding CAIX activity if the intention is to use these compounds in the future as chemical probes to assess CAIX activity in the context of tumor metastasis.

      Reviewer #3 (Public review):

      Summary:

      Targeted covalent inhibition of therapeutically relevant proteins is an attractive approach in drug development. This manuscript now reports a series of covalent inhibitors for human carbonic anhydrase (CA) isozymes (CAI, CAII, and CAIX, CAXIII) for irreversible binding to a critical histidine amino acid in the active site pocket. To support their findings, they included co-crystal structures of CAI, CAII, and CAIX in the presence of three such inhibitors. Mass spectrometry and enzymatic recovery assays validate these findings, and the results and cellular activity data are convincing.

      Strengths:

      The authors designed a series of covalent inhibitors and carefully selected non-covalent counterparts to make their findings about the selectivity of covalent inhibitors for CA isozymes quite convincing. The supportive X-ray crystallography and MS data are significant strengths. Their approach of targeted binding of the covalent inhibitors to histidine in CA isozyme may have broad utility for developing covalent inhibitors.

      Weaknesses:

      This reviewer did not find any significant weaknesses. However, I suggest several points in the recommendation for the authors' section for authors to consider.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have made excellent suggestions. We believe a revised version addressing those points can improve the assessment and quality of your work.

      Reviewer #1 (Recommendations for the authors):

      (1) The beta-elimination process is referred to as a "rearrangement" in both the text and the Figure 2 legend. Based on the proposed mechanism the authors provided, it is a simple beta-elimination and conjugate addition mechanism, and is not a rearrangement mechanism. This change should be reflected in the text and Figure 2 legend.

      We have made the requested change from rearrangement to elimination reaction.

      (2) From a structure-based design perspective, it is not obvious why only large cyclo-alkyl groups were used to target the lipophilic pocket, with the exception of the phenyl carbamates. Perhaps this is background literature on CAIX that describes this? It seems like this is a flexible functional moiety that could be used to impact drug properties. Why were other lipophilic and especially more aromatic or heteroaromatic moieties not studied?

      The structure-affinity relationship of the lipophilic ring versus other moieties has been studied and reported previously in manuscripts: Dudutiene 2014, Zubriene 2017, Linkuviene 2018, chapter 16 by Zubriene (https://doi.org/10.1007/978-3-030-12780-0_16). The lipophilic ring served better than a flexible tail or an aromatic ring.

      (3) The color-coded "correlation map" in Figure 8 is difficult to follow. Perhaps a standard SAR table with selectivity and affinity values would be easier to read and follow.

      We are trying to promote “correlation maps” because in our opinion they are easier to follow than tables.

      (4) Although there is a statement for this in line 254 of the SI, the compound numbering in the SI, vs. the numbering used in the manuscript is confusing. The standard format for these is to consecutively number all compounds and have identical compound numbers in both the SI and manuscript. The synthetic intermediates included in the SI can be identified by IUPAC names.

      An additional numbering system had to be made because the synthesis was described in the supplementary materials. We would prefer to leave the numbering as in the current manuscript. There are quite a few intermediate compounds that we assigned intermediate numbers such as 20x in order to make it simpler to distinguish intermediate synthesis compounds from compounds that were studied for binding affinity.

      (5) Ranges of isolated yields for the synthetic steps in SI schemes S1, S2, and S3 need to be included.

      We have remade the SI schemes S1, S2, and S3 to include the yields of each compound.

      (6) Presumably, the AcOH/H2O2 reaction forms the sulfones and not sulfoxides when heat is used. In the SI, the structures of 9x and 10x are shown to be sulfoxides and not sulfones. Initially, this appeared to be a simple structural mistake; however, it is concerning, since the HRMS data reported for compound 9x are for the sulfoxide (HRMS for C8H7F4NO4S2 [(M+H)+]: calc. 321.9825, found 321.9824) and not the sulfone. In synthesis scheme S1, condition "C" is used for both the sulfoxide and sulfone synthesis (i.e. 3ax to 9x vs. 12x to 13x). It appears the sulfoxide is prepared using a room temperature procedure, whereas the sulfone requires heating to 75 °C. These two similar conditions need to be designated as different synthetic steps in the schemes with the specific conditions noted, since the products formed are different.

      We have made requested corrections/adjustments and added separate reaction conditions for sulfoxide synthesis in SI scheme S1.

      Reviewer #2 (Recommendations for the authors):

      I appreciate that it's difficult to determine parameters such as kinact or Ki of such potent inhibitors and ones that work by a two-step mechanism. I might suggest characterizing the steps separately to determine the detailed parameters. Maybe something like NMR for the activation step and SPR for the kinact and Ki of the unmasked vinylsulfone?

      We agree that such information would be helpful. However, it requires significant effort and equipment and will be performed in a separate study.

      I always advocate for at least a global proteomics analysis using a pulldown probe to get an idea of the specificity profile, especially for the so-far untried and untested pre-vinylsulfone moiety.

      We fully agree that the pull-down assay is a good idea. However, this major task will be performed in a separate study.

      This might be picky but wouldn't this be considered a pro-vinylsulfone rather than pre-vinylsulfone? Just as the term "prodrug" is used?

      We agree that both the pre-vinylsulfone and pro-vinylsulfone are suitable names. However, in pharmacology, the prodrug is common, but in organic synthesis, the precursor is commonly used. Therefore, we prefer to keep the pre-vinylsulfone.

      I would also be curious to know what species is responsible for activating the compound to the vinylsulfone. Maybe make some key point mutations of nearby basic residues?

      His64 formed the covalent bond, so His64 is the likely activating base. Mutating this residue could be a good path for future studies.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors presented only a close-up view of the active site with a 2Fo-Fc map mesh in three panels of Figure 4. For readers unfamiliar with the carbonic anhydrase field, adding a complete illustration of each protein-inhibitor complex (protein in cartoon mode and ligand in stick) will be helpful. Also, an image of the 180º rotation of the close-up view presented in each panel should be added. Depicting h-bonds between critical residues (Asn62, Gln92, etc.) with dashed lines and marking the distances will be helpful for readers.

      We have prepared the requested figure for CAIX. Panels on the left show the entire protein molecule with the bound ligand for each isozyme, and there are two close-up views for each structure rotated by 180 degrees.

      (2) Line 198 should be revised to refer to the correct complexes. 20, 21, and 23 should be 21, 20, 23.

      We appreciate that the reviewer noticed this error. We corrected the mistake.

      (3) Omit electron density maps around each ligand in Figure 4 should be included for compounds 20, 21, and 23, perhaps as a supplementary figure.

      Detailed electron density map information is provided in the mtz files that have been submitted to the PDB. We think the omit maps are not necessary in the supplementary materials.

      (4) The cyclooctyl group is stabilized by hydrophobic active site residues, L131, A135, L141, and L198. However, only L131 is shown in Figure 4. All residues that stabilize the ligands should be shown.

      For clarity purposes of the figure, we have omitted some of the residues that make contact with the ligand molecule. We think that the structure provided to the PDB could be analyzed in detail to see all contacts between the ligand and protein molecule.

      (5) The supplementary table S1 lacks the crystallographic data on the CAIX-23 complex.

      We have added a new version of the supplementary materials that contains the crystallographic data on the CAIX-23 complex.

      (6) A minor peak (30213 Da) with a 638 Dalton shift compared to the unmodified enzyme is for Figure 5A, not Figure 5B, as mentioned in line 235. This sentence in line 235 should be corrected.

      We corrected this mistake.

      (7) As the authors stated in the text, a minor peak (30213 Da) represents a potential second binding site. Can they revisit their electron density maps and show any residual density if it is present around a second histidine residue? The MS data in Figure S17C indicates the presence of additional sites for compound 12. Thus, additional electron density around the secondary and tertiary sites is possible.

      CAII contains His3 and His4 that are at the N-end of the protein and not visible in the crystal structure. The NMR data indicate that the additional modification may occur at one of these His residues.

      (8) MS data were presented for compounds 12 and 22 in Figure 5A, B, but the co-crystal structures were generated with compounds 21, 20, and 23. Why was no MS data included for compounds 20, 21, and 23? Would these compounds show the presence of a secondary binding site? Can authors include the MS data?

      In the main body of the manuscript, in Figure 5A, we only present MS data on CAXIII with compound 12. It is only an example that confirms covalent interaction. In the supplementary materials we have MS data for compound 12 with all carbonic anhydrase isozymes and for compound 20 with almost all (except CAVI) CA isozymes. There are also MS data provided for numerous compounds (3, 9, 13, and others) and CA isozymes that serve as controls or as confirmation of covalent bond formation.

      (9) The coordination between the zinc ion and NH of the ligand is mentioned in the enzyme schematic in Figure 3. Can the distances and coordination with Zinc be illustrated in ligand-bound structures in Figure 4?

      We considered this but decided that a picture showing the numerous distances between ligand atoms and protein residues would be difficult to follow. The structures provided to the PDB can be analyzed for every aspect of the complex structure.

      (10) A key difference between covalent (compound 12) and its non-covalent counterpart, compound 5, is the two oxygens attached to sulfur in compound 12. Do protein side chains or water interact with these oxygens? Are these oxygen atoms exposed to solvent? Can authors show the interactions or clarify if there is no interaction?

      The two oxygens in the ligand molecule serve several purposes. First, they withdraw electrons and lower the pKa of the sulfonamide, thus making the interaction stronger. Second, the oxygen atoms may make contacts or hydrogen bonds with the protein molecule and may also be important for covalent bond formation. Exact energy contributions cannot be determined from the structure directly. Thus, we decided not to explore this area yet.

      (11) Fix the font size of the text in lines 355-356.

      The font has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Previous studies have used a randomly induced label to estimate the number of hematopoietic precursors that contribute to hematopoiesis. In particular, the McKinney-Freeman lab established a measurable range of 50-2500 precursors using random induction of one of the four fluorescent proteins (FPs) of a Confetti reporter in the fetal liver, showing that hundreds of precursors establish lifelong hematopoiesis. In the present work, Liu and colleagues aim to extend the previously established measurable range of precursor numbers and enable measurement in a variety of contexts beyond embryonic development. To this end, the authors investigated whether the random induction of a given Confetti FP follows the principles of the binomial distribution, such that the variance inversely correlates with the precursor number. They tested their hypothesis using a simplified two-color in vitro system, paying particular attention to minimizing sources of experimental error (elimination of outliers, sample size, events recorded, etc.) that may obscure the measurement of variance. As a result, the data generated are robust and show that the measurable range of precursors can be extended up to 10^5 cells. They use tamoxifen-inducible Scl-CreER, which is active in hematopoietic stem and progenitor cells (HSPCs), to induce Confetti labeling, and investigated whether they could extend their model to cell numbers below 50 by in vivo transplantation of high versus low numbers of Confetti total bone marrow (BM) cells. The premise of the binomial distribution requires that the number of precursors remains constant within a group of mice. The low frequency of HSPCs in the BM means that the experimentally generated "low"-number recipient animals showed some variability in seeding number, which violates this requirement. While variance due to differences in precursor numbers still dominates, it is unclear how accurate the estimated numbers are when precursor numbers are low (<10).

      According to our simulation, the differences between estimated numbers and the corresponding expected numbers are more pronounced at numbers below 10, but they remain relatively small. Because Figure S4A is on a log scale, it may be difficult for readers to appreciate the magnitude of the differences from the graph. We plan to add a linear-scale panel to Figure S4A (left) to better visualize the absolute differences, and an additional graph quantifying the differences between estimated and expected values for numbers below 15 (right). Both graphs show that the maximum difference between estimated and expected n occurs at 10 precursors (estimated as 7.6). We acknowledge that these numbers are not identical, and some minor correction of the formula may be needed if a highly accurate absolute number is required. However, we emphasize that (1) most estimated n values are within 25% of the expected n, and (2) despite the minor discrepancy, the estimated n remains highly correlated with the expected n, so comparisons between different precursor numbers are not affected.

      Author response image 1.
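      To make the binomial principle concrete, here is a minimal Python sketch under stated assumptions: every replicate in a group starts from the same number n of precursors (the premise noted above), each precursor is labeled independently with probability p, and all precursors expand uniformly. The labeled fraction in a replicate is then k/n with k ~ Binomial(n, p), so its variance is p(1 - p)/n, and rearranging gives n ≈ p̂(1 - p̂)/s², where p̂ and s² are the mean and sample variance across replicates. The function names are hypothetical, and this is the textbook moment estimator rather than the authors' exact formula.

```python
import random
import statistics


def estimate_precursors(fractions):
    """Estimate precursor number n from per-replicate labeled fractions.

    Assumes (hypothetically) that each replicate's labeled fraction is k/n
    with k ~ Binomial(n, p), so Var(fraction) = p * (1 - p) / n and
    therefore n = p * (1 - p) / Var(fraction).
    """
    p_hat = statistics.mean(fractions)
    var_hat = statistics.variance(fractions)  # sample variance (n - 1 denominator)
    if var_hat == 0:
        raise ValueError("zero variance: precursor number is not identifiable")
    return p_hat * (1 - p_hat) / var_hat


def simulate_fractions(n_precursors, p_label, n_replicates, rng):
    """Hypothetical helper: simulate labeled fractions for replicates that each
    start from n_precursors cells, each labeled independently with p_label,
    and expand uniformly (so progeny fractions equal precursor fractions)."""
    fractions = []
    for _ in range(n_replicates):
        labeled = sum(rng.random() < p_label for _ in range(n_precursors))
        fractions.append(labeled / n_precursors)
    return fractions


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (5, 10, 100, 1000):
        fractions = simulate_fractions(n, p_label=0.5, n_replicates=500, rng=rng)
        print(n, round(estimate_precursors(fractions), 1))
```

      Comparing the printed estimates with the simulated n (here with p = 0.5 and 500 replicates per n) is one way to explore the kind of low-n discrepancy discussed in the response. Because the estimate rests on a sample variance, the printed values scatter around the simulated n rather than matching it exactly.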

      The authors then apply their model to estimate the number of hematopoietic precursors that contribute to hematopoiesis in a variety of contexts including adult steady state, fetal liver, following myeloablation, and a genetic model of Fanconi anemia. Their modeling shows:

      - thousands of precursors (~2400-2600) contribute to adult myelopoiesis, which is in line with results from a previous study (Sun et al, 2014).

      - myeloablation (single dose 5-FU), while reducing precursor numbers of myeloid progenitors and HSPCs, was not associated with a reduction in precursor numbers of LT-HSCs.

      - no major expansion of precursor number in the fetal liver derived from labeling at E11.5 versus E14.5, consistent with recent findings from Ganuza et al, 2022.

      - normal precursor numbers in Fancc-/- mice at steady state and from competitive transplantation of young Fancc-/- BM cells, suggesting that reduced Fancc-/- cell proliferation may underlie the reduced chimerism upon transplantation.

      - reduced number of lymphoid precursors following transplantation of BM cells from 9-month-old Fancc-/- animals (beyond this age animals have decreased survival).

      Although this system does not permit the tracing of individual clones, the modeling presented allows measurements of clonal activity covering nearly the entire HSPC population (as recently estimated by Cosgrove et al, 2021) and can be applied to a wide range of in vivo contexts with relative ease. The conclusions are generally sound and based on high-quality data. Nevertheless, some results could benefit from further explanation or discussion:

      - The estimated number of LT-HSCs that contribute to myelopoiesis is not specifically provided, but from the text it would be calculated as 1958/5 = ~391. Data from Busch et al, 2015 suggest that the number of differentiation-active HSCs is 5.2 × 10^3, which is considered the maximum limit. There is nevertheless a more than 10-fold difference between these two estimates, and it is unclear how this discrepancy arises.

      First, we would like to clarify a sentence in the manuscript. 

      “The average myeloid precursor number at the time of BM analysis (1958) matched the average precursor number calculated from BM myeloid progenitors (MP, Lin-Sca-1-cKit+) and HSPCs (1773 and 1917), but it was five-fold higher than that of LT-HSC (Figure 3E).”

      In this sentence, we compared the number of precursors calculated from peripheral blood myeloid cells to those calculated from BM myeloid progenitors, HSPCs and LT-HSCs. However, we did not intend to imply that the precursors calculated from HSPCs and LT-HSCs specifically contribute to myelopoiesis. To avoid misunderstanding, we propose to change this sentence to read:

      “The average precursor number calculated from PB myeloid cells at the time of BM analysis (1958) matched those calculated from BM myeloid progenitors (MP, Lin-Sca-1-cKit+) and HSPCs (1773 and 1917), but it was fivefold higher than that of LT-HSC (Figure 3E).”

      Nonetheless, we appreciate the reviewer’s comment on the gap between the precursor numbers of LT-HSCs and the number of differentiation-active HSCs reported in Busch et al, 2015. We propose the following explanations:

      First, precursor numbers reflect LT-HSC self-renewal by symmetric division and maintenance by asymmetric division, but not differentiation. To estimate the number of differentiation-active LT-HSCs, precursor numbers measured from differentiated progeny (progenitors) are a better choice. Because our system does not distinguish the origin of a precursor, measuring the precursor number of differentiation-active LT-HSCs is difficult, since progenitors may also derive from other long-lived MPPs. However, if we assume that most LT-HSC divisions are asymmetric, generating one LT-HSC and one progenitor, then we can approximate the number of differentiation-active HSCs by the precursor number of LT-HSCs.

      Second, when Busch et al, 2015 calculated the number of differentiation-active HSCs, they measured the cumulative activity of stem cells by following the mice up to 36 weeks post-induction. Our method measures recent rather than cumulative HSC activity, so the number of differentiation-active HSCs reported by Busch et al 2015 is expected to be higher.

      Third, Busch et al, 2015 used Tie2MCM Cre to trace HSCs. Tie2+ HSCs have been shown to have a higher reconstitution capacity (Ito et al 2016, Science), but no one has compared the in situ activity of Tie2+ and Tie2- HSCs in a native environment. Since the behavior of HSCs in situ may be very different from their behavior after transplantation, Tie2+ HSCs may be more prone to differentiation than Tie2- HSCs in a native environment, which would lead to an overestimation of differentiation-active HSCs in the HSC pool.

      - Similarly, in Figure 3E, the estimated number of precursors is highest in MPP4, a population typically associated with lymphoid potential and transient myeloid potential, whereas the numbers of MPP3, traditionally associated with myeloid potential, tend to be higher but are not significantly different than those found in HSCs.

      We believe this question arises from the same confusion about the nomenclature of myeloid precursors as the previous question. As explained above, the precursors quantified reflect a variety of possible differentiation routes, not just myelopoiesis. Thus, Figure 3E does not suggest that the lymphoid-biased MPP4 has more myeloid precursors than LT-HSCs. Instead, it simply means that more precursors contribute to the MPP4 population than to the LT-HSC pool. We apologize for the confusion.

      - The requirement for estimating precursor numbers at stable levels of Confetti labeling is not well explained. As a result, it is unclear how accurate the estimates of B cell precursors upon transplantation of Fancc-/- cells are. In previous experiments on normal Confetti mice (Figure 3B), the authors do not estimate precursors of lymphopoiesis because Confetti labeling of B cells is not saturated, and this appears to be the case in Fanc-/- animals as well (Fig. 5B).

      We appreciate the request for clarification. Our approach requires the labeling level in peripheral blood to be stable because we calculate the total number of precursors by normalizing the precursor number in the Confetti+ population by the labeling level (precursor number in the Confetti+ population divided by labeling efficiency). If the labeling level is not saturated, the total number of precursors will be overestimated. This requirement is more important in native hematopoiesis, since it takes a long time for mature populations, especially lymphoid populations, to be fully replaced by the progeny of the labeled HSPC population (as suggested by Busch et al 2015 and Säwen et al 2018). In transplantation, because lethal irradiation was performed, mature blood cells were rapidly generated by HSPCs, so saturation of the labeling level is not a major concern for precursor quantification. We plan to add Author response image 2 as evidence that Confetti labeling levels were stable in mice transplanted with Fancc-/- cells.

      Author response image 2.
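      The normalization step can be illustrated with a minimal Python sketch. The helper name and the numbers are hypothetical, and the sketch isolates only the denominator: it shows why dividing by a labeling level that has not yet reached its plateau (and is therefore too low) inflates the calculated total. It also assumes, as the response does, that labeled and unlabeled precursors behave alike.

```python
def total_precursors(labeled_precursors, labeling_efficiency):
    """Scale precursors counted in the Confetti+ fraction up to the whole
    population (hypothetical helper; assumes labeled and unlabeled
    precursors behave alike)."""
    if not 0 < labeling_efficiency <= 1:
        raise ValueError("labeling_efficiency must be in (0, 1]")
    return labeled_precursors / labeling_efficiency


# Illustrative numbers only: 600 labeled precursors at a stable 30% labeling level.
print(round(total_precursors(600, 0.30)))  # 2000
# If the labeling level has not yet plateaued (e.g. 20% measured instead of 30%),
# the same 600 labeled precursors yield an inflated total.
print(round(total_precursors(600, 0.20)))  # 3000
```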

      - Do 9-month-old Fanc-/- animals have reduced lymphoid precursors as well?

      Because of the non-saturated labeling of peripheral blood B cells and the induction of Confetti in T cells outside the HSPC compartment, we cannot accurately measure lymphoid precursor numbers in 9-month-old Fancc-/- animals. As an alternative, precursor numbers of the lymphoid-biased MPP4 population were comparable between Fancc+/+ and Fancc-/- animals (Figure 5D). We also plan to add a supplementary figure comparing the frequency of common lymphoid progenitors (CLPs, defined as Lin-IL-7Ra+Sca-1midcKitmid) between the two genotypes.

      Author response image 3.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Liu et al. uses Confetti labeling of hematopoietic stem and progenitor cells in situ to infer the clonal dynamics of adult hematopoiesis. The authors apply a new mathematical framework to analyze the data, allowing them to increase the range of applicability of this tool up to tens of thousands of precursors. With this tool, they (1) provide evidence for the large polyclonality of adult hematopoiesis, (2) offer insights on the expansion dynamics in the fetal liver stage, (3) assess the clonal dynamics in a Fanconi anemia model (Fancc), which has engraftment defects during transplantation.

      Strengths:

      The manuscript is well written, with beautiful and clear figures, and both methods and mathematical models are clear and easy to understand.

      Since 2017, Mikel Ganuza and Shannon McKinney-Freeman have been using these Confetti approaches that rely on calculating the variance across independent biological replicates as a way to infer clonal dynamics. This is a powerful tool and it is a pleasure to see it being implemented in more labs around the world. One of the cool novelties of the current manuscript is using a mathematical model (based on a binomial distribution) to avoid directly regressing the Confetti labeling variance with the number of clones (which only has linearity for a small range of clone numbers). As a result, this current manuscript of Liu et al. methodologically extends the usability of the Confetti approach, allowing them more precise and robust quantification.

      They then use this model to revisit some questions from various Ganuza et al. papers, validating most of their conclusions. The application to the clonal dynamics of hematopoiesis in a model of Fanconi anemia (Fancc mice) is very much another novel aspect, and shows the surprising result that clonal dynamics are remarkably similar to the wild-type (in spite of the defect that these Fancc HSCs have during engraftment).

      Overall, the manuscript succeeds at what it proposes to do, stretching out the possibilities of this Confetti model, which I believe will be useful for the entire community of stem cell biologists, and possibly make these assays available to other stem cell regenerating systems.

      Weaknesses:

      My main concern with this work is the choice of CreER driver line, which then relates to some of the conclusions made. Scl-CreER succeeds at being as homogenous as possible in labeling HSC/MPPs... however it is clear that it also labels a subcompartment of HSC clones that become dominant with time... This is seen as the percentage of Confetti-recombined cells never ceases to increase during the 9-month chase of labeled cells, suggesting that non-labeled cells are being replaced by labeled cells. The reason why this is important is that then one cannot really make conclusions about the clonal dynamics of the unlabeled cells (e.g. for estimating the total number of clones, etc.).

      We appreciate the reviewer’s comments. We agree that this is a particular concern for measuring B cell precursors in native hematopoiesis. For myeloid cells, the increase was much less pronounced (0.5% per month) after month four post-induction. One way to investigate the dynamics of unlabeled cells is to induce different groups of mice with different doses of tamoxifen so that labeling efficiency varies among groups. With 14 days of tamoxifen treatment, a maximum of 60% of HSPCs can be labeled (RFP+CFP+YFP). If unlabeled cells behave similarly to labeled cells, then varying the labeling efficiency should not affect the total number of precursors calculated (setting aside any potential effect of longer tamoxifen treatment on HSCs). While we have not performed this lengthy experiment extensively, we have performed one measurement (5 mice) with 14 days of tamoxifen treatment and found that the peripheral blood myeloid precursor numbers calculated from this experiment were comparable to those in Figure 3 (2-day tamoxifen).

      Author response image 4.

      It is possible that HSPCs that are never labeled with Confetti, even during longer tamoxifen treatment, behave differently. In this case, a different Cre driver may provide insight into the total precursor numbers.

      I am not sure about the claims that the data shows little precursor expansion from E11 to E14. First, these experiments are done with fewer than 5 replicates, and thus they have much higher error, which is particularly concerning for distinguishing differences of such a small number of clones. Second, the authors do see a ~0.5-1 log difference between E11 and E14 (when looking at months 2-3). When looking at months 5+, there is already a clear decline in the total number of clones in both adult-labeled and embryonic-labeled, so these time points are not as good for estimating the embryonic expansion. In any case, the number of precursors at E11 (which in the end defines the degree of expansion) is always overestimated (and thus, the expansion underestimated) due to the effects of lingering tamoxifen after injection (which continues to cause Confetti allele recombination as stem cells divide). Thus, I think these results are still compatible with expansion in the fetal liver (the degree of which still remains uncertain to me).

      We agree that additional replicates would reduce error and boost confidence in our conclusions. The dilemma in comparing fetal- and adult-labeled cohorts is that HSPC activities cannot be synchronized across developmental stages. At the fetal to neonatal stage, HSPCs proliferate faster to generate new blood cells and support developmental needs, whereas at the adult stage HSPCs proliferate much more slowly. Thus, it takes a long time for mature myeloid cells in the adult-labeled cohort to reach stable Confetti labeling and provide an accurate quantification of precursors. While we agree that it might be better to compare precursor numbers in earlier months, we preferred to compare precursor numbers at later time points for the reasons above. The other option is to compare the number of HSPC precursors in the BM at earlier time points, as no equilibration of labeling level is required in HSPCs, but this requires earlier sacrifice, compromising long-term assessment.

      We did not revisit the lingering effect of tamoxifen, as this has been studied by Ganuza et al 2017, who showed that tamoxifen was not able to induce additional Confetti recombination if given one day ahead, suggesting that the effective window for tamoxifen is less than 24 h.

      Based on our data, the expansion of lifelong precursors ranges from 1.4- to 7.0-fold (Figure 4G). We might observe a higher level of expansion if the comparison were done at earlier time points. Nonetheless, the finding that the expansion of lifelong HSPCs is not as profound as suggested by transplantation emphasizes the value of analyzing HSPC activity in situ.

      Reviewer #3 (Public Review):

      Summary:  

      Liu et al. focus on a mathematical method to quantify active hematopoietic precursors in mice using Confetti reporter mice combined with Cre-lox technology. The paper explores the hematopoietic dynamics in various scenarios, including homeostasis, myeloablation with 5-fluorouracil, Fanconi anemia (FA), and post-transplant environments. The key findings and strengths of the paper include (1) precursor quantification: The study develops a method based on the binomial distribution of fluorescent protein expression to estimate precursor numbers. This method is validated across a wide dynamic range, proving more reliable than previous approaches that suffered from limited range and high variance outside this range; (2) dynamic response analysis: The paper examines how hematopoietic precursors respond to myeloablation and transplantation; (3) application in disease models: The method is applied to the FA mouse model, revealing that these mice maintain normal precursor numbers under steady-state conditions and post-transplantation, which challenges some assumptions about FA pathology. Despite the normal precursor count, a diminished repopulation capability suggests other factors at play, possibly related to cell proliferation or other cellular dysfunctions. In addition, the FA mouse model showed a reduction in active lymphoid precursors post-transplantation, contributing to decreased repopulation capacity as the mice aged. The authors are aware of the limitation of the assumption of uniform expansion. The paper assumes a uniform expansion from active precursor to progenies for quantifying precursor numbers. This assumption may not hold in all biological scenarios, especially in disease states where hematopoietic dynamics can be significantly altered. If non-uniformity is high, this could affect the accuracy of the quantification.

      Overall, the study underscores the importance of precise quantification of hematopoietic precursors in understanding both normal and pathological states in hematopoiesis, presenting a robust tool that could significantly enhance research in hematopoietic disorders and therapy development. The following concerns should be addressed.

      Major Points:

      • The authors have shown a wide range of seeded cells (1 to 1e5) (Figure 1D) that follow the linear binomial rule. As the standard deviation converges eventually with more seeded cells, the authors need to address this limitation by seeding the number of cells at which the assumption fails.

      While numbers above 10^5 are not required for our measurement of hematopoietic precursors in mice, we agree that it would be valuable to understand the upper limit of experimental measurement. We plan to seed 10^6-10^7 cells per replicate to address the reviewer’s comment.

      • Line 276: This suggests myelopoiesis is preferred when very few precursors are available after irradiation-mediated injury. Did the authors see more myeloid progenitors at 1 month post-transplantation with low precursor number? The authors need to show this data in a supplement.

      While we appreciate the concern, we did not generate this dataset because it would require sacrificing a substantial number of animals at one month post-transplantation.

      Minor Points:

      • Please cite a reference for line 40: a rare case where a single HSPC clone supports hematopoiesis.

      • Line 262-263: "This discrepancy may reflect uneven seeding of precursors to the BM throughout the body after transplantation and the fact that we only sampled a part of the BM (femur, tibia, and pelvis)." Consider citing this paper (https://doi.org/10.1016/j.cell.2023.09.019) that explores the HSPCs migration across different bones.

      • Lines 299 and 304. Misspellings of RFP.

      We appreciate the reviewer’s suggestions and will make the changes as suggested.

      • The title is misleading as the paper's main focus is the precursor number estimator using the binomial nature of fluorescent tagging. Using a single-copy cassette of Confetti mice cannot be used to measure clonality.

      We appreciate the reviewer’s suggestion and plan to modify the title of the manuscript to read: “Dynamic Tracking of Native Precursors in Adult Mice”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This study explores the therapeutic potential of KMO inhibition in endometriosis, a condition with limited treatment options. 

      Strengths: 

      KNS898 is a novel specific KMO inhibitor and is orally bioavailable, providing a convenient and non-hormonal treatment option for endometriosis. The promising efficacy of KNS898 was demonstrated in a relevant preclinical mouse model of endometriosis with pathological and behavioural assessments performed. 

      Weaknesses: 

      (1) The expression of KMO in human normal endometrium and endometrial lesions was not quantified. Western blot or quantification of IHC images will provide valuable insight.

      Given the differential expression of KMO in luminal epithelial cells lining the endometrial glands compared with the rest of the endometrium, a general endometrial Western blot preparation would not be additionally helpful or accurate in addressing this question without, e.g., laser capture microdissection or single-cell quantitative proteomics. Furthermore, KMO is a flavin-dependent monooxygenase, and its activity, especially the generation of the oxidative stressor 3-hydroxykynurenine, is far more dependent on kynurenine substrate availability than on enzyme abundance, although it is important to show (as we have done) that KMO is present in the human endometrial glands and in human distended endometrial gland-like structures (DEGLS).

      If KMO is not overexpressed in diseased tissues, i.e. it may have homeostatic roles, inhibition of KMO may have consequences for general human health and wellbeing.

      KMO certainly does have important homeostatic roles, for example as a key step in the repletion of NAD+ through de novo synthesis, although with good nutrition and sufficient NAD+ precursors in the diet (e.g. niacin), that specific role may be partially redundant. KMO knockout mice exhibit normal fertility and fecundity and do not show a survival deficit compared to littermate wildtype controls (e.g. Mole et al Nature Medicine 2016). To further develop KNS898 towards clinical use, preclinical GLP safety and toxicology studies and human Phase 1 clinical trials will of course need to be completed, but that is standard for the development of any new drug.

      In addition, KMO expression in control mice was not shown or quantified.

      Control mice that were not inoculated intraperitoneally with endometrial fragments did not develop DEGLS and therefore there is nothing to show or quantify.

      Images of KMO expression in endometriosis mice with treatments should be shown in Figure 4.

      We have now included a representative KMO immunohistochemistry image from each endometriosis group and included all KMO immunohistochemistry images in Supplementary Information.

      The images showing quantification analysis (Figure 4A-F) can be moved to supplementary material.

      This recommendation contradicts the emphasis placed by the same reviewer earlier regarding quantification, so we have elected to keep it where it is.

      (2) Figure 1 only showed representative images from a few patients. A description of whether KMO expression varies between patients and whether it correlates with AFS stages/disease severity will be helpful. Images from additional patients can be provided in supplementary material. 

      We have added extra information to the Figure legend to clarify the disease stage of the superficial peritoneal lesions illustrated (Stage I/II) and to link them to the information in supplementary Table S1. In total we examined 11 peritoneal lesions and 5 ovarian lesions (stage III/IV); in every sample examined, immunopositive staining was most intense in epithelial cells lining gland-like structures. The sections shown were chosen to illustrate this key finding.

      (3) For Home Cage Analysis, different measurements were performed as stated in methods including total moving distance, total moving time, moving speed, isolation/separation distance, isolated time, peripheral time, peripheral distance, in centre zones time, in centre zones distance, climbing time, and body temperature. However, only the finding for peripheral distance was reported in the manuscript. 

      This was indeed a large amount of output, which we rationalised for the benefit of a concise paper. The paper now includes a description of which parameters showed a difference with drug treatment.

      (4) The rationale for choosing the different dose levels of KNS898 - 0.01-25mg/kg was not provided. What is the IC50 of a drug? 

      KNS898 dosing has been extensively characterised by us in multiple species, and the pIC50 has already been published (e.g. Hayes et al Cell Reports 2023 and elsewhere). We now include the pIC50 in the present manuscript to save the reader from having to search through another reference.

      (5) Statistical significance: 

      (a) Were stats performed for Fig 3B-E?

      Now included, thank you.

      (b) Line 141 - 'P = 0.004 for DEGLS per group' 

      However, statistics were not shown in the figure. 

      Thanks, now displayed on figure.

      (c) Line 166 - 'the mechanical allodynia threshold in the hind paw was statistically significantly lower compared to baseline for the group' 

      However, statistics were not shown in the figure. 

      (d) Line 170 - 'Two-way ANOVA, Group effect P = 0.003, time effect P < 0.0001' The stats need to be annotated appropriately in Figure 5A as two separate symbols. 

      Arguably the far more important comparison in this figure is whether there is any effect of treatment, and to mark multiple statistical comparisons on the figure would make it difficult to understand. Instead, the figure legend and results text have been clarified on this point.

      (e) Figure 5B - multiple comparisons of two-way ANOVA are needed. G4 does not look different to G3 at D42. 

      Multiple comparison testing (Dunnett’s T3) was done and the results have been clarified in the text and figure legends.

      (f) Line 565 - 'non-significant improvement in KNS898 treated groups'. However, ** was annotated in Figure 5A. 

      Thank you. This is an error that has been checked and corrected.

      (6) Discussion is very light. No reference to previous publications was made in the discussion. Discussion on potential mechanistic pathways of KYR/KMO in the pathogenesis of endometriosis will be helpful, as the expression and function of KMO and/or other metabolites in endometrial-related conditions. 

      The discussion is deliberately concise and focussed. The paper has 21 references to previous publications. A speculative discussion is generally not favoured by us.

      The findings in this study generally support the conclusion although some key data which strengthen the conclusion eg quantification of KMO in normal and diseased tissue is lacking.

      We differ from the reviewer here and do not think that those data would materially affect the likelihood of KMO inhibition being efficacious in human endometriosis in Phase 2/3 clinical trials.

      Before KMO inhibitors can be used for endometriosis, the function of KMO in the context of endometriosis should be explored eg KMO knockout mice should be studied. 

      We take the view that before KMO inhibitors can be used for endometriosis in patients there are multiple other regulatory and clinical development steps that are required that would be a priority. While using a KMO knockout mouse might be an interesting scientific experiment, it would not impact on the critical path in a material way.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aim to address the clinical challenge of treating endometriosis, a debilitating condition with limited and often ineffective treatment options. They propose that inhibiting KMO could be a novel non-hormonal therapeutic approach. Their study focuses on: 

      • Characterising KMO expression in human and mouse endometriosis tissues. 

      • Investigating the effects of KMO inhibitor KNS898 on inflammation, lesion volume, and pain in a mouse model of endometriosis. 

      • Demonstrating the efficacy of KMO blockade in improving histological and symptomatic features of endometriosis. 

      Strengths: 

      • Novelty and Relevance: The study addresses a significant clinical need for better endometriosis treatments and explores a novel therapeutic target. 

      • Comprehensive Approach: The authors use both human biobanked tissues and a mouse model to study KMO expression and the effects of its inhibition. 

      • Clear Biochemical Outcomes: The administration of KNS898 reliably induced KMO blockade, leading to measurable biochemical changes (increased kynurenine, increased kynurenic acid, reduced 3-hydroxykynurenine). 

      Weaknesses: 

      • Limited Mechanistic Insight: The study does not thoroughly investigate the mechanistic pathways through which KNS898 affects endometriosis. Specifically, the local vs. systemic effects of KMO inhibition are not well differentiated. 

      While we agree that this is not a comprehensive mechanistic analysis, given that the ultimate therapy would almost certainly be once-daily oral dosing, i.e. systemic administration, we do not consider differentiating local vs systemic effects of KMO inhibition to be critical to therapeutic development in this scenario.

      • Statistical Analysis Issues: The choice of statistical tests (e.g., two-way ANOVA instead of repeated measures ANOVA for behavioral data) may not be the most appropriate, potentially impacting the validity of the results. 

      The selection of two-way ANOVA (time and group) is sufficient and correct for this experimental analysis and its use does not invalidate the results. We agree that repeated measures ANOVA could be a valid alternative.

      • Quantification and Comparisons: There is insufficient quantitative comparison of KMO expression levels between normal endometrium and endometriosis lesions,

      Please see response above to quantification question raised by Reviewer 1.

      and the systemic effects of KNS898 are not fully explored or quantified in various tissues. 

      Please see earlier responses. KNS898 has been thoroughly explored in multiple tissues, species and experimental models, but those data do not need to be rehearsed here.

      • Potential Side Effects: The systemic accumulation of kynurenine pathway metabolites raises concerns about potential side effects, which are not addressed in the study. 

      As discussed above (response to Reviewer 1), KMO knockout mice exhibit normal fertility and fecundity and do not show a survival deficit compared to littermate wildtype controls (e.g. Mole et al Nature Medicine 2016). To further develop KNS898 towards clinical use, preclinical GLP safety and toxicology studies and human Phase 1 clinical trials will naturally need to be completed, but this is standard for the development of any new drug.

      Achievement of Aims: 

      • The authors successfully demonstrated that KMO is expressed in endometriosis lesions and that KNS898 can induce KMO blockade, leading to biochemical changes and improvements in endometriosis symptoms in a mouse model. 

      Support of Conclusions: 

      • While the data supports the potential of KMO inhibition as a therapeutic strategy, the conclusions are somewhat overextended given the limitations in mechanistic insights and statistical analysis. The study provides promising initial evidence but requires further exploration to firmly establish the efficacy and safety of KNS898 for endometriosis treatment. 

      We do not agree that the conclusions are overextended based on the data presented, as expanded in the reply to the eLife editorial assessment at the beginning of this response. It is clear that additional preclinical, regulatory and clinical development work, and human clinical trials will be required to firmly establish the efficacy and safety of KNS898 for endometriosis treatment.

      Impact on the Field: 

      • The study introduces a novel therapeutic target for endometriosis, potentially leading to non-hormonal treatment options. If validated, KMO inhibition could significantly impact the management of endometriosis. 

      Utility of Methods and Data: 

      • The methods used provide a foundation for further research, although they require refinement. The data, while promising, need more rigorous statistical analysis and deeper mechanistic exploration to be fully convincing and useful to the community. 

      We believe that the data are a) convincing, and b) useful to the community. To be advanced effectively towards patients, KNS898 needs to follow the critical development path outlined above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) Change 'hyperalgia' to hyperalgesia throughout the manuscript including the title. 

      Done

      (2) Line 69 - write '3-HK' in full. 

      Done

      (3) Line 85 - the findings of the study include 'define the preclinical efficacy of KNS898 in reducing inflammation'. The inflammatory profile was not studied. 

      Changed to “disease”

      (4) Line 259 - write 'EPHect' in full. 

      Done

      (5) Line 260 - write 'AFS' in full. Also, abbreviate 'AFS' in the caption of Table S1. 

      Done

      (6) 20 patients were listed in Table S1 but only 19 were accounted for in the methods section. 

      Apologies; there was an error in the methods section, which omitted one of the endometrial samples. This has now been corrected. Table S1 has also been changed to make it clear which samples were eutopic endometrium, to differentiate them from the lesions.

      (7) The location from which the endometrial lesion tissues were obtained should be provided in Table S1. 

      Table S1 has been changed to make it clear that the subtypes of lesions examined were classified as Stage I/II – superficial peritoneal subtype and Stage III/IV – endometrioma. The methods section has also been updated to reflect these subtypes (lines 272-277).

      (8) Table S2 - G5 should be given compound 'A' not 'B'. 

      Thank you. Corrected.

      (9) Figure 2E was not referenced in the text and no figure legend was provided. 

      Now referenced and the figure legend updated.

      (10) Figure 3A - font needs to be enlarged. HCA baseline recording was annotated as performed twice in the protocol. When is the baseline taken and on what day was the Week 12 measurement taken (refer to Figures 5C and D)? 

      Font has been enlarged as requested. The second HCA baseline annotation in Fig 3A is a cut-and-paste error, now rectified and the time of second measurement annotated.

      (11) Line 133 - 'In KNS898-treated group G4 (endometriosis + treatment from Day 19), DEGLS formed in 4 of 15 mice (26.7%) and in G5 (Endo + treatment start on Day 26) in 6 of 15 mice (40%) (Fig. 3f).'. The aforementioned data is not reflected in Figure 3F. 

      Thank you. This has been rectified.

      (12) Line 137 - 'Mice with endometriosis receiving KNS898 from the time of inoculation (G4) had an average of 2.0 DEGLS per animal with DEGLS (total = 8 DEGLS in 4 mice in G4) and those receiving KNS898 1 week after inoculation (G5) had an average of 1.8 DEGLS per animal (total = 11 DEGLS in 6 mice in G5) (Figs. 3g and 3h).' 

      The aforementioned data is not reflected in Figure 3G. There is no Figure 3H shown. 

      Rectified as above.

      (13) Provide a discussion of why KA levels were significantly lower in Figure 3E compared to Figure 2C. 

      (14) Figure legend for Figure 3 - G1 and G2 were noted as n=8. However, Figure S1 and Table S2 noted both groups as n=10. 

      Thank you. This is a typographical error. The legend for Fig 3 should indeed read n=10 for G1 and G2 and has been corrected.

      (15) Line 181 - 'compared to non-operated and sham-operated control groups'. Only the sham group was shown in Figures 5C and D. 

      This text has been clarified to refer only to the data shown.

      (16) Figure 1 images need scalebars. Same for Figure 4. 

      Now added

      (17) Figure 3B - y-axis is fold change? 

      Relative concentration. Legend has been clarified.

      (18) Figures 5A and B - are the last Von Frey measurements taken on Day 40 (as per Figure 3A) or 42?

      Taken on Day 42. Fig 3A (the prospective protocol figure) has been clarified to reflect what actually happened (D42) as opposed to what was planned (D40) to pre-empt any further confusion.

      (19) Symbols in Figure S1 need to be explained in the Figure legend. 

      Done

      (20) Figures 2A and 2D should not be plotted in log scale to match the description of results in Line 106 and Line 118. 

      These particular results are plotted on a log scale to allow the reader to visualise that detectable levels of drug are measurable at very low doses and that there is no significant pharmacodynamic effect at that low dose. We choose to retain the present format.

      Reviewer #2 (Recommendations For The Authors): 

      Comments and queries 

      Introduction/aims section: 

      Lines 82 - 87: Clarify in the proposal aims what is being assessed and analysed in humans and/or in animal models (mice). Specifically, state clearly the correlations with KMO expression. Were the correlations between KMO expression and features of inflammation performed only in mice or also in humans? 

      Thank you for this comment. The aims have been clarified in the Introduction.

      Section - KMO is expressed in human eutopic endometrium and human endometriosis tissue lesions: 

      Was any quantitative or semi-quantitative method used to quantify the KMO expression in human tissues? Although the authors claimed that "KMO was strongly immunopositive in human peritoneal endometriosis lesions" by the representative figures it is not clear if KMO expression is similar, higher or lower between normal endometrium and peritoneal endometriosis lesions. 

      We have added extra information to the legend of Figure 1 to identify the PIN numbers of the superficial lesions illustrated. The key finding from immunostaining with the antibody, which had previously been validated as specific for KMO, was that the most intense immunopositive response was in glandular epithelial cells, and the samples illustrate this result.

      Section - Oral KNS898 inhibits KMO in mice: 

      The authors clearly confirmed the target engagement of KNS898 in inhibiting KMO activity and, therefore, its effect on upstream and downstream metabolites systemically (peripheral fluid/plasma) in mice. Whether the effect of KNS898 is broad and targets systemic immune cells and whole-body cells and tissues was not explored. It was also not explored whether KNS898 is able to specifically inhibit KMO locally in the endometrial tissue by targeting epithelial and/or infiltrating immune cells, for example. 

      That is correct.

      It would be interesting to measure (or if it was measured to report in this section and also in Figure 2) the levels of KYN, KA and 3HK in naïve animals that did not receive KNS898. It would help to understand the net effect of KNS898 on the levels of kynurenine pathway metabolites and, therefore, justify the dose chosen.

      These data are already presented in Fig 3B-E, control group.

      Perhaps, then, a lower dose could be chosen, considering the possibly substantial changes in kynurenine pathway metabolite levels, which are reported to exert effects on many cells, tissues and systems and could therefore precipitate side effects. This is even more relevant considering that the values for these metabolites are expressed as ng/ml, which hinders comparison of the metabolite levels with those reported for naïve animals in the literature. I would also suggest expressing the metabolite levels as nM/L. 

      This is not a relevant method of determining dose-limiting toxicity or safety pharmacology/toxicology, either non-GLP or GLP. There are international guidelines on the proper conduct of those studies. This is also why it is important not to make claims about the safety or otherwise of an experimental compound in an in vivo setting that has not explicitly complied with those regulatory standards. With regard to the units recommendation, accepted units are ng/mL or nM, not usually nM/L.

      Section - KMO blockade reduces endometrial gland-like lesion burden in experimental endometriosis in mice: 

      Line 130: It would be better to replace "blockade of 3HK production" with "reduction of 3HK production" to better reflect the results. 

      Changed to “inhibition of 3HK production”.

      Line 140: In G5 (treatment starting at Day 26/ 1 week after inoculation), is the experimental model of endometriosis already established with all pathological and phenotypic features? 

      This was not specifically tested in this experiment.

      Lines 146 - 148: It would be better to specify that "Overall, there was no significant difference IN BODY WEIGHT between G3 and the KNS898 treatment groups G4 and G5 (endometriosis + treatment from Day 26)". Otherwise, this last sentence might be interpreted as the overall conclusion of this result sub-section. 

      Thank you, a good point and has been corrected.

      The authors demonstrated experimentally that KMO blockade reduces a pathological measure of endometriosis, i.e., endometrial gland-like lesion burden, in experimental endometriosis in mice, both when administered concomitantly with inoculation and after disease development. However, mechanistic insights into how reduced KMO activity reduces the developed distended endometrial gland-like structures were not explored. Therefore, it remains to be investigated which kynurenine pathway metabolites are directly linked to the beneficial effects of KMO blockade in the experimental model of endometriosis, and how.

      We agree.

      Although the beneficial effects on the pathological measures are evident, Figure 3 shows an exorbitant accumulation of KYN and KA and also a substantial reduction in 3HK after the treatment with KNS898, which then raises concerns about tolerability and side effects. Would this effective KNS898 dose be viable and translational as a therapeutic approach? 

      Please refer to comments above at multiple junctures about safety pharmacology and the clinical development critical path.

      Section - KMO is expressed in experimental endometriosis in mice: 

      By histological examination, the authors confirm that treatment with KNS898 specifically reduced the intensity of KMO expression in the DEGLS of mice. Therefore, the local effect of KNS898 on KMO expression in the DEGLS could be at least partially responsible for the beneficial effects observed in Figure 3, i.e., the reduction in pathological measures. However, it remains to be explored whether effects of KNS898 on other cells or tissues could also account for its beneficial effects in the animal model of endometriosis. 

      This is correct.

      From a logical experimental point of view, I would suggest switching the order of the result subsection "KMO blockade reduces endometrial gland-like lesion burden in experimental endometriosis in mice" and "KMO is expressed in experimental endometriosis in mice" as well as the respective Figures 3 and 4. 

      We do not agree. Fig 3 (and section) is the macroscopic enumeration of DEGLS, Fig 4 (and section) is the microscopic and immunohistochemical evaluation of the lesions introduced in Fig 3. The sequence as originally presented is the more logical.

      Sections - KMO inhibition reduces mechanical allodynia in experimental endometriosis - and - KMO inhibition reduces mechanical allodynia in experimental endometriosis: 

      The authors suggested that KMO inhibition with KNS898 exerts beneficial effects on behavioural paradigms related to the experimental model of endometriosis. Based on the statistical analysis performed by the authors, KMO inhibition with KNS898 reduces mechanical allodynia and rescues impaired cage exploration behaviour and mobility in mice with endometriosis. However, I believe that the most appropriate statistical tests for the Von Frey (allodynia behaviour) and home cage (illness behaviour) analyses over time would be repeated measures ANOVA and paired t-test, respectively (and not two-way ANOVA as performed). Therefore, for a more trustworthy analysis and interpretation of this data set, I would suggest that the authors modify the statistical analysis and report the corresponding interpretation of these tests. 

      The selection of two-way ANOVA (time and group) is suitable for this experimental analysis and its use does not invalidate the results. We agree that repeated measures ANOVA could be a valid alternative.

      Overall, the authors present a solid and useful case for KMO inhibition as a potential therapeutic strategy for endometriosis. However, the study would benefit from more detailed mechanistic insights, appropriate statistical analyses, and an evaluation of potential side effects. With these improvements, the research could have a significant impact on the field and pave the way for new treatment modalities for endometriosis. 

      We thank the reviewer for the positive comments and we have responded to the criticisms above.

      Specific recommendations for improvement: 

      • Mechanistic Studies: Conduct detailed studies to understand the local vs. systemic effects of KMO inhibition and its specific impacts on different cell types and tissues. If not feasible here, the authors could include in the discussion section a detailed overview of the possible mechanisms implicated. 

      While we agree that this is not a comprehensive mechanistic analysis, given that the ultimate therapy would almost certainly be once-daily oral dosing, i.e. systemic administration, we do not consider differentiating local vs systemic effects of KMO inhibition to be critical to therapeutic development in this scenario. We do not think that speculation about possible mechanisms that is not supported by experimental data should be included. Furthermore, the reviewers have criticised statements not supported by data, so consistency on this point is preferable.

      • Quantitative Analysis: Include more robust quantitative methods to compare KMO expression levels in different tissues and assess the correlation between KMO expression and pathological and behavioural changes. 

      As discussed above, the pathophysiological importance of KMO is in its enzymatic activity, not in its abundance as a protein, and 3HK production is far more dependent on kynurenine substrate availability rather than KMO protein abundance.

      • Appropriate Statistics: Use the most suitable statistical tests for behavioural and other repeated measures data to ensure accurate interpretation. 

      As discussed above

      • Side Effect Evaluation: Investigate potential side effects of systemic KMO inhibition, particularly focusing on the long-term implications of altered kynurenine pathway metabolites. If not feasible here, the authors could include in the discussion section a detailed overview of the possible side effects associated as well as inform if KNS898 can cross the BBB and its implications. 

      For a novel small molecule therapeutic compound in preclinical/clinical development, there are strictly regulated preclinical and clinical development standards that need to be met. It would not be responsible to publish or make claims about safety and potential adverse effect profiles without conducting the proper panel of tests within a suitable regulatory framework.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Orlovskis and his colleagues revealed an interesting phenomenon: exposure of SAP54-overexpressing leaves to leafhopper males is required for the subsequent attraction of females. By transcriptomic analysis, they demonstrated that SAP54 effectively suppresses biotic stress response pathways in leaves exposed to the males. Furthermore, they clarified how SAP54, by targeting SVP, heightens leaf vulnerability to leafhopper males, thus facilitating female attraction and subsequent plant colonization by the insects.

      Strengths:

      The phenomenon of this study is interesting and exciting.

      Weaknesses:

      The underlying mechanisms of this phenomenon are not convincing.

      We thank the reviewer for finding our study interesting and exciting. However, we respectfully disagree with the reviewer's assertion that the mechanisms we uncovered are unconvincing.

      We have uncovered a significant portion of the mechanisms by which SAP54 induces the leafhopper attraction phenotype.

      First, we discovered that the SAP54-mediated attraction of leafhoppers requires the presence of male leafhoppers on the leaves. Female leafhoppers were only attracted and laid more eggs on leaves when both SAP54 and male leafhoppers were present. In the absence of either males or SAP54, female leafhoppers did not exhibit this behaviour.

      Second, we found that biotic stress responses in leaves were significantly downregulated when exposed to SAP54 and male leafhoppers, with a much lesser effect observed in the presence of females.

      Third, we identified that the presence of the MADS-box transcription factor SHORT VEGETATIVE PHASE (SVP) in leaves is crucial for the leafhopper attraction phenotype, and that SAP54 facilitates the degradation of SVP.

      Our research corroborates previous findings that SAP54-mediated degradation of MADS-box transcription factors depends on the 26S proteasome shuttle factor RAD23, which we found previously to also be necessary for the leafhopper attraction phenotype (MacLean et al., 2014. PMID: 24714165). This finding has been replicated by other research groups. Previous research has also revealed that leafhoppers are specifically attracted to leaves, not to the leaf-like flowers (Orlovskis & Hogenhout, 2016. PMID: 27446117).

      Collectively, these results suggest that SAP54 acts as a "matchmaker", helping male leafhoppers locate mates more easily by degrading SVP-containing complexes in leaves. We have updated the model in Fig. 7 to better illustrate our findings.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors show that leaf exposure to leafhopper males is required for female attraction in the SAP54-expressing plant. They clarify how SAP54, by degrading SVP, suppresses biotic stress response pathways in leaves exposed to the males, thus facilitating female attraction and plant colonization.

      Strengths:

      This study suggests the possibility that the attraction of insect vectors to leaves is the major function of SAP54, and that the induction of leaf-like flowers may be a side effect of the degradation of MTFs and SVP. It is a very surprising discovery that only male insect vectors can effectively suppress the plant's biotic stress response pathway. Although there has been interest in the phyllody symptoms induced by SAP54, the purpose and advantage of secreting SAP54 were unknown. The results of this study shed light on the significance of secreted proteins in the phytoplasma life cycle and should be highly valued.

      Weaknesses:

      One weakness of this study is that the mechanisms by which male and female leafhoppers differentially affect plant defense responses remain unclear, although I understand that this is a future study.

      The authors show that female feeding suppresses female colonization on SAP54-expressing plants. This is also an intriguing phenomenon but this study doesn't explain its molecular mechanism (Figure 7).

      Strengths:

      We appreciate the reviewer's assessment of the strengths of our study. We do indeed discuss the possibility that the induction of leaf-like flowers could be a side effect of the SAP54 effector function. However, it is not uncommon for effectors to have multiple functions, as has been frequently demonstrated for viral proteins (e.g., PMID: 34618877). Furthermore, it is increasingly evident that developmental and immune processes in organisms often overlap and are mediated by the same proteins. A notable example is the Toll-like receptors, which are widely recognized for their role in innate immunity but were initially discovered for their involvement in various developmental processes (e.g., PMID: 29695493).

      MADS-box transcription factors are known to regulate various developmental pathways in plants, and their diversification has been a key driver of evolutionary innovations in plant development. These factors are comparable to HOX genes, which are essential for the development of bilateral animals. While the role of MADS-box transcription factors in orchestrating flowering has been well-documented, recent evidence has emerged showing that they also play a role in regulating immune processes in plants. Our findings contribute to this emerging understanding, presenting novel insights into the multifunctional roles of these transcription factors.

      Specifically, the MADS-box transcription factor SVP has vital roles in both plant immunity and flowering. The SAP54-mediated targeting of this transcription factor may therefore confer multiple advantages to phytoplasmas that, as obligate colonisers, depend on plants and transmission by insects for survival. Firstly, the inhibition of flowering could delay plant senescence and death, which is particularly relevant in annual plants, the primary hosts of AY-WB phytoplasma studied here. Secondly, the downregulation of plant defence responses, particularly against males, facilitates the attraction of females, which are more likely to reproduce and thus increase the number of vectors for phytoplasma transmission. Given that phytoplasmas are obligate organisms with highly reduced genomes, it is plausible that they rely on ‘efficient proteins’ capable of targeting multiple key pathways in their hosts.

      Weaknesses:

      As explained above, we have uncovered a substantial portion of the mechanisms through which SAP54 induces the leafhopper attraction phenotypes that includes the identification of MADS-box transcription factor SVP as an important contributor. We have updated the model in Fig. 7 to better illustrate our findings.

      It is known that SVP forms quaternary structures with other (MADS-box) transcription factors, and it seems likely that the degradation of specific SVP complexes present in fully developed leaves plays a significant role in the downregulation of immune genes in the presence of SAP54 and males. These specific complexes also do not form in svp mutants, which could explain why females are attracted to these mutant plants in the presence of males. However, transcription profiles differ between male-exposed SAP54 and male-exposed svp plants. This may be explained by SVP having multiple functions, including those that are not targeted by SAP54.

      Identifying which SVP complexes contribute to the male-mediated downregulation of immunity in the presence of SAP54 would require the development of a broad range of tools to investigate plant immunity without the confounding effects of developmental changes. This line of inquiry extends beyond the findings presented in this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Orlovskis and colleagues revealed an interesting phenomenon: exposure of SAP54-overexpressing leaves to leafhopper males is required for the subsequent attraction of females. By transcriptomic analysis, they demonstrated that SAP54 effectively suppresses biotic stress response pathways in leaves exposed to the males. Furthermore, they clarified how SAP54, by targeting SVP, heightens leaf vulnerability to leafhopper males, thus facilitating female attraction and subsequent plant colonization by the insects. The discovery of this study is interesting and exciting. However, I have a few concerns that require the authors to address.

      (1) The author demonstrated that SAP54-overexpressing leaf exposure to leafhopper males is more attractive to females. However, I was confused that the author did not analyse the choice preference of males. This is important, as the author demonstrated later that "SAP54 plants exposed to males display significant downregulation of biotic stress responses". It is very possible that the female is attracted by a mating signal, but not by reduced biotic stress responses. Also, it is important to address whether the female used in this study is virgin.

      We have analysed male preference in feeding choice tests (Figure 1, treatment 3) and described our findings in the text (p7; lines 214-216). For added clarity, we have revised the text on p7 (lines 214-216) to specify that males alone do not show any feeding preference for SAP54 plants.

      Additionally, we investigated whether females could be attracted to male-exposed SAP54 plants prior to landing and feeding using choice experiments, as depicted in Supplemental Figure 3 and discussed in the text (p9; lines 265-271). These findings suggest that long-distance cues alone do not fully account for the female attraction phenotype observed in Figure 1. We acknowledge that mating calls or volatiles may complement or enhance the transcriptional changes in male-exposed SAP54 leaves. This interpretation is further supported by comparing Figure 1, treatments 4 and 5, which shows that removing males from SAP54 leaves before female choice does not increase female colonisation. To enhance clarity and precision, we have added the term "solely" to the results (p9; line 265) and discussion (p25; line 719), and included a new sentence on p26 (lines 726-730): "However, given that the removal of males from SAP54 leaves prior to female choice does not enhance female colonisation (comparison of Figure 1, treatment 4 with treatment 5), we cannot exclude the possibility that male-produced volatiles or mating calls could enhance or supplement SAP54-dependent changes in biotic stress responses to males, thereby enhancing female attraction."

      We have also updated the methods section to clarify that a mixture of virgin and pre-mated females was used in all experiments (p28; lines 798-799), consistent with our previously published work (Orlovskis & Hogenhout, 2016. PMID: 27446117; MacLean et al., 2014. PMID: 24714165).

      (2) I was confused by the rationale of the section "Female leafhopper preference for male-exposed SAP54 plants unlikely involves long-distance cues". Can volatile cues or mating calls from males only be perceived from a distance?

      As mentioned in our response to comment 1, for clarity, we have added new text to both the results (p9; line 265) and discussion sections (p25; lines 719 and 726-730). In the results section highlighted by the reviewer (p8-9), we aimed to explicitly test whether cues produced by males (such as mating calls or pheromones) or SAP54 plants (such as plant volatiles) could account for female attraction from a distance, independent of, and prior to, physical contact with the plants or male insects.

      To address the possibility that volatiles or mating calls might be perceived simultaneously with downregulated biotic stress responses, we have included an additional sentence in the discussion, which addresses comments 1 and 2 from the reviewers. Furthermore, it is important to note that Figure 1, treatment 4, mirrors the results of Figure 1, treatment 1, suggesting that direct physical contact between males and females is not necessary for the observed female attraction. This conclusion, derived from our experiments, was already emphasised in the main text (p7; lines 218-222).

      (3) Lines 271-273. How did the authors conclude "immediate access"? A time-course host-choice experiment (counting the number of insects on each plant at different time points) is necessary.

      We have corrected and rephrased the sentence as follows:

      “Therefore, these results indicate that female reproductive preference for the male-exposed SAP54 versus GFP plants is dependent on direct, immediate access of females to the leaves of SAP54 plants and the presence of males on these leaves.” (p9; lines 267-271).

      (4) I appreciate the transcriptome analysis. However, the figures are poorly organized; e.g., the heatmap in Figure 2 is difficult to understand. The authors should clearly indicate what is upregulated or downregulated. It is meaningless to show the heatmap without explaining which genes are represented. Also, it is hard for readers to distinguish the differences between the four maps in Figure 2, and similarly between the two figures in Figure 3.

      We thank the reviewer for the recommendation. To make Figures 2 and 3 easier to read and understand as stand-alone figures, we have revised and improved the corresponding figure legends, highlighting the colouring of up- and down-regulated DEGs and explaining the content of the related supplementary files. For brevity and clarity, we have removed mention of Figure Supplements 4, 5 and 6 from these legends, as they are already explained and referred to in the main text and relate not directly to Figures 2 or 3 but to data processing prior to the analysis in Figure 2.

      We hope that the improvements in figure legends will make the Figures 2 and 3 easier and quicker to understand.

      (5) For the transcriptomic analysis, three out of four replicates were well clustered, and the authors excluded the outliers from subsequent analysis. Is this treatment commonly used in transcriptomic analysis? If yes, please provide corresponding references.

      Removing outliers from transcriptomic data is not unusual, as it enhances the classification of treatment groups and increases the efficiency of detecting biologically relevant differentially expressed genes (DEGs) (PMID: 36833313; PMID: 32600248). For large datasets, especially in clinical studies, automated procedures and algorithms have been developed for this purpose (PMID: 32600248; doi.org/10.1101/144519). Given our relatively small sample size of 4, we opted for a PCA-based manual outlier evaluation, followed by repeated PCA without the identified outliers. This approach demonstrated improved group discrimination (Figure Supplement 4), which can enhance downstream characterization of DEGs and pathways that explain female preference for male-exposed SAP54 plants. We have detailed this procedure on pages 9-10. It is worth noting that other automated outlier removal methods, which are also based on PCA, have been shown to be as effective as manual outlier removal (PMID: 32600248).

      (6) Figure 5A. How was the experiment done? Were HA-SVP and the other HA-tagged genes stably or transiently expressed in GFP and GFP-SAP54 plants? How many replicates were conducted? The band intensities from different biological replicates should be provided. No information is given in the manuscript, not even in the Methods section.

      We thank the reviewer for noticing this and have updated the methods section with more details on the transient protoplast expression assays (p39; line 835). We performed two independent degradation assays for all 5 MTF proteins and have indicated this in the legend of Figure 5. Western blot results from both experiments are provided as a new figure supplement 10 (p53). The degradation/destabilisation efficiency was calculated in ImageJ as the HA intensity divided by the RuBisCo large subunit (rbcL) intensity from the same sample, normalised to the sample with the highest ratio from the same leaf (Rel HA/rbcL). Relative pixel intensities are provided above each treatment in new figure supplement 10, as requested by the reviewer.
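      The Rel HA/rbcL normalisation described above reduces to simple arithmetic. The sketch below is a minimal illustration assuming band intensities have already been measured (e.g. in ImageJ); the function name and the example intensities are hypothetical and are not the authors' script or data.

```python
def relative_ha_rbcl(samples):
    """Hypothetical helper (not the authors' script).

    samples: dict mapping sample name -> (HA intensity, rbcL intensity),
    all measured on the same leaf. Returns each sample's HA/rbcL ratio
    divided by the highest HA/rbcL ratio on that leaf (Rel HA/rbcL).
    """
    # Loading-normalised signal: HA band relative to the rbcL band of the same lane
    ratios = {name: ha / rbcl for name, (ha, rbcl) in samples.items()}
    # Scale so that the sample with the highest ratio on this leaf equals 1
    top = max(ratios.values())
    return {name: ratio / top for name, ratio in ratios.items()}


# Illustrative, made-up band intensities for one leaf
leaf = {"GFP": (1200.0, 400.0), "SAP54": (300.0, 400.0)}
print(relative_ha_rbcl(leaf))  # {'GFP': 1.0, 'SAP54': 0.25}
```

      Normalising within each leaf, rather than across leaves, keeps comparisons between treatments independent of leaf-to-leaf differences in overall expression.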

      (7) For the interaction assay, only Y2H was conducted. Generally, at least two methods are needed to confirm protein interaction. This is also applicable to degradation assays.

      There is substantial prior evidence that SAP54 interacts with MADS-box transcription factors and facilitates their degradation in plants, a process that also involves the 26S proteasome shuttle factor RAD23 (MacLean et al., 2014; PMID: 24714165). This interaction has been independently confirmed by other research groups using various methods, including split-YFP assays (e.g., PMID: 24597566, PMID: 26179462). Given the extensive data already available on this topic, it would be redundant to replicate all of these findings in our manuscript. Instead, we have focused on a few validated assays that effectively demonstrate the specific interactions between SAP54 and MADS-box transcription factors.

      (8) Lines 528-530. This study provides no direct evidence for the mechanism of SAP54-mediated degradation of SVP. The authors should tone down the claim.

      Our findings demonstrate that SVP is degraded in plant cells in the presence of SAP54. Additionally, through yeast two-hybrid assays, we show that SAP54 does not directly bind to SVP but does directly interact with several MADS-box transcription factors known to associate with SVP, and we also provide evidence herein that these factors interact with SVP. Furthermore, previous studies have shown that SAP54 facilitates the degradation of MADS-box transcription factor complexes of Arabidopsis and several other eudicot species (PMID: 24597566, PMID: 26179462, PMID: 28505304, PMID: 35234248; PMID: 38105442). We have described our observations and those of others (see main text pages 4-5, pages 19-20), and believe that we have presented them accurately without overstating our conclusions.

      (9) Overall, the phenomenon of this study is interesting, but the underlying mechanisms are not solidified. Additional work is still needed in future studies.

      We respectfully disagree; we have identified a significant portion of the mechanisms by which SAP54 induces these phenotypes. As with any research, new data often lead to further questions that may be addressed by follow-up studies. Please refer to our previous responses for additional context.

      Reviewer #2 (Recommendations For The Authors):

      Major comment

      It will be interesting to see how long the changes in plant gene expression induced by male feeding persist. No feeding choice of females was observed on the SAP54 plants when males were removed from the clip-cages prior to the choice test with females alone (Figure 1, Treatment 5; Figure Supplement 1, Treatment 5). This indicates that SAP54 plants lose their ability to attract females as soon as males are removed. On the other hand, if the suppression of the plant's stress response pathway by male feeding continues for some time even after males are removed, I think that we cannot exclude the possibility that volatiles emitted by males may partially promote female feeding and colonization.

      As described above, our findings suggest that long-distance cues alone do not fully account for the female attraction phenotype observed in Figure 1. We acknowledge that mating calls or volatiles may complement or enhance the transcriptional changes in male-exposed SAP54 leaves. This interpretation is further supported by comparing Figure 1, treatments 4 and 5, which shows that removing males from SAP54 leaves before female choice does not increase female colonisation. To enhance clarity and precision, we have added the term "solely" to the results (p9; line 265) and discussion (p25; line 719), and included a new sentence on p26 (lines 726-730): "However, given that the removal of males from SAP54 leaves prior to female choice does not enhance female colonisation (comparison of Figure 1, treatment 4 with treatment 5), we cannot exclude the possibility that male-produced volatiles or mating calls could enhance or supplement SAP54-dependent changes in biotic stress responses to males, thereby enhancing female attraction."

      Minor comments

      The legend of Figure 1 is missing an explanation for panel C.

      Thank you for noticing this. We have added the missing information.

      Although from a different perspective from this study, a relationship between phytoplasma infection and SVP has been previously reported (Yang et al., Plant Physiology, 2015). Shouldn't this paper be cited somewhere?

      We thank the reviewer for identifying this oversight. We have added the missing reference (PMID: 26103992) and clarified that, as seen in Figure 5E (p20; lines 555-558), our findings show a similar upregulation of SVP in male-exposed SAP54 plants as reported by Yang et al. This suggests that SAP54 and its homologs, such as PHYL1, may indeed operate through similar mechanisms by targeting MTFs that are crucial for their function. While Yang et al. described the role of SVP in the development of abnormal flower phenotypes in Catharanthus, our study reveals a completely novel role for SVP in plant-insect interactions. Although SAP54 destabilises the SVP protein, its transcript is upregulated in the presence of SAP54, indicating a potential disruption of MTF autoregulation and the MTF network as a whole.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Response to reviewer 1:

      We thank the reviewer for their positive comments and note that we made many attempts to genetically alter endothelial cells to express mutants of SEC61A1 that are resistant to the effects of mycolactone. However, these cells could not support expression of this transgene. Instead, we tested other translocation inhibitors with a different chemical structure but the same mechanism of action at the Sec61 translocon, and found that these phenocopied the effects.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors have investigated the effect of the toxin mycolactone, produced by Mycobacterium ulcerans, on the endothelium. Mycobacterium ulcerans causes Buruli ulcer, classified as a neglected disease by the WHO. This disease has dramatic consequences on the microcirculation, causing important cutaneous lesions. The authors have previously demonstrated that endothelial cells are especially sensitive to mycolactone. The present study brings more insight into the mechanism involved in mycolactone-induced endothelial cell defects and thus in microcirculatory dysfunction. The authors showed that mycolactone directly affected the synthesis of proteoglycans at the level of the Golgi, with major consequences for the quality of the glycocalyx and thus for endothelial function and structure. Importantly, the authors show that blockade of the enzyme involved in this synthesis (galactosyltransferase II) phenocopied the effects of mycolactone. The effect of mycolactone on the endothelium was confirmed in vivo. Finally, the authors showed that exogenous laminin-511 reversed the effects of mycolactone, thus opening an important therapeutic perspective for promoting wound healing in patients suffering from Buruli ulcer and presenting lesions.

      Reviewer #2 (Public Review):  

      The authors dissected the effects of mycolactone on endothelial cell biology and vessel integrity. The study follows up on previous work by the same group, which highlighted alterations in vascular permeability and coagulation in patients with Buruli ulcer. It provides a mechanistic explanation for these clinical observations, and suggests that blockade of Sec61 in endothelial cells contributes to tissue necrosis and slow wound healing.

      Overall, the generated data support their conclusions and I only have two major criticisms:  

      - Replicating the effects of mycolactone on endothelial parameters with Ipomoeassin F (or its derivative ZIF-80) does not demonstrate that these effects are due to Sec61 blockade. This would require genetic proof, using for example endothelial cells expressing Sec61A mutants that confer resistance to mycolactone blockade. The authors claimed in the Discussion that they could not express such mutants in primary endothelial cells, but did they try expressing mutants in HUVEC cell lines? Without such genetic evidence all statements claiming a causative link between the observed effects on endothelial parameters and Sec61 blockade should be removed or rephrased. The same applies to speculations on the role of Sec61 in epithelial migration defects in discussion. Data corresponding to Ipomoeassin F and ZIF-80 do not add important information, and may be removed or shown as supplemental information.  

      - While statistical analysis is done and P values are provided, no information is given on the statistical tests used, neither in methods nor results. This must be corrected, to evaluate the repeatability and reproducibility of their data.  

      We respectfully but fundamentally disagree with the comments regarding the Sec61 dependence of the effects that we observed. We showed that loss of glycocalyx and basement membrane components underpinned the phenotypic changes in endothelial cells (morphological changes, loss of adhesion, increased permeability, and reduced ability to repair scratch wounds). We demonstrated that we could phenocopy permeability increases and elongation phenotype by knocking down the type II membrane protein B3Galt6, and reverse the adhesion defect by exogenous provision of the secreted laminin-511 heterotrimer.

      Our conclusion that mycolactone mediates these effects via Sec61 inhibition is not based solely on the use of alternative inhibitors but is built on several pillars of evidence:  

      First, the proteomics data conform entirely to predictions based on the topology of affected vs. unaffected proteins, and agree with independently published proteomic datasets from T lymphocytes, dendritic cells and sensory neurons (ref.12), as well as biochemical studies performed using in vitro translocation assays (ref.11,34). Furthermore, the pattern of membrane protein downregulation observed in our experiments fits perfectly with established models of protein translocation mechanisms, particularly with respect to the lack of effect on specific topologies of multipass membrane proteins, tail-anchored and type III membrane proteins (ref.34-36).

      Second, since Sec61 is very highly conserved amongst mammals and is found in all nucleated cells, it is hard to conceptualise a framework in which mycolactone targets Sec61 in some cells and not others, as this reviewer suggests might be the case for epithelial cells [noting that the work being referred to (ref.29) predates our 2014 work showing that mycolactone is a Sec61 inhibitor (ref.7)]. Indeed, mycolactone has been shown to target Sec61 in multiple independent approaches, including forward genetic screens involving random mutagenesis and CRISPR/Cas9 (ref.10, PMID: 35939511). Genetic evidence has previously been provided for the Sec61 dependence of mycolactone effects in epithelial cells (ref.10,17). We have unpublished genetic evidence that the rounding and detachment of epithelial cells due to mycolactone is reduced when resistance mutations are overexpressed, and will consider including this in the next version of the manuscript.

      Third, given this weight of evidence, one would be hard-pressed to provide an alternative explanation for the specific down-regulation of glycosaminoglycan-synthesising enzymes and adhesion/basement membrane molecules while most cytosolic and non-Sec61 dependent membrane proteins are unchanged or upregulated. However, seeking to be as rigorous as possible we have here shown that a completely independent Sec61 inhibitor produces the same phenotype at the gross and molecular level. Ipomoeassin F (Ipom-F) is a glycolipid, not a polyketide lactone, yet they both compete for binding with cotransin in Sec61α (ref.6). There is significant overlap in the cellular responses to mycolactone and Ipom-F, including the induction of the integrated stress response (ref.17, PMID: 34079010), which we observed again in the current data, providing further evidence that this approach is useful when genetic approaches are technically unattainable.  

      Therefore, we are confident that the effects seen on endothelial cells are Sec61-dependent. We are happy to provide more detail on our lengthy attempts at over-expressing mycolactone-resistant SEC61A1 genes in HUVECs (primary endothelial cells derived from the umbilical vein). We are highly experienced in this area, and have previously stably expressed these proteins in epithelial cell lines, reproducing the resistance profile (ref.10,17). Notably, though, these cells do not have normal ‘fitness’ in the absence of challenge. Since endothelial cells (and endothelial cell lines; PMID: 12560236) are extremely hard to transfect with plasmids, with efficiency routinely 5-10% (including in our hands), we developed a lentivirus system. We were eventually (after multiple attempts using different protocols) able to transduce primary HUVECs with constructs expressing GFP (at an efficiency of about 10-20%) and select/expand these under puromycin selection. Nevertheless, we never recovered any cells that expressed the flag-tagged SEC61A1 wild type or SEC61A1 carrying the resistance mutant D60G. We also attempted to select D60G-transduced cells with mycolactone epimers, an approach that can help the cells compete against non-transduced cells in culture flasks (ref.10). We concluded that primary endothelial cells are unable to tolerate the expression of additional Sec61α, and that this was incompatible with survival.

      It’s also important to note that most endothelial cell specialists would agree that endothelial cell lines are not good models of endothelial behaviour. We tested the HMEC-1 cell line, but found it did not express prototypical endothelial marker vWF in the expected way. Therefore we focussed our efforts on primary endothelial cells. Should we be able to overcome the dual challenge of the necessity to work in primary cells, and the difficulty of over-expressing Sec61, we will update this paper at a later date with this data, and will also expand the above arguments.  

      We apologise for the embarrassing oversight of not including information about the statistical analyses we used, which of course we will correct in full in the revised version. However, we would like to provide this information to readers of the current version of the manuscript. All data were analysed using GraphPad Prism Version 9.4.1:

      Figure 1: one-way ANOVA with Dunnett’s (panel A) or Tukey’s (panel B) correction for multiple comparisons

      Figure 2 supplement: one-way ANOVA with Tukey’s correction for multiple comparisons (analysed panel)

      Figure 3: one-way ANOVA with Tukey’s (panel B) or Dunnett’s (panel E&F) correction for multiple comparisons

      Figure 4: one-way ANOVA with Dunnett’s correction for multiple comparisons (all analysed panels)

      Figure 5 and supplement: one-way ANOVA with Dunnett’s correction for multiple comparisons (all analysed panels)

      Figure 6: one-way ANOVA with Dunnett’s correction for multiple comparisons (analysed panel)

      Figure 6 supplement: one-way ANOVA with Dunnett’s correction for multiple comparisons (all analysed panels)

      Figure 7: two-way ANOVA with Tukey’s correction for multiple comparisons (all analysed panels; panels B & C also included the Geisser-Greenhouse correction for sphericity)

      Figure 7 supplement: Panels A & D used a repeated measures one-way ANOVA with Dunnett’s correction for multiple comparisons (panel D also included the Geisser-Greenhouse correction for sphericity). Panels B, C & E used a two-way ANOVA with Tukey’s correction for multiple comparisons (panels B & C also included the Geisser-Greenhouse correction for sphericity)

      Reviewer #3 (Public Review):

      Buruli ulcer is a severe skin infection in humans that is caused by a bacterium, Mycobacterium ulcerans. The main clinical sign is massive tissue necrosis following an edema stage. The main virulence factor, called mycolactone, is a polyketide with a lactone core and a long alkyl chain that is released within vesicles by the bacterium. Mycolactone was already shown to account for several disease phenotypes characteristic of Buruli ulcer, for instance tissue necrosis, host immune response modulation and local analgesia. A large number of cellular pathways in various cell types were reported to be impacted by mycolactone. Among those, the Sec61 translocon, involved in the transport of certain proteins to the endoplasmic reticulum, was first identified by the authors of the study and is currently the most consensual target. Mycolactone disruption of Sec61 function was then shown to directly cause apoptosis in macrophages, limit immune responses by T-cells and increase autophagy in dermal endothelial cells and fibroblasts. In their manuscript, TzungHarn Hsieh and their collaborators investigated the Sec61-dependent role of mycolactone on the morphology, adhesion and migration of primary human dermal microvascular endothelial cells (HDMEC). They combined sugar and proteomic studies with a live image-based phenotypic assay on HDMEC to characterize the effect of mycolactone. First, they showed that upon incubation of HDMEC monolayers with mycolactone at a low dose (10 ng/mL) for 24 h, the cells become elongated before rounding and eventually detaching from the culture dish at 48 h. Next, mycolactone was tested in a scratch assay, and cell migration ceased after a 24 h incubation. The same effects were observed in these two assays with two other Sec61 inhibitors, Ipomoeassin F and ZIF-80. Then, the authors used the widely established mouse footpad model of M. ulcerans infection to demonstrate fibrinogen accumulation outside the blood vessel within the endothelium at 28 days post-infection, correlating with severe changes in endothelial cell morphology.

      To dissect the molecular pathways involved in these phenotypes, the authors performed an HDMEC membrane protein analysis and showed a decrease in the numbers of proteins involved in glycosylation and adhesion. As protein glycosylation mainly occurs in the Golgi apparatus, a deeper analysis revealed that enzymes involved in glycosaminoglycan (GAG) synthesis were lost in mycolactone-treated HDMEC. A combination of immunofluorescence and flow cytometry approaches confirmed the impact of mycolactone on the ability of endothelial cells to synthesize GAG chains. The mycolactone effect on cell elongation was phenocopied by knock-down of galactosyltransferase II (B3Galt6), which is involved in GAG biosynthesis. A second extensive analysis of the endothelial basement membrane components and their ligands identified multiple laminins affected by mycolactone. Using similar functional studies as for GAG, the impact of mycolactone on cell rounding and migration could be reversed by the addition of laminin α5.

      The major strengths of the study lie in a combination of cleverly designed phenotypic assays and in-depth membrane proteomic studies with follow-up analysis.

      The results really support the conclusions. Congratulations!  

      The discussion takes into account the current state of the art, which has mostly been established by the authors of the present manuscript.  

      Recommendations for the authors:

      In preparing this revised version we have made a number of general improvements:

      • We added the missing information on statistical analysis that was mentioned in the public review of reviewer #2

      • We have changed all gene names to the HUGO nomenclature

      • We have changed our abbreviation of mycolactone from “MYC” to “Myco” in all figures to avoid any potential confusion with other protein factors

      • We have moved the fibrin(ogen) staining of the mouse footpads to its own figure (now Fig 2), partly due to the inclusion of additional data in Fig 1. This has changed the numbering of subsequent figures, but has also made the supplementary figures easier to track.

      Reviewer #1 (Recommendations For The Authors):  

      (1) Figure 1I. When mice are injected with M. ulcerans, a measurement of local blood flow would be very informative in addition to the data shown. Measuring cutaneous blood flow at the level of the feet is possible using laser Doppler or laser speckle imaging. With these measurements the authors would have a functional quantification of the effect of the glycosaminoglycan/Sec61α-associated damage on microcirculatory blood flow. The same measurement could also better validate the therapeutic effect of laminin.

      We thank the reviewer for this great suggestion, and respectfully remind the reviewer that these experiments take place in CL3 containment. This often completely precludes certain procedures due to the availability of equipment inside the containment, and our ability to sterilise it. Where we are able to perform procedures, it greatly increases their complexity since any procedures on live animals must take place inside of a cabinet. Therefore, we can only use equipment that we have at our animal facility. It is not trivial to set up the regulatory permissions to perform these experiments at other facilities where more specialist equipment is located due to the containment restrictions. 

      Nevertheless, we have attempted to perform ultrasound imaging of mouse feet using the VivoF and have set up a collaboration with other researchers at Surrey who have developed a novel imaging instrument to measure microvascular circulation called optical coherence tomography (OCT; https://pubmed.ncbi.nlm.nih.gov/34882760/), and we are working with them to develop a protocol that can be used in small rodents.

      However, while we have dedicated considerable time to trying to perform the suggested experiment, we have not been successful within a reasonable time frame. Consequently, if we are able to establish this technique in the M. ulcerans infection model, and/or OCT in small rodents, this will likely be beyond the scope of the current manuscript and will be a publication in its own right. We note that we have been able to perform almost all of the other requested experiments (see below), and have also been able to undertake transmission electron microscopy of M. ulcerans infected mouse footpads, which confirms the loss of the basement membrane at high resolution (Fig 7E).

      (2) Figure 1 -D. Endothelial cells were exposed to mycolactone, Ipomoeassin F or ZIF-80. The effect on the cells is clear and impressive. Nevertheless, endothelial cells under no-flow conditions are considered "diseased" cells, as areas of low or no flow are prone to atherosclerosis in vivo. Would the authors expect similar effects in cells subjected to flow? In these conditions, cells would already be elongated in the direction of flow.

      We agree that flow is usually experienced by endothelial cells in vivo, and have repeated a selection of our experiments under conditions that mimic flow and produce uniaxial shear stress. All showed a similar pattern of response to mycolactone, including the phenotypic changes (Fig 1I-K), loss of perlecan (Fig S6C) and laminin α4 (Fig S7B). It is true that the elongation phenotype is not as striking in a cell monolayer that already contains many elongated cells, but qualitatively the cells become disorganised, and at 48 hours their length/width ratio had increased. These results provide reassurance that our findings are physiologically relevant.

      (3) Discuss the possible consequences of your findings for vascular reactivity, and especially for flow-mediated dilation and/or flow-mediated remodeling, as both are important in tissue repair and wound healing.

      We agree with this reviewer that there are likely to be broad consequences for endothelial and vascular function as a result of our findings here. Vascular reactivity is not something we directly considered in this manuscript, and is probably better linked to our planned future work, laid out above, regarding vascular flow in the infected animals. While a key mediator of vascular tone, endothelin 1, is a Sec61-dependent secreted peptide mediator (and is likely to also be affected by mycolactone’s actions), this was not one of the >6500 proteins we identified in our proteomic study. On the other hand, it has been shown by others that mycolactone can induce NO production in other cell types.

      Reviewer #2 (Recommendations For The Authors):  

      - The authors use a mouse model of M. ulcerans infection of footpads to assess the in vivo relevance of their results. It would be useful to comment on any differences between human and mouse with regard to endothelial cell biology and vessel wall architecture. Since the authors have access to patients samples, parallel stainings in human lesions would have strengthened the study. 

      This is an important issue, and is one we have already addressed in our two previous articles https://pubmed.ncbi.nlm.nih.gov/35100311/ https://pubmed.ncbi.nlm.nih.gov/26181660/ . Indeed, this latter work already included a detailed analysis of fibrin staining in these Buruli ulcer patient biopsies and underpinned the hypothesis that we have now tested in the current manuscript. 

      It is worth noting that our data support the view that the critical step occurs at an early (pre-clinical) stage, for which patient samples are not available. The proposed human challenge model (https://pubmed.ncbi.nlm.nih.gov/37384606/) may well provide a suitable platform for such studies in the future.

      - The authors should provide in the Discussion some explanation for the differential effects of Laminin-11, -411 and -511 in Fig. 7 

      This is an interesting point, and probably related to the expression of laminin binding proteins by mycolactone-exposed endothelial cells. We pursued several candidates based on the proteomic data but could not identify a unique gene that explained this observation. Most likely, the differences are explained by partial (be it low or high) loss of a combination of integrin binding proteins. Since this was rather inconclusive, we preferred not to present these data, and already state (p34-35): “We have not been able to ascribe this to the retention of a specific adhesion molecule, and instead postulate that rescue could be via residual expression of a wide variety of laminin α5 receptors.”

      - The word "catastrophic" in the title is very dramatic given the limited impact on the vital prognosis of patients 

      This word has been changed to “destructive”

      Reviewer #3 (Recommendations For The Authors):  

      Several points could be further discussed:  

      -In the mouse model of M. ulcerans infection, animals heal spontaneously in 5% of cases. Could the authors' results help generate a hypothesis to explain this phenomenon?

      Others have shown that the ability of some mice to control M. ulcerans infection is related to loss of mycolactone production by an unknown mechanism. It is not something we have ever observed in the infection experiments we have performed, although this may be due to the humane endpoints of our licence. However, this seems somewhat outside the main focus of the paper and we have not discussed this further.  

      -Mycolactone was also reported to induce analgesia in the mouse model. There is still controversy about the precise mechanisms involved in this mycolactone mediated painless effect. Could the data obtained here help to resolve the controversy? 

      We agree that analgesia in M. ulcerans infection (both in mouse models and in clinical infections) is an extremely interesting area. However, we cannot mechanistically link loss of vascular integrity with the analgesia based on the data generated in the current manuscript. Therefore, we prefer not to speculate on this.

      The quantification of the microscopy images and videos should be provided as well as the script used to quantify them. 

      The reviewer is not specific about which microscopy images are being referred to in this comment, but the reference to videos leads us to assume this is related to the ZenCell OWL images/videos presented in Figure 1 and Figure S1. We had already provided quantification of these in the graphs provided, and the algorithms used for % coverage and % detached cells are built into the (proprietary) software of the instrument used to gather the data, the ZenCell OWL. Other counts were made manually, and the length:width ratio is simple arithmetic, as already described in the methodology.

      The authors performed their work using chemically synthesized mycolactone obtained from the very generous Professor Kishi (Harvard University). Would the same phenotype and proteomics analysis be obtained with biologically purified mycolactone? 

      Our lab has extensive experience with both biologically purified and synthetic mycolactone, and the phenotypes observed with the chemically synthesised form have always been identical. Therefore we did not repeat the proteomics experiments, as we do not believe this would provide any further insight into the disease mechanism. However, we have now replicated a range of findings using mycolactone biologically purified from M. ulcerans. In particular, we confirmed that the cytotoxic activities of synthetic and biological mycolactone are indistinguishable (Figure S1A), as are the main phenotypic changes induced by mycolactone in endothelial cells (phenotypes: Figures S1D-F; B3GALT6/perlecan/laminin α5 loss: Figures S5A, S6B, S7A).

      Although the proteomic analysis is already very comprehensive, a kinetic study over time (from 2 h to 48 h) could strengthen it. 

      We agree that more data is always better, but since we validated our proteomic data set over multiple timepoints between 2 and 48 hrs, we do not believe this would alter the main conclusions of our work.   

      The siRNA transfection protocol could be better described. A Table listing all the reagents would help the reader.  

      A more detailed siRNA transfection protocol has been added to the methods section, and we now include a Key Resources Table at the start of the Materials & Methods section.

    1. Author response:

      Reviewer #1:

      Summary:

      The investigators undertook detailed characterization of a previously proposed membrane targeting sequence (MTS), a short N-terminal peptide, of the bactofilin BacA in Caulobacter crescentus. Using light microscopy, single molecule tracking, liposome binding assays, and molecular dynamics simulations, they provide data to suggest that this sequence indeed does function in membrane targeting and further conclude that membrane targeting is required for polymerization. While the membrane association data are reasonably convincing, there are no direct assays to assess polymerization and some assays used lack proper controls as detailed below. Since the MTS isn't required for bactofilin polymerization in other bacterial homologues, showing that membrane binding facilitates polymerization would be a significant advance for the field.

      We thank Reviewer #1 for the constructive criticism and will address the points detailed below in a revised version of the manuscript.

      Major concerns

      (1) This work claims that the N-terminal MTS domain of BacA is required for polymerization, but the authors do not provide sufficient evidence that the ∆2-8 mutant or any of the other MTS variants actually do not polymerize (or form higher order structures). Bactofilins are known to form filaments, bundles of filaments, and lattice sheets in vitro and bundles of filaments have been observed in cells. Whether puncta or diffuse labeling represents different polymerized states or filaments vs. monomers has not been established. Microscopy shows mis-localization away from the stalk, but resolution is limited. Further experiments using higher-resolution microscopy and TEM of purified protein would be needed to show that the MTS is required for polymerization.

      We do not propose that the MTS is directly involved in the polymerization process, and preliminary transmission electron microscopy (TEM) data show that variants lacking the MTS or carrying amino acid exchanges in the MTS still form polymers when highly overproduced in E. coli and then purified from cell lysates by affinity chromatography. This finding is consistent with the results of previous studies and in line with the finding that bactofilin polymerization is exclusively mediated by the conserved bactofilin domain (Deng et al, Nat Microbiol, 2019). However, under native expression conditions, bactofilin levels are often relatively low, with only a few hundred molecules of BacA measured per cell in C. crescentus (Kühn et al, EMBO J, 2006). Our data indicate that, under this condition, the concentration of BacA on the 2D surface of the cytoplasmic membrane and, potentially, steric constraints induced by membrane curvature may be required to facilitate its efficient assembly into functional polymeric complexes. We will provide TEM images of purified proteins in a revised version of our manuscript and explain this model in more detail in the Discussion.

      In the case of polymer-forming proteins, defined localized signals are typically interpreted as polymeric complexes. An even distribution of the fluorescence signals, by contrast, indicates that the proteins form monomers or, at most, small oligomers that diffuse rapidly within the cell and are thus no longer detected as a stationary focus by widefield microscopy. Our single-molecule data also indicate that proteins that are no longer able to interact with the membrane (as verified by cell fractionation studies and in vitro liposome binding assays) show a high diffusion rate, similar to that measured for the non-polymerizing and non-membrane-bound F130R variant. These results indicate that a loss of membrane binding strongly reduces the ability of BacA to form polymeric assemblies. To support this hypothesis, we will perform additional single-molecule tracking analyses of freely diffusible and membrane-bound monomeric fluorescent proteins for comparison.

      (2) Liposome binding data would be strengthened with TEM images to show BacA binding to liposomes. From this experiment, gross polymerization structures of MTS variants could also be characterized.

      We are not able to perform cryo-electron microscopy studies of liposomes bound to BacA. However, the results of the cell fractionation and liposome sedimentation assays clearly support a critical role of the MTS in membrane binding.

      (3) The use of the BacA F130R mutant throughout the study to probe the effect of polymerization on membrane binding is concerning as there is no evidence showing that this variant cannot polymerize. Looking through the papers the authors referenced, there was no evidence of an identical mutation in BacA that was shown to be depolymerized, nor any discussion in this study of how the F130R mutation might be analogous to polymerization-deficient variants in other bactofilins mentioned in these references.

      Residue F130 in the C-terminal polymerization interface of BacA is highly conserved among bactofilin homologs, although its absolute position in the protein sequence may vary, depending on the length of the N-terminal unstructured tail. The papers cited in our manuscript show that an exchange of this conserved phenylalanine residue abolishes polymer formation. We will make this fact clearer in the revised version of the manuscript. Moreover, we will provide gel filtration and transmission electron microscopy data showing that the BacA-F130R variant no longer forms polymers.

      (4) Microscopy shows that a BacA variant lacking the native MTS regains the ability to form puncta, albeit mis-localized, in the cell when fused to a heterologous MTS from MreB. While this swap suggests a link between puncta formation and membrane binding the relationship between puncta and polymerization has not been established (see comment 1).

      We show that a BacA variant lacking the MTS regains the ability to form membrane-associated foci when fused to the MTS of MreB. In contrast, a similar variant that additionally carries the F130R exchange (preventing its polymerization) shows a diffuse cytoplasmic localization. In addition, we show that the F130R exchange leads to a loss of membrane binding and to a considerable increase in the mobility of the variants carrying the MreB MTS. Together, these results strongly support the hypothesis that membrane binding and polymerization act synergistically to establish localized bactofilin assemblies.

      (5) The authors provide no primary data for single molecule tracking. There is no tracking mapped onto microscopy images to show membrane localization or lack of localization in MTS deletion/variants. A known soluble protein (e.g. unfused mVenus) and a known membrane-bound protein would serve as valuable controls to interpret the data presented. It is also unclear why the authors chose to report molecular mobility as mean squared displacement rather than mean squared displacement per unit time, and the number of localizations is not indicated. Extrapolating from the graph in Figure 4D, for example, it looks like WT BacA-mVenus would have a mobility of 0.5 (0.02/0.04) micrometers squared per second, which is approaching diffusive behavior. Further justification/details of their analysis method is needed. It's also not clear how one should interpret the finding that several of the double point mutants show higher displacement than deleting the entire MTS. These experiments as they stand don't account for any other cause of molecular behavior change and assume that a decrease in movement is synonymous with membrane binding.

      We agree that a more in-depth analysis of the single-molecule tracking data would help support our conclusions. We will map the tracks onto the cells, although the loss of membrane localization of BacA variants with a defective MTS is already evident in the widefield fluorescence images. Moreover, we will perform additional measurements on soluble mVenus and a membrane-associated variant of mVenus for comparison and address the other issues raised here.

      The single-molecule tracking data alone are certainly not sufficient to draw firm conclusions on the relationship between membrane binding and protein mobility. However, our other in vivo and in vitro analyses indicate a very clear correlation between the mobility of BacA and its ability to interact with the membrane and polymerize (processes that synergistically promote each other).
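      The conversion the reviewer asks about (mean squared displacement per unit time) can be sketched as follows. This is a minimal illustration assuming free 2D Brownian motion, for which MSD(Δt) = 4DΔt; it ignores localization error and confinement, and the simulated diffusion coefficient, frame interval and track length are hypothetical values, not the acquisition settings of the study. Taking the reviewer's reading of Figure 4D (about 0.02 µm² at a 0.04 s lag) at face value, MSD/Δt is 0.5 µm²/s, whereas the corresponding 2D diffusion coefficient would be 0.125 µm²/s.

```python
import math
import random


def msd_at_lag(track, lag):
    """Mean squared displacement of one 2D track (list of (x, y) in µm) at `lag` frames."""
    squared = [
        (track[i + lag][0] - track[i][0]) ** 2 + (track[i + lag][1] - track[i][1]) ** 2
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)


def apparent_diffusion_coefficient(msd_um2, lag_s):
    """Free 2D Brownian motion: MSD = 4 * D * lag, hence D = MSD / (4 * lag)."""
    return msd_um2 / (4.0 * lag_s)


def simulate_brownian_track(n_frames, d_um2_per_s, frame_s, rng):
    """2D random walk; each axis takes Gaussian steps with variance 2 * D * frame_s."""
    step_sd = math.sqrt(2.0 * d_um2_per_s * frame_s)
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_frames - 1):
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        track.append((x, y))
    return track


if __name__ == "__main__":
    # Reviewer's reading of Figure 4D: ~0.02 µm^2 at a 0.04 s lag.
    print(0.02 / 0.04)                                 # MSD per unit time, µm^2/s
    print(apparent_diffusion_coefficient(0.02, 0.04))  # 2D diffusion coefficient, µm^2/s

    # Sanity check on a simulated track with a hypothetical D = 0.1 µm^2/s.
    rng = random.Random(0)
    track = simulate_brownian_track(20000, 0.1, 0.04, rng)
    print(apparent_diffusion_coefficient(msd_at_lag(track, 1), 0.04))
```

      Reporting D (or MSD per unit time) rather than raw MSD makes mobilities comparable across lag times; in practice a localization-error offset would also need to be subtracted before dividing.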

      (6) The experiments that map the interaction surface between the N-terminal unstructured region of PbpC and a specific part of the BacA bactofilin domain seem distinct from the main focus of the paper and the data somewhat preliminary. While the PbpC side has been probed by orthogonal approaches (mutation with localization in cells and affinity in vitro), the BacA region side has only been suggested by the deuterium exchange experiment and needs some kind of validation.

      The results of the HDX analysis per se are not preliminary and clearly indicate a change in the accessibility of surface-exposed residues in the central bactofilin domain. We agree, however, that additional experiments would be required to verify the binding site suggested by these data. That said, this aspect is not the main focus of the paper. We included the analysis of the interaction between PbpC and BacA because we see effects of membrane binding/polymerization on the BacA-PbpC interaction and thus on the physiological function of BacA in C. crescentus.

      Reviewer #2:

      Summary:

      The authors of this study investigated the membrane-binding properties of bactofilin A from Caulobacter crescentus, a classic model organism for bacterial cell biology. BacA was the progenitor of a family of cytoskeletal proteins that have been identified as ubiquitous structural components in bacteria, performing a range of cell biological functions. Association with the cell membrane is a common property of the bactofilins studied and is thought to be important for functionality. However, almost all bactofilins lack a transmembrane domain. While membrane association has been attributed to the unstructured N-terminus, experimental evidence had yet to be provided. As a result, the mode of membrane association and the underlying molecular mechanics remained elusive.

      Liu et al. analyze the membrane binding properties of BacA in detail and scrutinize molecular interactions using in vivo, in vitro and in silico techniques. They show that a few N-terminal amino acids are important for membrane association or proper localization and suggest that membrane association promotes polymerization. Bioinformatic analyses revealed conserved lineage-specific N-terminal motifs, indicating a conserved role in protein localization. Using HDX analysis, they also identify a potential interaction site with PbpC, a morphogenic cell wall synthase implicated in Caulobacter stalk synthesis. In a complementary approach, they pinpoint the bactofilin-interacting region within the PbpC C-terminus, known to interact with bactofilin. They further show that BacA localization is independent of PbpC.

      Strengths

      These data significantly advance the understanding of the membrane binding determinants of bactofilins and thus their function at the molecular level. The major strength of the comprehensive study is the combination of complementary in vivo, in vitro and bioinformatic/simulation approaches, the results of which are consistent.

      We thank Reviewer #2 for the positive evaluation of our paper and for the constructive criticism sent to us in the non-public review. We will address the points raised in a revised version of the manuscript.

      Weaknesses:

      The results are limited to protein localization and interaction, as there is no data on phenotypic effects. Therefore, the cell biological significance remains somewhat underrepresented.

      We agree that it would be interesting to investigate the phenotypic effects caused by a defect of BacA in membrane binding. We will investigate PbpC localization and stalk length in phosphate-limited medium for mutants producing MTS-deficient BacA variants and include these data in the revised version of the manuscript. However, we would like to point out that the relevance of our findings goes beyond the C. crescentus system, because the MTS and its role in bactofilin function are likely to be conserved in many other species.

    1. Author response:

      We thank the reviewers for their valuable comments. Our revision will address their recommendations and clarify any misconceptions. The main points we plan to amend are as follows:

      Direct comparison of pRF sizes

      We may have misunderstood this comment in the eLife assessment. We believe our original analyses and the figures already provided a “direct comparison between pRF sizes in the high-adapted and low-adapted conditions”. Specifically, we included a figure showing the histograms of pRF sizes in both conditions, and also reported statistical tests to compare conditions both within each participant and across the group. However, we now realize these comparisons might not be as clear to readers as we intended, which would explain Reviewer #2’s interpretations. To clarify, in our revised version we will instead show 2D plots comparing pRF sizes between conditions as suggested by Reviewer #2, and also show the pRF size plotted against eccentricity (rather than only the difference) as suggested by Reviewer #3.

      Data sharing 

      The behavioral data, fMRI data (where ethically permissible), stimulus-generation code, statistical analyses, and fMRI stimulus video are already publicly available at the link: https://osf.io/9kfgx/. However, we unfortunately failed to include the link in the preprint. We apologize for this oversight. It will be included in the revision. The repository now also contains a script for simulated adaptation effects on pRF size used in our response to Reviewer #2. Moreover, for transparency, we will include plots of all the pRF parameter maps for all participants, including pRF size, polar angle, eccentricity, normalized R2, and raw R2.

      Sample size

      The reviewers shared concerns about the sample size of our study. We disagree that this is a weakness of our study. It is important to note that large sample sizes are not necessary to obtain conclusive results, especially when the research aims to test whether an effect exists, rather than finding out how strong the effect is on average in a population (Schwarzkopf & Huang, 2024, currently out as preprint, but in press at Psychological Methods). Our results showed robust within-subject effects, consistent across multiple visual regions in most individual participants. A larger sample size would not necessarily improve the reliability of our findings. Treating each individual as an independent replication, our results suggest a high probability that they would replicate in each additional participant we could scan. 

      Reviewer #1:

      We thank the reviewer for their careful evaluation and positive comments. We will include a more detailed discussion about the issues pointed out, and an additional plot showing the polar angle for both adapter conditions. In line with previous work on the reliability of pRF estimates (van Dijk, de Haas, Moutsiana, & Schwarzkopf, 2016; Senden, Reithler, Gijsen, & Goebel, 2014), both polar angle and eccentricity maps are very stable between the two adaptation conditions.

      Reviewer #2:

      We thank the reviewer for their comments - we will improve how we report key findings which we hope will clarify matters raised by the reviewer.

      RF positions in a voxel

      The reviewer’s comments suggest that they may have misunderstood the diagram (Figure 1A) illustrating the theoretical basis of the adaptation effect, likely due to us inadvertently putting the small RFs in the middle of the illustration. We will change this figure to avoid such confusion.

      Theoretical explanation of adaptation effect

      The reviewer’s explanation of how adaptation should affect the size of a pRF, which aggregates across individual RFs, is incorrect. When selecting RFs from a fixed range of semi-uniformly distributed positions (as in an fMRI voxel), the average position of RFs (corresponding to pRF position) is naturally near the center of this range. The average size (corresponding to pRF size) reflects the visual field coverage of these individual RFs. This aggregate visual field coverage thus also reflects the individual sizes. When large RFs have been adapted out, the visual field coverage at the boundaries becomes sparser, and the aggregate pRF is therefore smaller. The opposite happens when adapting out the contribution of small RFs. We demonstrate this with a simple simulation at this OSF link: https://osf.io/ebnky/.
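      The reasoning above can be illustrated with a minimal 1D sketch. It is not the simulation at the OSF link: it assumes that a voxel's pRF is the gain-weighted sum of unit-area Gaussian RFs and uses the standard deviation of that summed profile as a stand-in for fitted pRF size. The RF center range, the size range, the median split and the 30% residual gain of adapted RFs are all hypothetical.

```python
import random


def aggregate_prf_sd(centers, sizes, gains):
    """SD of a gain-weighted sum of unit-area 1D Gaussian RFs.

    The summed profile is a Gaussian mixture, so its variance is
    E[sigma^2 + mu^2] - E[mu]^2 under the gain weights.
    """
    total = sum(gains)
    mean = sum(g * c for g, c in zip(gains, centers)) / total
    second_moment = sum(g * (s * s + c * c) for g, c, s in zip(gains, centers, sizes)) / total
    return (second_moment - mean * mean) ** 0.5


def simulate_adaptation(n_rfs=2000, adapted_gain=0.3, seed=1):
    """Aggregate pRF size with no adaptation, large RFs adapted, or small RFs adapted.

    Hypothetical parameters: RF centers uniform over the voxel's extent [-1, 1],
    RF sizes uniform in [0.2, 1.0], and the adapted half of the RFs (split at
    the median size) responding at `adapted_gain` of its unadapted level.
    """
    rng = random.Random(seed)
    centers = [rng.uniform(-1.0, 1.0) for _ in range(n_rfs)]
    sizes = [rng.uniform(0.2, 1.0) for _ in range(n_rfs)]
    median = sorted(sizes)[n_rfs // 2]

    def gains(adapt_large):
        # An RF is attenuated if it falls in the adapted half of the size distribution.
        return [adapted_gain if (s > median) == adapt_large else 1.0 for s in sizes]

    return {
        "baseline": aggregate_prf_sd(centers, sizes, [1.0] * n_rfs),
        "large_adapted": aggregate_prf_sd(centers, sizes, gains(True)),
        "small_adapted": aggregate_prf_sd(centers, sizes, gains(False)),
    }


if __name__ == "__main__":
    for condition, sd in simulate_adaptation().items():
        print(f"{condition}: {sd:.3f}")
```

      Because the summed profile is a Gaussian mixture, its variance equals the spread of the RF centers plus the gain-weighted mean of the squared RF sizes. Attenuating large RFs lowers the second term and shrinks the aggregate, while attenuating small RFs raises it, even though the RF centers are unchanged.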

      Figure S2 

      It is not actually possible to compare R2 between regions by looking at Figure S2 because it shows the pRF size change, not R2. Therefore, the arguments Reviewer #2 made based on their interpretation of the figure are not valid. Just as the reviewer expected, V1 is one of the brain regions with good pRF model fits. In our revision, we will include normalized and raw R2 maps to make this more obvious to the readers and provide additional explanations.

      V1 appeared essentially empty in that plot primarily due to the sigma threshold we selected, which was unintentionally more conservative than those applied in our analyses and other figures. We apologize for this mistake and will correct it in the revised version by including a plot with the appropriate sigma threshold.

      Thresholding details 

      Thresholding information was included in our original manuscript; however, we will include more information in the figure captions to make it more obvious.

      2D plots will replace histograms

      We thank the reviewer for this suggestion. The manuscript contained histograms showing the distribution of pRF size for both adaptation conditions for each participant and visual area (Figure S1). However, we agree that 2D plots better communicate the difference in pRF parameters between conditions, so we will replace this figure. We will consider 2D kernel density plots as suggested by the reviewer; however, such plots can obscure distributional anomalies so they may not be the optimal choice and we may opt to show transparent scatter plots of individual pRFs instead.

      (proportional) pRF size-change map 

      The reviewer requests pRF size difference maps. Figure S2 in fact demonstrates the proportional difference between the pRF sizes of the two adaptation conditions. Instead of simply taking the difference, we believe showing the proportional change map is more sensible because overall pRF size varies considerably between visual regions. We will explain this more clearly in our revision. 
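      Why a proportional rather than a raw difference is preferable can be shown with simple arithmetic. The normalization below (difference divided by the mean of the two conditions) is an assumption for illustration only, since the exact denominator used for Figure S2 is not stated here, and the pRF sizes are hypothetical.

```python
def proportional_change(size_a, size_b):
    """Difference between two conditions relative to their mean (assumed definition)."""
    return (size_a - size_b) / ((size_a + size_b) / 2.0)


if __name__ == "__main__":
    # Hypothetical voxels with the same 0.5 deg absolute difference:
    # one with small pRFs (early visual cortex), one with large pRFs (higher area).
    print(proportional_change(1.25, 0.75))
    print(proportional_change(4.25, 3.75))
```

      The same absolute difference corresponds to a 50% change in the small-pRF voxel but only 12.5% in the large-pRF voxel, so a raw difference map would be dominated by regions with large pRFs.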

      pRF eccentricity plot 

      “I suspect that the difference in PRF size across voxels correlates very strongly with the difference in eccentricity across voxels.”

      Our manuscript already contains a supplementary plot (Figure S4 B) comparing the eccentricity between adapter conditions, showing no notable shift in eccentricities except in V3A - but that is a small region and the results are generally more variable. We will comment more on this finding in the main text and explain this figure in more detail. 

      To the reviewer’s point, even if there were an appreciable shift in eccentricity between conditions (as they suggest may have happened for the example participant we showed), this does not mean that the pRF size effect is “due [...] to shifts in eccentricity.” Parameters in a complex multi-dimensional model like the pRF are not independent. There is no way of knowing whether a change in one parameter is causally linked with a change in another. We can only report the parameter estimates the model produces. 

      In fact, it is conceivable that adaptation causes both: changes in pRF size and eccentricity. If more central or peripheral RFs tend to have smaller or larger RFs, respectively, then adapting out one part of the distribution will shift the average accordingly. However, as we already established, we find no compelling evidence that pRF eccentricity changes dramatically due to adaptation, while pRF size does. We will illustrate this using the 2D plots in our revision.

      Reviewer #3:

      We thank the reviewer for their comments.

      pRF model

      Top-up adapters were not modelled in our analyses because they are shared events in all TRs, critically also including the “blank” periods, providing a constant source of signal. Therefore modelling them separately cannot meaningfully change the results. However, the reviewer makes a good suggestion that it would be useful to mention this in the manuscript, so we will add a discussion of this point.

      pRF size vs eccentricity

      We will add a plot showing pRF size in the two adaptation conditions (in addition to the pRF size difference) as a function of eccentricity.

      Correlation with behavioral effect

      In the original manuscript, we pointed out why the correlation between the magnitude of the behavioral effect and the pRF size change is not an appropriate test for our data. First, the reviewer is right that a larger sample size would be needed to reliably detect such a between-subject correlation. More importantly, as per our recruitment criteria for the fMRI experiment, we did not scan participants showing weak perceptual effects. This limits the variability in the perceptual effect and makes correlation inapplicable.
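      The range-restriction argument can be illustrated with a simulation in which the recruitment criterion is modelled as a hard cutoff on the behavioral effect. The linear model, effect size, cutoff and sample size are hypothetical; the sketch only shows that excluding weak perceivers attenuates an otherwise identical correlation and does not model the actual recruitment procedure.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def range_restriction_demo(n=20000, true_r=0.5, cutoff=1.0, seed=2):
    """Correlation in the full simulated population vs. only 'strong perceivers'.

    Hypothetical model: behavioral effect x ~ N(0, 1) and pRF size change
    y = true_r * x + noise, so corr(x, y) = true_r in the full population.
    Only individuals with x > cutoff are 'recruited' into the restricted sample.
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1.0 - true_r ** 2)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_r * xi + rng.gauss(0.0, noise_sd) for xi in x]
    kept = [(xi, yi) for xi, yi in zip(x, y) if xi > cutoff]
    rx, ry = zip(*kept)
    return pearson_r(x, y), pearson_r(rx, ry), len(kept)


if __name__ == "__main__":
    full_r, restricted_r, n_kept = range_restriction_demo()
    print(f"full population r = {full_r:.2f}")
    print(f"recruited subset r = {restricted_r:.2f} (n = {n_kept})")
```

      Even with thousands of simulated participants, the correlation in the selected subset is roughly half the population value; with a small, selected fMRI sample it would be both attenuated and highly variable.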

      References

      van Dijk, J. A., de Haas, B., Moutsiana, C., & Schwarzkopf, D. S. (2016). Intersession reliability of population receptive field estimates. NeuroImage, 143, 293–303. https://doi.org/10.1016/J.NEUROIMAGE.2016.09.013

      Schwarzkopf, D. S., & Huang, Z. (2024). A simple statistical framework for small sample studies. BioRxiv, 2023.09.19.558509. https://doi.org/10.1101/2023.09.19.558509

      Senden, M., Reithler, J., Gijsen, S., & Goebel, R. (2014). Evaluating population receptive field estimation frameworks in terms of robustness and reproducibility. PloS One, 9(12). https://doi.org/10.1371/JOURNAL.PONE.0114054

    1. Author response:

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript. 

      We appreciate the Editorial assessment on our paper’s strengths and novelty.  We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning.  Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      We thank the Reviewers for their comments and suggestions, prompting new analyses and additions that strengthened our report.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most, of the gains in behaviour (i.e., speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques and exhaustively interrogate their data using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed; when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      We have previously shown that neural replay of MEG activity representing the practiced skill correlated with micro-offline gains during rest intervals of early learning,1 consistent with the recent report that hippocampal ripples during these offline periods predict human motor sequence learning2. However, decoding accuracy in our earlier work1 needed improvement. Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses: 

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc., a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint of the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that no useful information is contained within either the individual frequency bands or the broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions. 

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while head position was not monitored online for this study, the head was restrained using an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. Head movement did not exceed 5mm between the beginning and end of each scan for all participants included in the study. Fourth, we agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. However, in order for any such correlations to meaningfully impact decoding performance, such head movements would need to: (A) be consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) systematically vary between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is extremely unlikely.

      Given the task design, a much more likely confound in our estimation would be the contribution of eye movement artefacts to the decoder performance (an issue appropriately raised by Reviewer #3 in the comments below). Recall from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, if participants attend to the visual feedback provided on the display, they may move their eyes in a way that is systematically related to the task. Since we recorded eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point at which the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (or keyDown event), 2) the gaze position 150ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance level (0.20 for five equally frequent positions; overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance level (0.20 for five equally frequent positions; Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.21817). Because the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts).
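      As an illustration of the control features, here is a minimal sketch. It assumes gaze is available as an (n_samples, 2) array sampled at 600 Hz and that the keyDown sample index is known. The array layout, units, and function names are hypothetical, and the classifier itself is not reproduced. Each position feature contributes an x and a y coordinate. The sketch also checks the arithmetic in the caption: assuming equal-sized folds, the mean of the five fold accuracies is 0.21817, against a chance level of 0.20.

```python
import numpy as np

FS_HZ = 600   # EyeLink 1000 Plus sampling rate stated in the response
LAG_MS = 150  # second gaze sample is taken 150 ms after the keyDown event

def eye_features(gaze_xy, key_idx, fs=FS_HZ, lag_ms=LAG_MS):
    """Return [x0, y0, x1, y1, peak_velocity] for one keypress (hypothetical layout).

    gaze_xy : (n_samples, 2) gaze positions in arbitrary screen units
    key_idx : sample index of the keyDown event (asterisk update)
    """
    lag = int(round(lag_ms * fs / 1000))            # 150 ms -> 90 samples at 600 Hz
    segment = gaze_xy[key_idx:key_idx + lag + 1]    # from the update to +150 ms
    step = np.linalg.norm(np.diff(segment, axis=0), axis=1)  # distance per sample
    peak_velocity = step.max() * fs                 # screen units per second
    return np.concatenate([gaze_xy[key_idx], gaze_xy[key_idx + lag], [peak_velocity]])

# Fold accuracies reported for the 5-class eye-movement classifier.
FOLD_ACC = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]
CHANCE = 1 / 5  # five equally frequent asterisk positions

# The fold mean equals the overall accuracy only if the folds are equal-sized (assumed).
mean_acc = float(np.mean(FOLD_ACC))
```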

      In fact, inspection of the eye position data revealed that most participants displayed random-walk gaze patterns around a central fixation point on most trials, indicating that they did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. A similar real-world example would be manually entering a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which the user typically ignores. The minimal participant engagement with the visual task display observed in this study highlights another important point: behavior in explicit sequence learning motor tasks is highly generative in nature, rather than reactive to stimulus cues as in the serial reaction time task (SRTT). This crucial difference must be carefully considered when designing investigations and comparing findings across studies.

      We observed that keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials, before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals1,3-5. Similarly, the involvement of bilateral frontal and parietal regions during early skill learning with the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported6-11, and appears to be even more prominent during early fine motor skill learning with the non-dominant hand12,13. The frontal regions identified in these studies are known to play crucial roles in executive control14, motor planning15, and working memory6,8,16-18 processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations6,8,16-18, in addition to working memory19. Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We strongly disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular. To clarify, the base feature set of the hybrid-space decoder constructed for each participant consists of whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns”, or global brain dynamics. We then independently tested how well spatial patterns of MEG source activity across all voxels within individual parcels can decode keypress actions. Again, these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are tested completely independently of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. The intra-parcel voxels (approximately 1150 on average; the exact number varies between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. This varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used for object recognition in image classification. One could also view this hybrid-space decoding approach as a spatial analogue of common time-frequency analyses such as theta-gamma phase-amplitude coupling (PAC), which combine information from two or more narrow-band spectral features derived from the same time series.
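      To make the construction concrete, here is a minimal numpy sketch. It assumes source activity is stored as a trials × voxels array, with an integer parcel label for each voxel. The function and argument names are hypothetical. How the top-ranked parcels are chosen is taken as an input rather than reproduced. The sketch shows that each hybrid feature vector is a concatenation: every parcel mean appears once, and the voxels of the selected parcels are appended as separate columns.

```python
import numpy as np

def hybrid_features(voxel_act, parcel_of_voxel, n_parcels, top_parcels):
    """Concatenate whole-brain parcel means with raw voxels of selected parcels.

    voxel_act       : (n_trials, n_voxels) source activity (hypothetical input)
    parcel_of_voxel : (n_voxels,) integer parcel label for each voxel
    n_parcels       : number of parcels (148 in the manuscript)
    top_parcels     : parcels whose voxels are added individually (assumed given)
    """
    # Inter-parcel features: mean activity within each parcel (coarse, whole-brain).
    parcel_means = np.stack(
        [voxel_act[:, parcel_of_voxel == p].mean(axis=1) for p in range(n_parcels)],
        axis=1,
    )
    # Intra-parcel features: every voxel inside the selected parcels (fine, regional).
    keep = np.isin(parcel_of_voxel, top_parcels)
    return np.hstack([parcel_means, voxel_act[:, keep]])
```

For example, with six voxels split evenly over three parcels and parcel 1 selected, each trial yields 3 parcel means followed by the 2 voxels of parcel 1.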

      We directly tested this hypothesis (that spatially overlapping intra- and inter-parcel features carry different information) by constructing an alternative hybrid-space decoder (HybridAlt) that excluded the average inter-parcel features spatially overlapping with intra-parcel voxel features, and comparing its performance with that of the decoder used in the manuscript (HybridOrig). The prediction was that, if the overlapping parcels contained the same information as the more spatially resolved voxel patterns, removing those parcel features (n = 8) from the decoding analysis should not affect performance. In fact, although these features made up less than 1% of the input feature space, removing them significantly reduced overall performance by more than 2 percentage points (78.15% ± SD 7.03% for HybridOrig vs. 75.49% ± SD 7.17% for HybridAlt; Wilcoxon signed-rank test, z = 3.7410, p = 1.8326e-04) (Author response image 2).
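      The reported z and p values are consistent with the usual large-sample normal approximation to the Wilcoxon signed-rank test, p = 2(1 − Φ(|z|)). The sketch below is a generic illustration of that approximation, without continuity or tie-variance correction. It is not the exact routine used for the analysis, and the function names are hypothetical.

```python
import math
import numpy as np

def wilcoxon_z(x, y):
    """Signed-rank z statistic for paired samples (normal approximation only)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                      # zero differences are dropped
    n = d.size
    absd = np.abs(d)
    # Rank |d| from 1..n, then give tied values their average rank.
    ranks = np.empty(n)
    ranks[np.argsort(absd)] = np.arange(1, n + 1)
    for v in np.unique(absd):
        tied = absd == v
        ranks[tied] = ranks[tied].mean()
    w_plus = ranks[d > 0].sum()        # sum of ranks of positive differences
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd

def two_sided_p(z):
    """p = 2 * (1 - Phi(|z|)) written via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Plugging in z = 3.7410 gives p ≈ 1.83e-4, in line with the value reported above.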

      Author response image 2.

      Comparison of decoding performance for two different hybrid approaches. HybridAlt: intra-parcel voxel-space features of top-ranked parcels and inter-parcel features of the remaining parcels. HybridOrig: voxel-space features of top-ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note that HybridOrig (the approach used in our manuscript) significantly outperforms HybridAlt, indicating that the excluded parcel features provide unique information compared with the spatially overlapping intra-parcel voxel patterns.

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen. 

      We agree with the Reviewer that features representing neighboring (spatially contiguous) voxels are likely to be correlated. This has been well documented in the MEG literature20,21 and is a particularly important confound to address in functional or effective connectivity analyses (not performed in the present study). In the present analysis, correlation between adjacent voxels presents a multicollinearity problem, which effectively reduces the dimensionality of the input feature space. However, as long as each parcel contains multiple groups of correlated voxels (i.e. - the effective dimensionality is still greater than 1), the intra-parcel spatial patterns can still contribute meaningfully to decoder performance. Two specific results support this assertion.

      First, we obtained higher decoding accuracy with voxel-space features [74.51% (± SD 7.34%)] than with parcel-space features [68.77% (± SD 7.6%)] (Figure 3B), indicating that individual voxels carry more information for decoding keypresses than parcel-space features, which average voxel activity within each parcel. Second, individual voxels within a parcel showed varying feature importance scores for decoding keypresses (Author response image 3). This finding supports the Reviewer’s assertion that neighboring voxels express similar information, but also shows that correlated voxels form mini-subclusters that are spatially much smaller than the parcels in which they reside.
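      One way to quantify the “effective dimensionality” discussed above is the participation ratio of the eigenvalues of the voxel correlation matrix. This measure is a swapped-in illustration, not an analysis reported in the manuscript. It equals 1 when all voxels in a parcel are perfectly correlated, and it grows toward the number of voxels as they become independent. The function name is hypothetical.

```python
import numpy as np

def participation_ratio(X):
    """Effective dimensionality of the columns (voxels) of X.

    PR = (sum of eigenvalues)^2 / sum of squared eigenvalues of the correlation
    matrix: 1 for fully redundant voxels, n_voxels for uncorrelated voxels.
    """
    C = np.corrcoef(X, rowvar=False)   # voxel-by-voxel correlation matrix
    lam = np.linalg.eigvalsh(C)        # eigenvalues of the symmetric matrix
    return lam.sum() ** 2 / (lam ** 2).sum()
```

Under this measure, a parcel made of two internally correlated subclusters has an effective dimensionality of about 2, even if it contains many voxels.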

      Author response image 3.

      Feature importance scores of individual voxels in decoding keypresses. MRMR was used to rank the individual voxel-space features for decoding keypresses, and the min-max normalized MRMR scores were mapped onto a structural brain surface. Note that individual voxels within a parcel contributed differently to decoding.
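      The normalization step in the caption can be sketched as follows, assuming the MRMR scores have already been computed (the ranking itself is not reproduced). The helper names are hypothetical. The within-parcel range is one simple way to express that voxels in the same parcel received different normalized scores.

```python
import numpy as np

def minmax(scores):
    """Rescale scores linearly so the lowest maps to 0 and the highest to 1."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

def within_parcel_range(norm_scores, parcel_of_voxel):
    """Spread (max - min) of normalized scores among the voxels of each parcel."""
    norm_scores = np.asarray(norm_scores, dtype=float)
    parcel_of_voxel = np.asarray(parcel_of_voxel)
    out = {}
    for p in np.unique(parcel_of_voxel):
        vals = norm_scores[parcel_of_voxel == p]
        out[int(p)] = float(vals.max() - vals.min())
    return out
```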


      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently removed movement-related problems, but one could, e.g., relate the identified individual components to the keypresses as another check. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment.

      We have already addressed the issue of movement-related artefacts in our first response above. With respect to focusing on frequency ranges above movement frequencies, the Reviewer states that the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics22,23, muscle activation patterns24, and temporal sequencing25 during skilled reaching and grasping actions. Thus, a substantial body of evidence indicates that low-frequency neural oscillatory activity in this range carries important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we noted in our earlier response regarding the hybrid-space decoder architecture, it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies). Again, this interpretation is supported by our data, as the highest-performing classifiers in all cases (holding all other parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, e.g., in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these substantial shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans performing a similar sequence learning task showed that flexibility in brain network composition (i.e. – changes in brain region members displaying coordinated activity) is up-regulated in novel learning environments and explains differences in learning rates across individuals26.  This work supports our interpretation of the present study data, that brain networks engaged in sequential motor skills rapidly reconfigure during early learning.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning27,28. For example, reactivation events in the posterior parietal29 and medial prefrontal30,31 cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains32, including motor sequence learning1,33,34. Further, synchronized interactions between MPFC and hippocampus are more prominent during early than during later learning stages27,35,36, perhaps reflecting “redistribution of hippocampal memories to MPFC”27. MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning37. Consistent with this, coupling between hippocampus and MPFC has been shown during and, importantly, immediately following (i.e., during rest after) initial memory encoding38,39. MPFC activity during initial memory encoding also predicts subsequent recall40. Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex”28, which is also engaged in developing an abstract representation of the sequence41. In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning42-44. The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice45, all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding46,47. Thus, several prefrontal and frontoparietal regions that contribute to long-term learning48 are also engaged in the early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning.
We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here. 

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript, we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power and neural replay density during inter-practice rest periods) to observed micro-offline gains49.

      Reviewer #2 (Public review): 

      Summary 

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea. 

      Weaknesses 

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3-4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in more than 60% of the training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured information about these future presses as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained by similar logic. Therefore, more careful consideration and more elaborate discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation. The issue can essentially be framed as a mixing problem. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
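      This prediction can be stated as a small check on a 4-class confusion matrix, with rows as the true finger and columns as the predicted finger (labels 1-4). The sketch assumes that a finger's “neighbors” are the fingers pressed immediately before or after it in 4-1-3-2-4, including the 4-4 transition between consecutive sequences. This neighbor definition and the function name are illustrative assumptions, and the manuscript's confusion matrices are not reproduced here. Under the mixing account, the mean error rate into neighbors should exceed the rate into the remaining finger. Uniform errors give equal rates.

```python
import numpy as np

LABELS = [1, 2, 3, 4]  # 1 = little ... 4 = index finger (left hand)
# Fingers adjacent to each key in 4-1-3-2-4, counting the 4-4 sequence boundary (assumed).
NEIGHBORS = {4: {1, 2}, 1: {4, 3}, 3: {1, 2}, 2: {3, 4}}

def neighbor_vs_other_error(conf):
    """Mean misclassification rate into sequence neighbors vs. into other fingers."""
    conf = np.asarray(conf, dtype=float)
    rates = conf / conf.sum(axis=1, keepdims=True)  # row-normalize: P(pred | true)
    near, far = [], []
    for i, true_lab in enumerate(LABELS):
        for j, pred_lab in enumerate(LABELS):
            if i == j:
                continue  # skip correct classifications
            (near if pred_lab in NEIGHBORS[true_lab] else far).append(rates[i, j])
    return float(np.mean(near)), float(np.mean(far))
```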

      Moreover, if the representation distance is largely driven by this mixing effect, it’s also possible that the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      We also conducted a new multiple regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis indicated that the possible alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
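      A minimal sketch of this regression follows. It assumes the three transition times and the distance score are stacked in arrays, with one row per correct sequence and a subject label per row. The names are hypothetical, and z-scoring with the population standard deviation is an assumption. Adjusted R² penalizes R² for the number of predictors k: adj R² = 1 − (1 − R²)(n − 1)/(n − k − 1). A value near zero therefore means the transition times explain essentially none of the variance in the distance score.

```python
import numpy as np

def zscore_within(values, subject):
    """Z-score rows of `values` separately for each subject (population SD assumed)."""
    out = np.empty_like(values, dtype=float)
    for s in np.unique(subject):
        m = subject == s
        v = values[m]
        out[m] = (v - v.mean(axis=0)) / v.std(axis=0)
    return out

def adjusted_r2(X, y):
    """Ordinary least squares with an intercept; X is (n, k), y is (n,)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    r2 = 1 - (resid @ resid) / (centered @ centered)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```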

      Overall, we do strongly agree with the Reviewer that the naturalistic, self-paced, generative task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the keyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study. 

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performance on the Day 1 and Day 2 datasets provides some insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—figure supplement 3). Overall accuracy fell to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% accuracy for the trained sequence data on Day 2 (a reduction of only 3.36 percentage points) indicates that hybrid-space decoder performance is robust over multiple MEG sessions, and thus robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped by an additional 7.67 percentage points. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impact overall decoder performance and generalization.
We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself. 

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the keyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses.  We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for accurately decoding individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms in 25 ms increments, variably aligned from 0 to +100 ms in 10 ms increments relative to the keyDown event). This approach generated 140 different temporal windows for each keypress for each participant, and the final parameters were selected by comparing the resulting performance of each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximized statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200 ms epoch aligned to the keyDown event (t0 = 0 ms). We then asked whether the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. As noted above, ongoing work in our lab is investigating contextualization within different time windows tailored specifically to assessing sequence skill action planning, execution, evaluation and memory processes.
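      The grid search described above can be sketched as an enumeration of (start, end) windows followed by selection of the best-scoring window. The scoring callable is hypothetical; in practice it would be something like cross-validated decoder accuracy for that window. If both endpoints are treated as inclusive, this grid contains 14 × 11 = 154 windows rather than the 140 stated above, so the exact endpoint convention is an assumption here.

```python
WIDTHS_MS = range(25, 351, 25)   # 25, 50, ..., 350 ms (upper bound assumed inclusive)
OFFSETS_MS = range(0, 101, 10)   # 0, 10, ..., +100 ms after keyDown (assumed inclusive)

def candidate_windows(widths=WIDTHS_MS, offsets=OFFSETS_MS):
    """All (start_ms, end_ms) windows relative to the keyDown event at t0 = 0 ms."""
    return [(off, off + w) for w in widths for off in offsets]

def select_window(windows, score):
    """Return the window with the highest score.

    `score` is a hypothetical callable, e.g. the cross-validated accuracy of a
    decoder trained on features extracted from that window.
    """
    return max(windows, key=score)
```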

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well. 

      The Reviewer suggests that the current data is not convincing enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last IndexOP5 and first IndexOP1 from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Author response image 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest period.
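      The following is a minimal sketch of this comparison. It assumes Pearson correlations computed within each participant, a Fisher r-to-z transform, and a paired t-test across participants. The transform, the pairing and the dictionary keys are assumptions made for illustration, because the response does not specify which t-test variant was used.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher r-to-z transform, clipped so that |r| = 1 stays finite."""
    r = max(min(r, 1 - eps), -1 + eps)
    return math.atanh(r)


def paired_t(a, b):
    """Paired t statistic for per-subject values a[i] vs b[i] (df = n - 1)."""
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


def compare_coupling(subjects):
    """subjects: one dict of per-trial series per participant (hypothetical keys).

    Positive t means rest-period coupling exceeds practice-period coupling.
    """
    z_rest = [fisher_z(pearson_r(s["ctx_rest"], s["gain_offline"])) for s in subjects]
    z_prac = [fisher_z(pearson_r(s["ctx_practice"], s["gain_online"])) for s in subjects]
    return paired_t(z_rest, z_prac)
```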

      Author response image 4.

      Distribution of individual subject correlation coefficients between contextualization changes occurring during practice or rest and micro-online and micro-offline performance gains. Note that the correlation distributions were significantly higher for the relationship between contextualization changes during rest and micro-offline gains than for contextualization changes during practice and either micro-online or micro-offline gains.

      With respect to the second concern highlighted above, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the reviewed manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is shorter for the online analysis than for the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first IndexOP1 to the last IndexOP5 keypress in the same trial, we observed no learning-related trend (Author response image 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach, and neither online measure predicted online learning (Author response image 6).
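      The three distance definitions in this paragraph and in Author response image 5 can be stated compactly in code. The sketch assumes a hypothetical layout in which `trials[t][s][p]` holds the feature vector for ordinal position `p` (0 to 4) of the `s`-th correct sequence in trial `t`. It uses Euclidean distance as a stand-in for whatever representational distance the analysis actually used.

```python
import math
from typing import List, Sequence

# Hypothetical layout: trials[t][s][p] is the feature vector for ordinal
# position p (0 = IndexOP1, 4 = IndexOP5) of the s-th correct sequence in trial t.
Trial = List[List[Sequence[float]]]


def dist(a, b):
    """Euclidean distance (an assumed stand-in for the representational distance)."""
    return math.dist(a, b)


def offline_distances(trials: List[Trial]):
    """Last IndexOP5 of trial t-1 vs first IndexOP1 of trial t (spans one rest period)."""
    return [dist(trials[t - 1][-1][4], trials[t][0][0]) for t in range(1, len(trials))]


def online_within_sequence(trials: List[Trial]):
    """Mean IndexOP1 vs IndexOP5 distance within each correct sequence of a trial."""
    return [sum(dist(seq[0], seq[4]) for seq in trial) / len(trial) for trial in trials]


def online_across_trial(trials: List[Trial]):
    """First IndexOP1 vs last IndexOP5 in the same trial (time-controlled variant)."""
    return [dist(trial[0][0], trial[-1][4]) for trial in trials]
```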

      Author response image 5.

      Trial-by-trial trend of offline (left panel) and online (middle and right panels) changes in contextualization. Offline changes in contextualization were assessed by calculating the distance between neural representations for the last IndexOP5 keypress in the previous trial and the first IndexOP1 keypress in the present trial. Two different approaches were used to characterize online contextualization changes. The analysis included in the reviewed manuscript (middle panel) calculated the distance between IndexOP1 and IndexOP5 for each correct sequence and then averaged across the trial. This approach does not control for the passage of time when making online versus offline comparisons. The second approach therefore controlled for the passage of time by calculating the distance between the representations associated with the first IndexOP1 keypress and the last IndexOP5 keypress within the same trial. Note that while the first approach showed an increasing online contextualization trend with practice, the second approach did not.

      Author response image 6.

      Relationship between online contextualization and online learning, shown for both the within-sequence (left; note that this is the online contextualization measure used in the reviewed manuscript) and across-sequence (right) distance calculations. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals. 

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.
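      One simple form of the normalization suggested by the Reviewer is to divide each participant's distances by a participant-specific reference distance. The choice of reference (for example, the 1 vs. 4 distance or the mean distance among control-sequence presses) and the function names below are illustrative assumptions. They do not describe the revised analysis.

```python
def normalize_distances(distances, reference):
    """Express one participant's distances as multiples of a reference distance."""
    if reference <= 0:
        raise ValueError("reference distance must be positive")
    return [d / reference for d in distances]


def normalize_all(distances_by_subject, reference_by_subject):
    """Apply the participant-specific normalization to every participant.

    Both arguments are dicts keyed by a (hypothetical) subject identifier.
    """
    return {
        subj: normalize_distances(dists, reference_by_subject[subj])
        for subj, dists in distances_by_subject.items()
    }
```

      After this step, a value of 2.0 means "twice the participant's own reference distance" for every participant, which makes across-participant regressions less sensitive to individually tuned feature spaces.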

      Reviewer #3 (Public review): 

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths: 

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter). 

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?). 

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.  

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.  
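      The label-shuffling control can be sketched as follows. To keep the example self-contained, a leave-one-out nearest-centroid classifier stands in for the decoders used in the study. It is a substitute chosen for brevity, and the number of permutations and the seed are arbitrary. The empirical chance level is then the mean (and spread) of the accuracies obtained with permuted labels.

```python
import random
from statistics import mean


def nearest_centroid_loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in decoder)."""
    correct = 0
    for i, (x_test, y_test) in enumerate(zip(X, y)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        # One centroid per class, computed from the training fold only.
        centroids = {
            lab: [mean(col) for col in zip(*[x for x, l in train if l == lab])]
            for lab in {l for _, l in train}
        }
        pred = min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x_test, centroids[lab])))
        correct += pred == y_test
    return correct / len(X)


def shuffled_label_null(X, y, n_perm=100, seed=0):
    """Accuracies obtained after randomly permuting the class labels."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        null.append(nearest_centroid_loo_accuracy(X, y_perm))
    return null
```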

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA maps the input features onto a much lower-dimensional space (number of classes − 1; e.g., 3 dimensions for 4-class keypress decoding). Given the very high dimensionality of the voxel-space input features, the resulting mapping exhibits reduced accuracy. Beyond this general consideration, please refer to Figure 3—figure supplement 3, where voxel-space decoder performance improves when alternative dimensionality reduction techniques are used.
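      The dimensionality limit mentioned above follows from a general property of Fisher LDA. It is shown here for C classes with class sizes n_c, class means μ_c and grand mean μ in a d-dimensional feature space. This is a textbook result and not an analysis from the manuscript:

```latex
\begin{aligned}
S_B &= \sum_{c=1}^{C} n_c\,(\boldsymbol{\mu}_c-\boldsymbol{\mu})(\boldsymbol{\mu}_c-\boldsymbol{\mu})^{\top},
\qquad \sum_{c=1}^{C} n_c\,(\boldsymbol{\mu}_c-\boldsymbol{\mu}) = \mathbf{0}
\;\Rightarrow\; \operatorname{rank}(S_B) \le C-1,\\
S_B\,\mathbf{w} &= \lambda\, S_W\,\mathbf{w}
\;\Rightarrow\; \#\{\lambda > 0\} \le \min(C-1,\, d).
\end{aligned}
```

      For 4-class keypress decoding this gives at most 3 discriminant axes, however large d is (here d = 15684). As a general consideration, when d also exceeds the number of training samples, the within-class scatter S_W is singular and must be regularized or pre-reduced. This is one common reason why LDA on very high-dimensional inputs can generalize poorly.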

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space.  We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.
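      One widely used way to resolve this sign ambiguity is the "mean_flip" mode of label time-course extraction in MNE-Python. Before averaging, each dipole's time course is sign-flipped according to whether its orientation agrees with the dominant orientation of the parcel. The numpy sketch below approximates that idea. The final majority-sign step is an added assumption that makes the output deterministic, and the sketch is not necessarily the procedure described in the revised Methods.

```python
import numpy as np


def mean_flip(source_ts, normals):
    """Sign-aligned mean of the source time courses within one parcel.

    source_ts: (n_sources, n_times) array; normals: (n_sources, 3) unit vectors.
    """
    # Dominant orientation = first right singular vector of the normals.
    _, _, vh = np.linalg.svd(normals, full_matrices=False)
    flip = np.sign(normals @ vh[0])
    flip[flip == 0] = 1.0  # sources orthogonal to the dominant axis keep their sign
    # SVD leaves the overall sign arbitrary; assume the majority sign is positive.
    if flip.sum() < 0:
        flip = -flip
    return (flip[:, None] * source_ts).mean(axis=0)
```

      Without the flip, dipoles on opposite walls of a sulcus would partially cancel in a plain average. The flip makes the "average source" time course, and therefore its sign, well defined up to the single global convention chosen above.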

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption. 

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions50. In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above, and we agree that both must be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). Overall accuracy fell to 87.11% when the classifier was tested on MEG data recorded while participants performed the same learned sequence, and to 79.44% when they performed several previously unpracticed sequences. This 7.67 percentage-point difference on the Day 2 data could reflect a performance bias of the classifier toward the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.
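      The misclassification check described above can be made quantitative from a confusion matrix. For one target class, the sketch compares the mean misclassification rate into its sequence neighbours with the mean rate into the remaining classes. A ratio well above 1 would be the signature of mixing. The label order, the neighbour sets (for the 4-1-3-2-4 sequence, the "4" press is adjacent to "1" and "2") and the ratio itself are illustrative assumptions, not a statistic reported in the manuscript.

```python
def neighbour_error_ratio(conf, labels, target, neighbours):
    """Mean error rate into `neighbours` divided by mean error rate into other classes.

    conf[i][j] counts presses of true class labels[i] predicted as labels[j].
    """
    row = conf[labels.index(target)]
    total = sum(row)
    # Off-diagonal rates for the target row, keyed by predicted label.
    rates = {lab: row[j] / total for j, lab in enumerate(labels) if lab != target}
    near = [rates[lab] for lab in neighbours]
    other = [r for lab, r in rates.items() if lab not in neighbours]
    mean_other = sum(other) / len(other)
    if mean_other == 0:
        return float("inf")
    return (sum(near) / len(near)) / mean_other
```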

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the alternative explanation put forward by the Reviewer is not supported by our data (adjusted R² = 0.00431; F = 5.62), i.e., transition times explained less than 0.5% of the variance in distance scores. We now include this new negative control analysis result in the revised manuscript.
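      The following is a minimal sketch of this control regression. It assumes ordinary least squares with an intercept, fitted to data pooled across participants after z-scoring each participant separately. The variable names, the column order and the pooling step are assumptions, because the response does not detail the exact model specification.

```python
import numpy as np


def zscore(a):
    """Column-wise z-score using the sample standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def adjusted_r2(y, X):
    """Adjusted R^2 of an OLS fit of y (n,) on X (n, p) plus an intercept."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pooled_fit(subjects):
    """subjects: iterable of (distance (n_i,), transitions (n_i, 3)) per participant.

    Hypothetical column order: 4-1, 2-4 and 4-4 transition times.
    """
    y = np.concatenate([zscore(d) for d, _ in subjects])
    X = np.vstack([zscore(t) for _, t in subjects])
    return adjusted_r2(y, X)
```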

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test is evaluating the representation in the unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would not address our experimental question: “do neural representations of the same action performed at different locations within a skill sequence contextually differentiate or remain stable as learning evolves”.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023). 

      The Reviewer argues that the last finger movement of a trial and the first movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial (which is pre-planned offline) is performed in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is particularly concerned that the temporal mixing effect raised above differs between the first and last keypresses performed in a trial. However, in contrast to the Reviewer’s argument above, findings from Kornysheva et al. (2019) showed that neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence. Thus, mixing effects are likely still present for the first keypress in a trial. Also note that the new control analyses presented in multiple responses above confirm that hypothetical mixing effects between adjacent keypresses do not explain our reported contextualization finding. A statement addressing these possibilities raised by the Reviewer has been added to the Discussion in the revised manuscript.

      In relation to pre-planning, ongoing MEG work in our lab is investigating contextualization within different time windows tailored specifically for assessing how sequence skill action planning evolves with learning.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice).  It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable. 

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts in general on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, and these were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct to incorrect keypresses remained stable over the training session. Thus, the data do not support the Reviewer’s assumptions that errors are relatively more frequent in early trials, or that this produces a systematic trend in phase-shift differences between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed. On the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings and were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, inspection of the eye position data revealed that most participants, on most trials, displayed random-walk gaze patterns around a central fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually entering a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. Notably, the minimal participant engagement with the visual task display observed in this study highlights an important difference between behavior observed during explicit sequence learning motor tasks (which is highly generative in nature) and reactive responses to stimulus cues in a serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies. All elements pertaining to this new control analysis are now included in the revised manuscript.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"? 

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” and micro-online gains, (2) “online differentiation” and micro-offline gains and (3) “offline differentiation” and micro-offline gains (see Author response images 4, 5 and 6 above). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.
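      The Reviewer's concern rests on the difference between correlating cumulative series and correlating trial-by-trial changes. This can be illustrated with made-up numbers: two cumulative series built from opposing increments still correlate strongly, while their increments are perfectly anti-correlated. The helper names are hypothetical and the data are illustrative only.

```python
import math
from itertools import accumulate
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def first_differences(series):
    """Trial-to-trial changes of a cumulative series (e.g. per-rest gains)."""
    return [b - a for a, b in zip(series, series[1:])]


# Made-up increments that move in opposite directions on every trial.
inc_x = [1, 2, 1, 2, 1, 2]
inc_y = [2, 1, 2, 1, 2, 1]
cum_x, cum_y = list(accumulate(inc_x)), list(accumulate(inc_y))

print(round(pearson_r(cum_x, cum_y), 3))                                        # 0.983
print(round(pearson_r(first_differences(cum_x), first_differences(cum_y)), 3))  # -1.0
```

      Because both cumulative series rise monotonically, their correlation mostly reflects the shared trend over trials. Correlating the per-trial changes removes that trend and asks whether larger changes in one variable coincide with larger changes in the other.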

      The authors follow the assumption that micro-offline gains reflect offline learning.

      This statement is incorrect. The original Bönstrup et al. (2019) 49 paper clearly states that micro-offline gains must be interpreted carefully in light of the behavioral context in which they are observed, and it lays out the conditions under which one can be confident that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of Pan & Rickard (2015) 51, which re-interprets the benefits of sleep for overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study49, as well as in all of our subsequent work. Pan & Rickard stated:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks52,53. Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) 52,53 demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard51 made several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They stated:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead 51. One promising possibility is to switch to 10 s performance durations for each performance-break cycle instead 51. That design appears sufficient to eliminate at least the majority of the reactive inhibition effect 52,53.”

      We deliberately incorporated recommendations from Pan and Rickard51 into our own study designs, including (1) using 10 s practice trials and (2) restricting our analysis of micro-offline gains to early learning trials (where performance increases monotonically and 95% of overall performance gains occur), which precede the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.”  The initial Bönstrup et al. (2019) 49 report was followed up by a large online crowd-sourcing study (Bönstrup et al., 2020) 54. This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 7 below for further details on these conditions).

      Author response image 7.

      Micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. (A) Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from Bönstrup et al. (2019) 49. During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that only micro-offline gains during this Early Learning stage reflect rapid memory consolidation (see also 54). After early learning (around practice trial 11), skill plateaus. This plateau period is characterized by the striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019), as well as others in the literature 55-57, argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) do not take these known confounds into account.
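      The caption relies on splitting trial-by-trial performance change into micro-online and micro-offline components. As these quantities are commonly described, a micro-online gain is the change from the start to the end of a practice trial, and a micro-offline gain is the change from the end of one trial to the start of the next, so the two sum to the total change across the session. The minimal Python sketch below assumes per-trial start and end performance values (e.g., tapping speed) are already available; the function names, the `last_trial` cutoff argument and the example numbers are hypothetical.

```python
from typing import List, Tuple


def micro_gains(start: List[float], end: List[float]) -> Tuple[List[float], List[float]]:
    """Split performance change into micro-online and micro-offline components.

    start[i], end[i]: performance (e.g. tapping speed) at the beginning and
    end of practice trial i.
    """
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    # Micro-online gain: change within each practice trial
    online = [e - s for s, e in zip(start, end)]
    # Micro-offline gain: change across the rest period following trial i
    offline = [start[i + 1] - end[i] for i in range(len(start) - 1)]
    return online, offline


def cumulative_offline(offline: List[float], last_trial: int) -> float:
    """Sum micro-offline gains over the rest periods between trials 1..last_trial."""
    return sum(offline[: last_trial - 1])


if __name__ == "__main__":
    # Hypothetical per-trial values (correct keypresses/s), illustration only
    start = [1.0, 1.6, 2.1, 2.5, 2.7]
    end = [1.3, 1.8, 2.2, 2.5, 2.6]
    online, offline = micro_gains(start, end)
    total = end[-1] - start[0]
    # The two components add up to the total change across the session
    assert abs(sum(online) + sum(offline) - total) < 1e-9
    print(online, offline, cumulative_offline(offline, last_trial=4))
```

      Restricting the cumulative sum to early trials (the caption places the plateau onset around trial 11) is what keeps plateau-period recovery effects out of the micro-offline measure.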

      Evidence documented in that paper54 showed that micro-offline gains during early skill learning were: (1) replicable and generalized to subjects learning the task in their daily living environment (n = 389); (2) equivalent when practice period duration was substantially shortened, confirming that they are not a result of recovery from performance fatigue (n = 118); (3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after the passage of time (n = 373), indicating stabilization of the motor memory on a microscale of several seconds, consistent with rapid consolidation; and (4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (n = 71) 54. Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve Pan and Rickard51 refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods, with this density explaining differences in the magnitude of micro-offline gains across subjects1. Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study1) linked to micro-offline gains during early skill learning33. These functional changes were coupled with rapid alterations in brain microstructure on the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice58. Third, and more recently still, Chen et al. (2024) provided direct evidence from intracranial EEG in humans linking hippocampal sharp-wave ripple events (80-120 Hz in humans; known markers of neural replay59) with micro-offline gains during early skill learning. The authors report that the strong increase in ripple rates tracked learning behavior, both across blocks and across participants, and conclude that hippocampal ripples during resting offline periods contribute to motor sequence learning2.

      Thus, there is now substantial evidence in the literature directly supporting the assertion “that micro-offline gains really result from offline learning”. By contrast, according to Gupta & Rickard (2024), “…the mechanism underlying RI [reactive inhibition] is not well established” after over 80 years of investigation60, possibly because “reactive inhibition” is a categorical description of behavioral effects that likely result from several heterogeneous processes with very different underlying mechanisms.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks, however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024). 

      It is important to point out that the recent work of Gupta & Rickard (2022, 2024) 55,60 does not present any data that directly opposes our finding that early skill learning49 is expressed as micro-offline gains during rest breaks. These studies are essentially an extension of the Rickard et al. (2008) paper, which employed a massed (30 s practice followed by 30 s breaks) vs. spaced (10 s practice followed by 10 s breaks) design to assess whether recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30 s practice/10 s break and 10 s practice/10 s break, as used in the work from our group). The primary aim of the study was to assess whether changes in performance at a retest 5 minutes after the end of skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) more likely reflected memory consolidation or recovery from reactive inhibition. The Gupta & Rickard (2024) follow-up paper employed a similar design, the primary difference being that participants performed a fixed number of sequences on each trial rather than trials lasting a fixed duration. This was done to facilitate fitting a quantitative statistical model to the data. To reiterate, neither study included any analysis of micro-online or micro-offline gains, and neither included any comparison focused on skill gains during early learning. Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods. Again, we reported the same finding for trials following the early learning period in our original Bönstrup et al. (2019) paper49 (Author response image 7).

      Also, please note that in that paper we reported that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later49 (see the Results section and further elaboration in the Discussion). Thus, while our data support a short-term memory consolidation process operating over several seconds during early learning, this process likely differs from those involved over longer training times and offline periods, as assessed by Gupta & Rickard (2022).

      In the recent preprint from Das et al. (2024) 61, the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. To test this hypothesis, the study uses a between-subjects design with spaced vs. massed practice groups, inspired by the reactive inhibition work of Rickard and others. Crucially, the design incorporates only a small fraction of the training used in other investigations of early skill learning1,33,49,54,57,58,62. A direct comparison between the practice schedules of the spaced and massed groups in Das et al. and the training schedule that all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 8):

      Author response image 8.

      (A) Comparison of the Das et al. Spaced and Massed group training session designs with the training session design from the original Bönstrup et al. (2019) 49 paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow direct comparison between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e., practice schedules) between the Massed and Spaced groups in the Das et al. report are extremely small (less than 12% of the overall session schedule) and (2) the overall amount of practice is much smaller than in the design from the original Bönstrup report 49 (which has been used in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) 49 are used to estimate the performance range accounted for by the periods equivalent to Test 1, Training 1 and Test 2 in Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      First, participants in the original Bönstrup et al. study 49 experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 8).  Thus, the overall amount of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.  

      Second, and perhaps most importantly, the actual intervention (i.e., the difference in practice schedule between the Spaced and Massed groups) employed by Das et al. covers a very small fraction of the overall training session. Practice schedule segments that are identical for the Spaced and Massed groups are indicated by the red shaded area in Author response image 8. Please note that these identical segments cover 94.84% of the Massed group training schedule and 88.01% of the Spaced group training schedule (since the latter has 60 seconds of additional rest). This means that the actual interventions cover only about 5% (Massed) and 12% (Spaced) of the total training session, which minimizes any chance of observing a difference between groups.

      Also note that the very beginning of the practice schedule (during which Figure R9 shows substantial learning is known to occur) is labeled in the Das et al. study as Test 1.  Test 1 encompasses the first 20 seconds of practice (alternatively viewed as the first two 10-second-long practice trials with no inter-practice rest). This is immediately followed by the Training 1 intervention, which is composed of only three 10-second-long practice trials (with 10-second inter-practice rest for the Spaced group and no inter-practice rest for the Massed group). Author response image 8 also shows that since there is no inter-practice rest after the third Training practice trial for the Spaced group, this third trial (for both Training 1 and 2) is actually a part of an identical practice schedule segment shared by both groups (Massed and Spaced), reducing the magnitude of the intervention even further.

      Moreover, we know from the original Bönstrup et al. (2019) paper49 that 46.57% of all group-level performance gains occurred between trials 2 and 5 in that study. Thus, Das et al. limit their designed intervention to a period covering less than half of the early learning range discussed in the literature, which, again, minimizes any chance of observing an effect.

      This issue is amplified even further at Training 2, since skill learning prior to the long 5-minute break is retained, further constraining the performance range over these three trials. A related issue pertains to the trials labeled by Das et al. as Test 1 (trials 1-2) and Test 2 (trials 6-7). Again, we know from the original Bönstrup et al. paper 49 that 18.06% and 14.43% (32.49% in total) of all group-level performance gains occurred during the trials corresponding to Das et al. Test 1 and Test 2, respectively. In other words, Das et al. averaged skill performance over 20 seconds of practice at two time-points where dramatic skill improvements occur. Pan & Rickard (2015) 51 previously showed that such averaging can inject artefacts into analyses of performance gains.

      Furthermore, the structure of the Test in the Das et al. study appears to have an interference effect on Spaced group performance after the training intervention. This makes sense if one considers that the Spaced group is now required to perform the task in a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), further blurring the true intervention effects. This effect is observable in Figure 1C,E of their preprint. Specifically, while the Massed group continues to show an increase in performance during Test relative to the last 10 seconds of practice during Training, the Spaced group displays a marked decrease. This decrease is in stark contrast to the monotonic increases observed for both groups at all other time-points.

      Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (as opposed to after it has been removed) then the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory than contradictory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings. Still, they do highlight a limitation of the current micro-online/offline framework, which was originally intended to be applied only to early skill learning over spaced practice schedules, when reactive inhibition effects are minimized49. Extrapolation of this framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g., the Non-repeating groups from Experiments 3 & 4 in Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms that allow one to independently estimate the different coincident or antagonistic features (e.g., memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      References

      (1) Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M. & Cohen, L. G. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 35, 109193 (2021). https://doi.org:10.1016/j.celrep.2021.109193

      (2) Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H. & Staresina, B. P. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv 2024.10.06.614680 (2024). https://doi.org:10.1101/2024.10.06.614680

      (3) Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 79, 1117-1123 (1998).

      (4) Karni, A. et al. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature 377, 155-158 (1995). https://doi.org:10.1038/377155a0

      (5) Kleim, J. A., Barbay, S. & Nudo, R. J. Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol 80, 3321-3325 (1998).

      (6) Shadmehr, R. & Holcomb, H. H. Neural correlates of motor memory consolidation. Science 277, 821-824 (1997).

      (7) Doyon, J. et al. Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A 99, 1017-1022 (2002).

      (8) Toni, I., Ramnani, N., Josephs, O., Ashburner, J. & Passingham, R. E. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage 14, 1048-1057 (2001).

      (9) Grafton, S. T. et al. Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci 12, 2542-2548 (1992).

      (10) Kennerley, S. W., Sakai, K. & Rushworth, M. F. Organization of action sequences and the role of the pre-SMA. J Neurophysiol 91, 978-993 (2004). https://doi.org:10.1152/jn.00651.2003

      (11) Hardwick, R. M., Rottschy, C., Miall, R. C. & Eickhoff, S. B. A quantitative meta-analysis and review of motor learning in the human brain. Neuroimage 67, 283-297 (2013). https://doi.org:10.1016/j.neuroimage.2012.11.020

      (12) Sawamura, D. et al. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep 9, 20397 (2019). https://doi.org:10.1038/s41598-019-56956-0

      (13) Lee, S. H., Jin, S. H. & An, J. The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep 9, 14066 (2019). https://doi.org:10.1038/s41598-019-50644-9

      (14) Battaglia-Mayer, A. & Caminiti, R. Corticocortical Systems Underlying High-Order Motor Control. J Neurosci 39, 4404-4421 (2019). https://doi.org:10.1523/JNEUROSCI.2094-18.2019

      (15) Toni, I., Thoenissen, D. & Zilles, K. Movement preparation and motor intention. Neuroimage 14, S110-117 (2001). https://doi.org:10.1006/nimg.2001.0841

      (16) Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci 1, 529-533 (1998). https://doi.org:10.1038/2245

      (17) Andersen, R. A. & Buneo, C. A. Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25, 189-220 (2002). https://doi.org:10.1146/annurev.neuro.25.112701.142922

      (18) Buneo, C. A. & Andersen, R. A. The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44, 2594-2606 (2006). https://doi.org:10.1016/j.neuropsychologia.2005.10.011

      (19) Grover, S., Wen, W., Viswanathan, V., Gill, C. T. & Reinhart, R. M. G. Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci 25, 1237-1246 (2022). https://doi.org:10.1038/s41593-022-01132-3

      (20) Colclough, G. L. et al. How reliable are MEG resting-state connectivity metrics? Neuroimage 138, 284-293 (2016). https://doi.org:10.1016/j.neuroimage.2016.05.070

      (21) Colclough, G. L., Brookes, M. J., Smith, S. M. & Woolrich, M. W. A symmetric multivariate leakage correction for MEG connectomes. NeuroImage 117, 439-448 (2015). https://doi.org:10.1016/j.neuroimage.2015.03.071

      (22) Mollazadeh, M. et al. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci 31, 15531-15543 (2011). https://doi.org:10.1523/JNEUROSCI.2999-11.2011

      (23) Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W. & Donoghue, J. P. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol 105, 1603-1619 (2011). https://doi.org:10.1152/jn.00532.2010

      (24) Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J Neurophysiol 108, 18-24 (2012). https://doi.org:10.1152/jn.00832.2011

      (25) Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51-56 (2012). https://doi.org:10.1038/nature11129

      (26) Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A 108, 7641-7646 (2011). https://doi.org:10.1073/pnas.1018985108

      (27) Albouy, G., King, B. R., Maquet, P. & Doyon, J. Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus 23, 985-1004 (2013). https://doi.org:10.1002/hipo.22183

      (28) Albouy, G. et al. Neural correlates of performance variability during motor sequence acquisition. Neuroimage 60, 324-331 (2012). https://doi.org:10.1016/j.neuroimage.2011.12.049

      (29) Qin, Y. L., McNaughton, B. L., Skaggs, W. E. & Barnes, C. A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci 352, 1525-1533 (1997). https://doi.org:10.1098/rstb.1997.0139

      (30) Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150 (2007). https://doi.org:10.1126/science.1148979

      (31) Molle, M. & Born, J. Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron 61, 496-498 (2009). https://doi.org:10.1016/j.neuron.2009.02.002

      (32) Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat Rev Neurosci 6, 119-130 (2005). https://doi.org:10.1038/nrn1607

      (33) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020). https://doi.org:10.1073/pnas.2009576117

      (34) Albouy, G. et al. Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage 108, 423-434 (2015). https://doi.org:10.1016/j.neuroimage.2014.12.049

      (35) Gais, S. et al. Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A 104, 18778-18783 (2007). https://doi.org:10.1073/pnas.0705454104

      (36) Sterpenich, V. et al. Sleep promotes the neural reorganization of remote emotional memory. J Neurosci 29, 5143-5152 (2009). https://doi.org:10.1523/JNEUROSCI.0561-09.2009

      (37) Euston, D. R., Gruber, A. J. & McNaughton, B. L. The role of medial prefrontal cortex in memory and decision making. Neuron 76, 1057-1070 (2012). https://doi.org:10.1016/j.neuron.2012.12.002

      (38) van Kesteren, M. T., Fernandez, G., Norris, D. G. & Hermans, E. J. Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A 107, 7550-7555 (2010). https://doi.org:10.1073/pnas.0914892107

      (39) van Kesteren, M. T., Ruiter, D. J., Fernandez, G. & Henson, R. N. How schema and novelty augment memory formation. Trends Neurosci 35, 211-219 (2012). https://doi.org:10.1016/j.tins.2012.02.001

      (40) Wagner, A. D. et al. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science (New York, N.Y.) 281, 1188-1191 (1998).

      (41) Ashe, J., Lungu, O. V., Basford, A. T. & Lu, X. Cortical control of motor sequences. Curr Opin Neurobiol 16, 213-221 (2006).

      (42) Hikosaka, O., Nakamura, K., Sakai, K. & Nakahara, H. Central mechanisms of motor skill learning. Curr Opin Neurobiol 12, 217-222 (2002).

      (43) Penhune, V. B. & Steele, C. J. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res. 226, 579-591 (2012). https://doi.org:10.1016/j.bbr.2011.09.044

      (44) Doyon, J. et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioural brain research 199, 61-75 (2009). https://doi.org:10.1016/j.bbr.2008.11.012

      (45) Schendan, H. E., Searl, M. M., Melrose, R. J. & Stern, C. E. An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37, 1013-1025 (2003). https://doi.org:10.1016/s0896-6273(03)00123-5

      (46) Morris, R. G. M. Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. The European journal of neuroscience 23, 2829-2846 (2006). https://doi.org:10.1111/j.1460-9568.2006.04888.x

      (47) Tse, D. et al. Schemas and memory consolidation. Science 316, 76-82 (2007). https://doi.org:10.1126/science.1135935

      (48) Berlot, E., Popp, N. J. & Diedrichsen, J. A critical re-evaluation of fMRI signatures of motor sequence learning. Elife 9 (2020). https://doi.org:10.7554/eLife.55241

      (49) Bonstrup, M. et al. A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol 29, 1346-1351.e4 (2019). https://doi.org:10.1016/j.cub.2019.02.049

      (50) Kornysheva, K. et al. Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron 101, 1166-1180.e3 (2019). https://doi.org:10.1016/j.neuron.2019.01.018

      (51) Pan, S. C. & Rickard, T. C. Sleep and motor learning: Is there room for consolidation? Psychol Bull 141, 812-834 (2015). https://doi.org:10.1037/bul0000009

      (52) Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J. & Ard, M. C. Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn 34, 834-842 (2008). https://doi.org:10.1037/0278-7393.34.4.834

      (53) Brawn, T. P., Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci 30, 13977-13982 (2010). https://doi.org:10.1523/JNEUROSCI.3295-10.2010

      (54) Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N. & Cohen, L. G. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn 5, 7 (2020). https://doi.org:10.1038/s41539-020-0066-9

      (55) Gupta, M. W. & Rickard, T. C. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn 7, 25 (2022). https://doi.org:10.1038/s41539-022-00140-z

      (56) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proceedings of the National Academy of Sciences 117, 23898-23903 (2020).

      (57) Brooks, E., Wallis, S., Hendrikse, J. & Coxon, J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn 9, 23 (2024). https://doi.org:10.1038/s41539-024-00238-6

      (58) Deleglise, A. et al. Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex 33, 6120-6131 (2023). https://doi.org:10.1093/cercor/bhac489

      (59) Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015). https://doi.org:10.1002/hipo.22488

      (60) Gupta, M. W. & Rickard, T. C. Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep 14, 4661 (2024). https://doi.org:10.1038/s41598-024-52726-9

      (61) Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P. & Azanon, E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv 2024.07.11.602795 (2024). https://doi.org:10.1101/2024.07.11.602795

      (62) Mylonas, D. et al. Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci 44 (2024). https://doi.org:10.1523/JNEUROSCI.1839-23.2024

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for helping us improve our article and software. The feedback that we received was very helpful and constructive, and we hope that the changes that we have made are indeed effective at making the software more accessible, the manuscript clearer, and the online documentation more insightful as well. A number of comments related to shared concerns, such as:

      • the need to describe various processing steps more clearly (e.g. particle picking, or the nature of ‘dust’ in segmentations)

      • describing the features of Ais more clearly, and explaining how it can interface with existing tools that are commonly used in cryoET

      • a degree of subjectivity in the discussion of results (e.g. about Pix2pix performing better than other networks in some cases).

      We have now addressed these important points, with a focus on streamlining not only the workflow within Ais but also making interfacing between Ais and other tools easier. For instance, we explain more clearly which file types Ais uses and we have added the option to export .star files for use in, e.g., Relion, or meshes instead of coordinate lists. We also include information in the manuscript about how the particle picking process is implemented, and how false positives (‘dust’) can be avoided. Finally, all reviewers commented on our notion that Pix2pix can work ‘better’ despite reaching a higher loss after training. As suggested, we included a brief discussion about this idea in the supplementary information (Fig. S6) and used it to illustrate how Ais enables iteratively improving segmentation results. 

      Since receiving the reviews we have also made a number of other changes to the software that are not discussed below but that we nonetheless hope have made the software more reliable and easier to use. These include expanding the available settings, slight changes to the image processing that can help speed it up or avoid artefacts in some cases, improving the GUI-free usability of Ais, and incorporating various tools that should help make it easier to use Ais with remote data (e.g. doing annotation on an office PC, but model training on a more powerful remote PC). We have also been in contact with a number of users of the software, who reported issues or suggested various other miscellaneous improvements, and many of whom had found the software via the reviewed preprint.

      Reviewer 1 (Public Review):

      This paper describes "Ais", a new software tool for machine-learning-based segmentation and particle picking of electron tomograms. The software can visualise tomograms as slices and allows manual annotation for the training of a provided set of various types of neural networks. New networks can be added, provided they adhere to a Python file with an (undescribed) format. Once networks have been trained on manually annotated tomograms, they can be used to segment new tomograms within the same software. The authors also set up an online repository to which users can upload their models, so they might be re-used by others with similar needs. By logically combining the results from different types of segmentations, they further improve the detection of distinct features. The authors demonstrate the usefulness of their software on various data sets. Thus, the software appears to be a valuable tool for the cryo-ET community that will lower the barrier to using a variety of machine-learning methods to help interpret tomograms. 

      We thank the reviewer for their kind feedback and for taking the time to review our article. On the basis of their comments, we have made a number of changes to the software, article, and documentation that we think have helped improve the project and render it more accessible (especially for interfacing with different tools, e.g. the suggestions to describe the file formats in more detail). We respond to all individual comments one-by-one below.

      Recommendations:

      I would consider raising the level of evidence that this program is useful to *convincing* if the authors would adequately address the suggestions for improvement below.

      (1) It would be helpful to describe the format of the Python files that are used to import networks, possibly in a supplement to the paper. 

      We have now included this information in both the online documentation and as a supplementary note (Supplementary Note 1). 

      (2) Likewise, it would be helpful to describe the format in which particle coordinates are produced. How can they be used in subsequent sub-tomogram averaging pipelines? Are segmentations saved as MRC volumes? Or could they be saved as triangulations as well? More implementation details like this would be good to have in the paper, so readers don't have to go into the code to investigate. 

      Coordinates: previously, we only exported arrays of coordinates as tab-separated .txt files, compatible with, e.g., EMAN2. We have now added a selection menu where users can specify whether to export either .star files or tab-separated .txt files, which together we think should cover most software suites for subtomogram averaging. 

      Triangulations: We have now improved the functionality for exporting triangulations. In the particle picking menu, there is now the option to output either coordinates or meshes (as .obj files). This was previously possible in the Rendering tab, but with the inclusion in the picking menu exporting triangulations can now be done for all tomograms at once rather than manually one by one.

      Edits in the text: the output formats were previously not clear in the text. We have now included this information in the introduction:

      “[…] To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”
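To make the two coordinate formats concrete, the sketch below writes the same picked coordinates once as a tab-separated .txt file and once as a minimal STAR table. The function names `write_txt` and `write_star`, the `data_particles` block name, and the RELION-style `_rlnCoordinateX/Y/Z` labels are illustrative assumptions, not Ais's actual exporter; check which block names and labels your downstream tool expects.

```python
import io

def write_txt(coords, fh):
    """Write one particle per line as tab-separated x, y, z."""
    for x, y, z in coords:
        fh.write(f"{x}\t{y}\t{z}\n")

def write_star(coords, fh, block="particles"):
    """Write a minimal STAR table with (assumed) RELION-style coordinate labels."""
    fh.write(f"data_{block}\n\nloop_\n")
    for i, axis in enumerate("XYZ", start=1):
        fh.write(f"_rlnCoordinate{axis} #{i}\n")
    for x, y, z in coords:
        fh.write(f"{x}\t{y}\t{z}\n")

# Usage: in-memory buffers here; pass open(path, "w") handles to write real files.
txt, star = io.StringIO(), io.StringIO()
picks = [(12.0, 40.5, 7.0), (88.0, 19.0, 30.0)]
write_txt(picks, txt)
write_star(picks, star)
print(star.getvalue())
```

Both writers share the same row layout; the STAR variant only adds a header that names the columns, which is what lets tools such as Relion map them to coordinate fields.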

      (3) In Table 2, pix2pix has much higher losses than alternatives, yet the text states it achieves fewer false negatives and fewer false positives. An explanation is needed as to why that is. Also, it is mentioned that a higher number of epochs may have improved the results. Then why wasn't this attempted? 

      The architecture of Pix2pix is quite different from that of the other networks included in the test. Whereas all others are trained to minimize a binary cross entropy (BCE) loss, Pix2pix uses a composite loss function that is a weighted combination of the generator loss and a discriminator penalty, neither of which employs BCE. However, to be able to compare loss values, we do compute a BCE loss value for the Pix2pix generator after every training epoch. This is the value reported in the manuscript and in the software. Although Pix2pix’s BCE loss does indeed diminish during training, the model is not actually optimized to minimize this particular value, and a comparison by BCE loss is therefore not entirely fair to Pix2pix. This is pointed out (in brief) in the legend to the table: 

      “Unlike the other architectures, Pix2pix is not trained to minimize the bce loss but uses a different loss function instead. The bce loss values shown here were computed after training and may not be entirely comparable.”
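One general reason a BCE comparison can be unfair to a model that was not trained on BCE: two outputs that yield identical binary masks after thresholding can have very different BCE values, because BCE also penalizes low-confidence (soft) predictions. The sketch below illustrates this with made-up values; it is a generic property of BCE, not a reproduction of the reported loss values, and the `bce` helper is hypothetical.

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

ground_truth = np.array([1.0, 1.0, 0.0, 0.0])
confident = np.array([0.9, 0.9, 0.1, 0.1])  # sharp output
soft = np.array([0.6, 0.6, 0.4, 0.4])       # hedged output

# Both outputs give the same segmentation after thresholding at 0.5 ...
assert np.array_equal(confident > 0.5, soft > 0.5)
# ... yet the soft output's BCE is roughly five times higher (about 0.105 vs 0.511).
print(bce(ground_truth, confident), bce(ground_truth, soft))
```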

      Regarding the extra number of epochs for Pix2pix: here, we initially ran into the problem that the number of samples in the training data was low for the number of parameters in Pix2pix, leading to divergence later during training. This problem did not occur for most other models, so we decided to keep the data for the discussion around Table 1 and Figure 2 limited to that initial training dataset. After that, we increased the sample size (from 58 to 170 positive samples) and trained the model for longer. The resulting model was used in the subsequent analyses. This was previously implicit in the text but is now mentioned explicitly and in a new supplementary figure. 

      “For the antibody platform, the model that would be expected to be one of the worst based on the loss values, Pix2pix, actually generates segmentations that seem well-suited for the downstream processing tasks. It also outputs fewer false positive segmentations for sections of membranes than many other models, including the lowest-loss model UNet. Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. We thus decided to use Pix2pix for the segmentation of antibody platforms, and increased the size of the antibody platform training dataset (from 58 to 170 positive samples) to train a much improved second iteration of the network for use in the following analyses (Fig. S6).”

      (4) It is not so clear what absorb and emit mean in the text about model interactions. A few explanatory sentences would be useful here. 

      We have expanded this paragraph to include some more detail.

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.”
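The competition rule described above can be sketched with NumPy as follows. `apply_competition` and its arguments are hypothetical names; the sketch assumes all predictions share one shape, compares each absorbing model against the original (not already suppressed) predictions of the emitting models, and excludes a model from competing with itself.

```python
import numpy as np

def apply_competition(predictions, emits, absorbs):
    """Zero an absorbing model's pixels wherever any emitting rival scores higher.

    predictions: dict mapping model name -> prediction array (all the same shape).
    emits, absorbs: collections of model names.
    """
    result = {name: pred.copy() for name, pred in predictions.items()}
    for name in absorbs:
        rivals = [predictions[other] for other in emits if other != name]
        if not rivals:
            continue
        strongest_rival = np.maximum.reduce(rivals)  # pixel-wise max over rivals
        result[name][predictions[name] < strongest_rival] = 0.0
    return result

preds = {
    "membrane": np.array([0.9, 0.2, 0.5]),
    "carbon": np.array([0.1, 0.8, 0.5]),
}
out = apply_competition(preds, emits={"carbon"}, absorbs={"membrane"})
print(out["membrane"])  # carbon wins only the middle pixel; ties are kept
```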

      (5) Under Figure 4, the main text states "the model interactions described above", but because multiple interactions were described it is not clear which ones they were. Better to just specify again. 

      Changed as follows:

      “The antibody platform and antibody-C1 complex models were then applied to the respective datasets, in combination with the membrane and carbon models and the model interactions described above (Fig. 4b): the membrane avoiding carbon, and the antibody platforms colocalizing with the resulting membranes”.

      (6) The next paragraph mentions a "batch particle picking process to determine lists of particle coordinates", but the algorithm for how coordinates are obtained from segmented volumes is not described. 

      We have added a paragraph to the main text to describe the picking process:

      “This picking step comprises a number of processing steps (Fig. S7). First, the segmented (.mrc) volumes are thresholded at a user-specified level. Second, a distance transform of the resulting binary volume is computed, in which every nonzero pixel in the binary volume is assigned a new value, equal to the distance of that pixel to the nearest zero-valued pixel in the mask. Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded. Fifth, groups are assigned a weight value, equal to the sum of the prediction value (i.e. the corresponding pixel value in the input .mrc volume) of the pixels in the group. For every group found within close proximity to another group (using a user-specified value for the minimum particle spacing), the group with the lower weight value is discarded. Finally, the centroid coordinate of the grouped pixels is considered the final particle coordinate, and the list of all coordinates is saved in a tab-separated text file.

      “As an alternative output format, segmentations can also be converted to and saved as triangulated meshes, which can then be used for, e.g., membrane-guided particle picking. After picking particles, the resulting coordinates are immediately available for inspection in the Ais 3D renderer (Fig. S8).”
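The six picking steps quoted above can be sketched with SciPy and scikit-image. This is an illustrative reconstruction under stated assumptions, not Ais's source: `pick_particles` is a hypothetical name, watershed markers are taken as connected plateaus of local maxima of the distance map (3×3×3 neighborhood), and the spacing rule is applied greedily from the heaviest group down, so a group that has already been discarded does not suppress others. Coordinates are returned in array axis order (z, y, x).

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def pick_particles(volume, threshold, min_volume, min_spacing):
    """Return particle centroids (z, y, x) from a segmented prediction volume."""
    # 1. Threshold the prediction volume.
    binary = volume > threshold
    # 2. Distance from each foreground voxel to the nearest background voxel.
    dist = ndimage.distance_transform_edt(binary)
    # 3. Watershed on the inverted distance map, seeded at local maxima
    #    (connected plateaus of maxima are merged into a single marker).
    local_max = (dist == ndimage.maximum_filter(dist, size=3)) & binary
    markers, _ = ndimage.label(local_max)
    groups = watershed(-dist, markers, mask=binary)

    # 4. Discard groups smaller than the minimum volume (label 0 is background).
    flat = groups.ravel()
    sizes = np.bincount(flat)
    ids = [g for g in range(1, len(sizes)) if sizes[g] >= min_volume]
    if not ids:
        return []

    # 5. Weight = summed prediction values of each group's voxels; keep heavier
    #    groups first and drop any lighter group closer than min_spacing.
    weights = np.bincount(flat, weights=volume.ravel())
    # 6. The unweighted centroid of each group is its particle coordinate.
    centroids = ndimage.center_of_mass(binary.astype(float), groups, ids)
    kept = []
    for k in sorted(range(len(ids)), key=lambda i: weights[ids[i]], reverse=True):
        c = np.asarray(centroids[k])
        if all(np.linalg.norm(c - other) >= min_spacing for other in kept):
            kept.append(c)
    return [tuple(float(v) for v in c) for c in kept]

vol = np.zeros((20, 20, 20))
vol[2:7, 2:7, 2:7] = 1.0        # particle A
vol[12:17, 12:17, 12:17] = 0.8  # particle B (weaker prediction)
vol[18, 2, 2] = 0.9             # single-voxel "dust", below min_volume
print(pick_particles(vol, threshold=0.5, min_volume=10, min_spacing=3))
```

Raising `min_spacing` above the A-B centroid distance (about 17 voxels) leaves only the heavier particle A.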

      The two supplementary figures are pasted below for convenience. Fig. S7 is new, while Fig. S8 was previously Fig. S10 (the reference to this figure was originally missing from the main text, but is now included).

      (7) In the Methods section, it is stated that no validation splits are used "in order to make full use of an input set". This sounds like an odd decision, given the importance of validation sets in the training of many neural networks. Then how is overfitting monitored or prevented? This sounds like a major limitation of the method. 

      In our experience, the best way of preparing a suitable model is to (iteratively) annotate a set of training images and visually inspect the result. Since the manual annotation step is the bottleneck in this process, we decided not to use a validation split in order to make full use of an annotated training dataset (i.e., a validation split of 20% would mean that 20% of the manually annotated training data is not used for training).

      We do recognize the importance of using separate data for validation, or at least offering the possibility of doing so. We have now added a parameter to the settings (and made a Settings menu item available in the top menu bar) where users can specify what fraction (0, 10, 20, or 50%) of training datasets should be set aside for validation. If the chosen value is not 0%, the software reports the validation loss as well as the size of the split during training, rather than (as was done previously) the training loss. We have, however, set the default value for the validation split to 0%, for the same reason as before. We also added a section to the online documentation about using validation splits, and edited the corresponding paragraph in the methods section:

      “The reported loss is that calculated on the training dataset itself, i.e., no validation split was applied. During regular use of the software, users can specify whether to use a validation split or not. By default, a validation split is not applied, in order to make full use of an input set of ground truth annotations. Depending on the chosen split size, the software reports either the overall training loss or the validation loss during training.”
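A random hold-out split restricted to the fractions offered in the new setting could look like the sketch below. `split_training_data` is a hypothetical helper, and the seeded random shuffle is an assumption about how samples might be chosen. For comparison, Keras's own `Model.fit(validation_split=...)` holds out the last fraction of the samples before shuffling, so the two approaches differ if the training samples are ordered.

```python
import numpy as np

ALLOWED_SPLITS = (0.0, 0.1, 0.2, 0.5)  # the 0, 10, 20 and 50% options

def split_training_data(images, masks, fraction, seed=0):
    """Randomly hold out `fraction` of the samples for validation.

    With fraction 0.0 the validation set is empty, so only a training loss
    can be reported.
    """
    if fraction not in ALLOWED_SPLITS:
        raise ValueError(f"validation fraction must be one of {ALLOWED_SPLITS}")
    order = np.random.default_rng(seed).permutation(len(images))
    n_val = int(round(len(images) * fraction))
    val, train = order[:n_val], order[n_val:]
    return (images[train], masks[train]), (images[val], masks[val])

images = np.random.default_rng(1).random((10, 64, 64))
masks = images > 0.5
(train_x, train_y), (val_x, val_y) = split_training_data(images, masks, 0.2)
print(train_x.shape, val_x.shape)  # 8 training and 2 validation samples
```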

      (8) Related to this point: how is the training of the models in the software modelled? It might be helpful to add a paragraph to the paper in which this process is described, together with indicators of what to look out for when training a model, e.g. when should one stop training? 

      We have expanded the paragraph where we write about the utility of comparing different networks architectures to also include a note on how Ais facilitates monitoring the output of a model during training:

      “When taking the training and processing speeds into account as well as the segmentation results, there is no overall best architecture. We therefore included multiple well-performing model architectures in the final library, in order to allow users to select from these models to find one that works well for their specific datasets. Although it is not necessary to screen different network architectures and users may simply opt to use the default (VGGNet), these results thus show that it can be useful to test different networks in order to identify one that is best. Moreover, these results also highlight the utility of preparing well-performing models by iteratively improving training datasets and re-training models in a streamlined interface. To aid in this process, the software displays the loss value of a network during training and allows for the application of models to datasets during training. Thus, users can inspect how a model’s output changes during training and decide whether to interrupt training and improve the training data or choose a different architecture.”

      (9) Figure 1 legend: define the colours of the different segmentations. 

      Done

      (10) It may be better to colour Figure 2B with the same colours as Figure 2A. 

      We tried this, but the effect is that the underlying density is much harder to see. We think the current grayscale image paired with the various segmentations underneath is better for visually identifying which density corresponds to membranes, carbon film, or antibody platforms.

      Reviewer 2 (Public Review):

      Summary: 

      Last et al. present Ais, a new deep learning-based software package for the segmentation of cryo-electron tomography data sets. The distinguishing factor of this package is its orientation to the joint use of different models, rather than the implementation of a given approach. Notably, the software is supported by an online repository of segmentation models, open to contributions from the community. 

      The usefulness of handling different models in one single environment is showcased with a comparative study on how different models perform on a given data set; then with an explanation of how the results of several models can be manually merged by the interactive tools inside Ais. 

      The manuscript presents two applications of Ais on real data sets: the first showcases its particle-picking capabilities on a study previously completed by the authors; the second addresses a complex segmentation problem on two different data sets (representing different geometries, such as bacterial cilia and mitochondria in a mouse neuron), both from public databases. 

      The software described in the paper is compactly documented on its website, which additionally provides links to some YouTube videos (less than an hour in total) in which the authors screen-record and comment on the major workflows. 

      In short, the manuscript describes a valuable resource for the community of tomography practitioners. 

      Strengths: 

      A public repository of segmentation models; ease of working with several models and comparing/merging the results. 

      Weaknesses: 

      A certain lack of concreteness when describing the overall features of the software that differentiate it from others. 

      We thank the reviewer for their kind and constructive feedback. Following the suggestion to use the Pix2pix results to illustrate the utility of Ais for analyzing results, we have added a new supplementary figure (Fig. S6) and brief discussion, showing the use of Ais in iteratively improving segmentation results. We have also expanded the online documentation and included a note in the supplementary information about how models are saved/loaded (Supplementary Note 1).

      Recommendations:

      I would like to ask the authors about some concerns about the Ais project as a whole: 

      (1) The website that accompanies the paper (aiscryoet.org), albeit functional, seems to be in its first steps. Is it planned to extend it? In particular, one of the major contributions of the paper (the maintenance of an open repository of models) could use better documentation describing the expected formats to submit models. This could even be discussed in the supplementary material of the manuscript, as this feature is possibly the most distinctive one of the paper. Engaging third-party users would require giving them an easier entry point, and the superficial mention of this aspect in the online documentation could be much more generous.

      We have added a new page to the online documentation, titled ‘Sharing models’, where we explain the structure of model files and demonstrate the upload page. We also added a note to the Supplementary Information that explains the file format for models and how they are loaded/saved (i.e., that these are standard Keras model objects). 

      To make it easier to interface Ais with other tools, we have now also made some of the core functionality available (e.g. training models, batch segmentation) via the command line interface. Information on how to use this is included in the online documentation. All file formats are common formats used in cryoET, so that using Ais in a workflow with, e.g. AreTomo -> Ais -> Relion should now be more straightforward.

      (2) A different major line advanced by the authors to underpin the novelty of the software, is its claimed flexibility and modularity. In particular, the restrictions of other packages in terms of visualization and user interaction are mentioned. Although in the manuscript it is also mentioned that most of the functionalities in Ais are already available in major established packages, as a reader I am left confused about what exactly makes the offer of Ais different from others in terms of operation and interaction: is it just the two aspects developed in the manuscript (possibility of using different models and tools to operate model interaction)? If so, it should probably be stated; but if the authors want to pinpoint other aspects of the capacity of Ais to drive smoothly the interactions, they should be listed and described, instead of leaving it as an unspecific comment. As a potential user of Ais, I would suggest the authors add (maybe in the supplementary material) a listing of such features. Figure 1 does indeed carry the name "overview of (...) functionalities", but it is not clear to me which functionalities I can expect to be absent or differently solved on the other tools they mention.

      We have rewritten the part of the introduction where we previously listed the features as below. We think it should now be clearer for the reader to know what features to expect, as well as how Ais can interface with other software (i.e. what the inputs and outputs are). We have also edited the caption for Figure 1 to make it explicit that panels A to C represent the annotation, model preparation, and rendering steps of the Ais workflow and that the images are screenshots from the software.

      “In this report we present Ais, an open-source tool that is designed to enable any cryoET user – whether experienced with software and segmentation or a novice – to quickly and accurately segment their cryoET data in a streamlined and largely automated fashion. Ais comprises a comprehensive and accessible user interface within which all steps of segmentation can be performed, including: the annotation of tomograms and compiling datasets for the training of convolutional neural networks (CNNs), training and monitoring performance of CNNs for automated segmentation, 3D visualization of segmentations, and exporting particle coordinates or meshes for use in downstream processes. To help generate accurate segmentations, the software contains a library of various neural network architectures and implements a system of configurable interactions between different models. Overall, the software thus aims to enable a streamlined workflow where users can interactively test, improve, and employ CNNs for automated segmentation. To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”

      “Figure 1 – an overview of the user interface and functionalities. The various panels represent sequential stages in the Ais processing workflow, including annotation (a), testing CNNs (b), and visualizing segmentations (c). These images (a-c) are unedited screenshots of the software. a) […]”

      (3) Table 1 could include the names of the last three columns. The table has enough empty space in the other columns to accommodate this. 

      Done.

      (4) The comment about Pix2pix needing a larger number of training epochs (being a larger model than the other ones considered) is interesting. It also lends itself for the authors to illustrate the ability of their software to do precisely this: allow the users to flexibly analyze results and test hypotheses.

      Please see the response to Reviewer 1 comment #3. We agree that this is a useful example of the ability to iterate between annotation and training, and have added an explicit mention of this in the text:

      “Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. In a second iteration of annotation and training, we thus increased the size of the antibody platform training dataset (from 58 to 170 positive samples) and generated an improved Pix2pix model for use in the following analyses.”

      Reviewer 3 (Public Review):

      We appreciate the reviewer’s extensive and very helpful feedback and are glad to read that they consider Ais potentially quite useful for the users. To address the reviewer’s comments, we have made various edits to the text, figures, and documentation, that we think have helped improve the clarity of our work. We list all edits below. 

      Summary

      In this manuscript, Last and colleagues describe Ais, an open-source software package for the semi-automated segmentation of cryo-electron tomography (cryo-ET) maps. Specifically, Ais provides a graphical user interface (GUI) for the manual segmentation and annotation of specific features of interest. These manual annotations are then used as input ground-truth data for training a convolutional neural network (CNN) model, which can then be used for automatic segmentation. Ais provides the option of several CNNs so that users can compare their performance on their structures of interest in order to determine the CNN that best suits their needs. Additionally, pre-trained models can be uploaded and shared to an online database. 

      Algorithms are also provided to characterize "model interactions", which allow users to define heuristic rules on how the different segmentations interact. For instance, a membrane-adjacent protein can have rules where it must colocalize within a certain distance of a membrane segmentation. Such rules can help reduce false positives; as in the case above, false positives predicted away from membranes are eliminated. 

      The authors then show how Ais can be used for particle picking and subsequent subtomogram averaging and for the segmentation of cellular tomograms for visual analysis. For subtomogram averaging, they used a previously published dataset and compared the averages of their automated picking with the published manual picking. Analysis of cellular tomogram segmentation was primarily visual. 

      Strengths:

      CNN-based segmentation of cryo-ET data is a rapidly developing area of research, as it promises substantially faster results than manual segmentation as well as the possibility for higher accuracy. However, this field is still very much in development, and the overall performance of these approaches, even across different algorithms, still leaves much to be desired. In this context, I think Ais is an interesting package, as it aims to provide both new and experienced users with streamlined approaches for manual annotation, access to a number of CNNs, and methods to refine the outputs of CNN models against each other. I think this can be quite useful for users, particularly as these methods develop. 

      Weaknesses: 

      Whilst overall I am enthusiastic about this manuscript, I still have a number of comments: 

      (1) On page 5, paragraph 1, there is a discussion on human judgement of these results. I think a more detailed discussion is required here, as from looking at the figures, I don't know that I agree with the authors' statement that Pix2pix is better. I acknowledge that this is extremely subjective, which is the problem. I think that a manual segmentation should also be shown in a figure so that the reader has a better way to gauge the performance of the automated segmentation.

      Please see the answer to Reviewer 1’s comment #3.

      (2) On page 7, the authors mention terms such as "emit" and "absorb" but never properly define them, such that I feel like I'm guessing at their meaning. Precise definitions of these terms should be provided. 

      We have expanded this paragraph to include some more detail:

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.” 

      (3) For Figure 3, it's unclear if the parent models shown (particularly the carbon model) are binary or not. The figure looks to be grey values, which would imply that it's the visualization of some prediction score. If so, how is this thresholded? This can also be made clearer in the text. 

      The figures show the grayscale output of the parent model, but this grayscale output is thresholded to produce a binary mask that is used in an interaction. We have edited the text to include a mention of thresholding at a user-specified threshold value:

      “These interactions are implemented as follows: first, a binary mask is generated by thresholding the parent model’s predictions using a user-specified threshold value. Next, the mask is dilated using a circular kernel with a radius 𝑅, a parameter that we call the interaction radius. Finally, the child model’s prediction values are multiplied with this mask.”

      To avoid confusion, we have also edited the figure to show the binary masks rather than the grayscale segmentations. 
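The three steps of an interaction (threshold the parent, dilate by the interaction radius R, multiply the child) can be sketched for a 2D slice as below. `disk`, `colocalize` and `avoid` are hypothetical names. In particular, implementing an avoidance rule (such as the membrane avoiding carbon) by multiplying with the inverted mask is an assumption, not something stated in the quoted text.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Circular structuring element of the given radius (in pixels)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x**2 + y**2 <= radius**2

def interaction_mask(parent_pred, threshold, radius):
    """Threshold the parent prediction, then dilate by the interaction radius R."""
    return ndimage.binary_dilation(parent_pred > threshold, structure=disk(radius))

def colocalize(parent_pred, child_pred, threshold, radius):
    """Keep child predictions only within R of the parent's mask."""
    return child_pred * interaction_mask(parent_pred, threshold, radius)

def avoid(parent_pred, child_pred, threshold, radius):
    """Assumed inverse rule: suppress child predictions within R of the parent."""
    return child_pred * ~interaction_mask(parent_pred, threshold, radius)

parent = np.zeros((9, 9))
parent[4, 4] = 1.0  # a one-pixel "parent" feature
child = np.full((9, 9), 0.7)
near = colocalize(parent, child, threshold=0.5, radius=2)
far = avoid(parent, child, threshold=0.5, radius=2)
print(np.count_nonzero(near), np.count_nonzero(far))
```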

      (4) Figure 3D was produced in ChimeraX using the hide dust function. I think some discussion on the nature of this "dust" is in order, e.g. how much is there and how large does it need to be to be considered dust? Given that these segmentations can be used for particle picking, this seems like it may be a major contributor to false positives. 

      ‘Dust’ in segmentations is essentially unavoidable; avoiding it entirely would require a perfect model that does not produce any false positives. However, when models are sufficiently accurate, the volume of false positives is typically smaller than that of the structures that were intended to be segmented. In these cases, discarding particles based on size is a practical way of filtering the segmentation results. Since it is difficult to generalize when to consider something ‘dust’, we decided to include this additional text in the Methods section rather than in the main text:

      “… with the use of the ‘hide dust’ function (the same settings were used for each panel, different settings used for each feature).

      This ‘dust’ corresponds to small (in comparison to the segmented structures of interest) volumes of false positive segmentations, which are present in the data due to imperfections in the used models. The rate and volume of false positives can be reduced either by improving the models (typically by including more examples of the images of what would be false negatives or positives in the training data) or, if the dust particles are indeed smaller than the structures of interest, they can simply be discarded by filtering particles based on their volume, as applied here. In particle picking, a ‘minimum particle volume’ is specified – particles with a smaller volume are considered ‘dust’.”

      This complements the newly included text about the method of converting volumes into lists of coordinates (see Reviewer 1’s comment #6):

      “Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded…”

      We think it should now be clearer that (some form of) discarding ‘dust’ is a step that is typically included in the particle picking process.
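Filtering dust by volume can be illustrated with a connected-component size filter. This is a simplified stand-in rather than the ChimeraX or Ais implementation: `remove_dust` is hypothetical, it uses plain connected components (SciPy's default face connectivity) instead of the watershed groups used during picking, and it counts voxels rather than physical volume.

```python
import numpy as np
from scipy import ndimage

def remove_dust(segmentation, threshold, min_voxels):
    """Binarize, then drop connected components smaller than min_voxels."""
    labels, _ = ndimage.label(segmentation > threshold)
    sizes = np.bincount(labels.ravel())  # voxel count per component
    keep = sizes >= min_voxels
    keep[0] = False  # label 0 is background
    return keep[labels]

seg = np.zeros((10, 10, 10))
seg[1:4, 1:4, 1:4] = 0.9  # a 27-voxel structure of interest
seg[8, 8, 8] = 0.9        # a one-voxel false positive ("dust")
clean = remove_dust(seg, threshold=0.5, min_voxels=5)
print(int(clean.sum()))
```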

      (5) Page 9 contains the following sentence: "After selecting these values, we then launched a batch particle picking process to determine lists of particle coordinates based on the segmented volumes." Given how important this is, I feel like this requires significant description, e.g. how are densities thresholded, how are centers determined, and what if there are overlapping segmentations? 

      Please see the response to Reviewer 1’s comment #6.

      (6) The FSC shown in Figure S6 for the auto-picked maps is concerning. First, a horizontal line at FSC = 0 should be added. It seems that starting at a frequency of ~0.045, the FSC of the autopicked map increases above zero and stays there. Since this is not present in the FSC of the manually picked averages, this suggests the automatic approach is also finding some sort of consistent features. This needs to be discussed. 

      Thank you for pointing this out. Unfortunately, this was due to a mistake made while formatting the figure. The Y axes of the two original plots had slightly different ranges, but this was missed when they were combined into the joint supplementary figure. As a result, the FSC values for the autopicked half maps were displayed incorrectly. The original separate plots are shown below to illustrate the discrepancy:

      Author response image 1.

      The corrected figure is Figure S9 in the manuscript. The values of 44 Å and 46 Å were not determined from the graph and remain unchanged.

      (7) Page 11 contains the statement "the segmented volumes found no immediately apparent false positive predictions of these pores". This is quite subjective and I don't know that I agree with this assessment. Unless the authors decide to quantify this through subtomogram classification, I don't think this statement is appropriate. 

      We originally included this statement and the supplementary figure because we wanted to show another example of automated picking, this time in the more crowded environment of the cell. We do agree that it requires better substantiation, but also think that the demonstration of automated picking of the antibody platforms and IgG3-C1 complexes for subtomogram averaging suffices to demonstrate Ais’ picking capabilities. Since the supplementary information includes an example of picked coordinates rendered in the Ais 3D viewer (Figure S7) that also used the pore dataset, we still include the supplementary figure (S10) but have edited the statement to read:

      “Moreover, we could identify the molecular pores within the DMV, and pick sets of particles that might be suitable for use in subtomogram averaging (see Fig. S11).”

      We have also expanded the text that accompanies the supplementary figure to emphasize that results from automated picking are likely to require further curation, e.g. by classification in subtomogram averaging, and that the selection of particles is highly dependent on the thresholds used in the conversion from volumes to lists of coordinates.

      (8) In the methods, the authors note that particle picking is explained in detail in the online documentation. Given that this is a key feature of this software, such an explanation should be in the manuscript. 

      Please see the response to Reviewer 1’s comment #6. 

      Recommendations:

      (9) The word "model" seems to be used quite ambiguously. Sometimes it seems to refer to the manual segmentations, the CNN architectures, the trained models, or the output predictions. More precision in this language would greatly improve the readability of the manuscript.

      This was indeed quite ambiguous, especially in the introduction. We have edited the text to be clearer on these differences. The word ‘model’ is now only used to refer to trained CNNs that segment a particular feature (as in ‘membrane model’ or ‘model interactions’). Where we used terms such as ‘3D models’ to describe scenes rendered in 3D, we now use ‘3D visualizations’ or similar terms. Where we previously used the term ‘models’ to refer to CNN architectures, we now use terms such as ‘neural network architectures’ or ‘architecture’. Some examples:

      … with which one can automatically segment the same or any other dataset …

      Moreover, since Pix2pix is a relatively large network, …       

      … to generate a 3D visualization of ten distinct cellular …

      … with the use of the same training datasets for all network architectures …

      In Figure 1, the text in panels D and E is illegible. 

      We have edited the figure to show the text more clearly (the previous images were unedited screenshots of the website).

      (10) Prior to the section on model interactions, I was under the impression that all annotations were performed simultaneously. I think it could be clarified that models are generated per annotation type. 

      Multiple different features can be annotated (i.e. drawn by hand by the user) at the same time, but each trained CNN segments only one feature. CNNs that output segmentations for multiple features can be implemented straightforwardly, but they require training data in which every feature is annotated in every grayscale image, which can make preparing the training data much more cumbersome. Such multi-feature models are also harder to reuse. We now mention the separateness of the networks explicitly in the introduction:

      “Multiple features, such as membranes, microtubules, ribosomes, and phosphate crystals, can be segmented and edited at the same time across multiple datasets (even hundreds). These annotations are then extracted and used as ground truth labels upon which to condition multiple separate neural networks, …”

      (11) On page 6, there is the text "some features are assigned a high segmentation value by multiple of the networks, leading to ambiguity in the results". Do they mean some false features? 

      To avoid ambiguity of the word ‘features’, we have edited the sentence to read:

      “… some parts of the image are assigned a high segmentation value by multiple of the networks, leading to false classifications and ambiguity in the results.”

      (12) Figures 2 and 3 would be easier to follow if they had consistent coloring. 

      We have changed the colouring in Figure 2 to better match that of Figure 3.

      (13) For Figure 3D, I'm confused as to why the authors showed results from the tomogram in Figure 2B. It seems like the tomogram in Figure 3C would be a more obvious choice, as we would be able to see how the 2D slices look in 3D. This would also make it easier to see the effect of interactions on false negatives. Also, since the orientation of the tomogram in 2B is quite different than that shown in 3D, it's a bit difficult to relate the two.

      We chose to show this dataset because it exemplifies the effects of both model competition and model interactions better than the tomogram in Figure 3C. See Figure 3D and Author response image 2 for a comparison:

      Author response image 2.

      (14) I'm confused as to why the tomographic data shown in Figures 4D, E, and F are black on white while all other cryo-ET data is shown as white on black. 

      The images in Figure 4DEF are now inverted.

      (15) For Figure 5, there needs to be better visual cueing to emphasize which tomographic slices are related to the segmentations in Panels A and B. 

      We have edited the figure to show more clearly which grayscale image corresponds to which segmentation.

      (16) I don't understand what I should be taking away from Figures S1 and S2. There are a lot of boxes around membrane areas and I don't know what these boxes mean. 

      We have added more descriptive text to these figures. The boxes are placed by the user to select the areas of the image that are sampled when training datasets are saved.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #4

      We sincerely appreciate the time and effort you have taken to review our manuscript. We followed your recommendations to polish the text and make it easier to understand.

      Regarding terminology, we have replaced “non-breeding” with “over-wintering” throughout the text.

      Regarding the title: because the previous title had been recommended by reviewer #1, we sought a compromise, adopting the changes you suggested while retaining part of reviewer #1’s suggestion. The title now reads: “Foxtrot migration and dynamic over-wintering range of an arctic raptor”.

      Thank you for highlighting the potential importance of snow cover and changes in snow cover as drivers of over-wintering movements. We appreciate your feedback and have explored several approaches to address this issue. Specifically, we examined how both snow cover extent and changes in snow cover influenced movement distance. However, we found no effect of either factor on movement distance.

      Our data show that birds leave their sites in October and move southwest, even though snow cover is minimal at that time. They also leave their sites in November and in subsequent months, regardless of the snow cover levels. Thus, we observed no pattern of birds leaving sites when snow cover reaches a specific threshold (e.g., 75-80%). Similarly, we found no evidence of birds staying in areas with a certain snow cover extent (e.g., 30%), nor did they leave sites when snow cover increased by a specific amount (e.g., by 10 or 20%).

      It is possible that more experienced birds anticipate that October plots will become inaccessible later in the winter and, therefore, leave early without waiting for significant snow accumulation. Alternatively, other factors, such as brief heavy snowfalls, may trigger movement, even if these do not lead to sustained increases in snow cover. Multiple factors, possibly acting asynchronously, could also play a role. This complexity adds an interesting dimension to the study of ecological patterns. However, in this study, we chose to focus on describing the migration pattern itself and its impact on aspects like over-winter range determination and population dynamics. While we have prioritized this approach, we remain committed to further analyzing the data to uncover additional details about this behavior.

      In response to your suggestion, we have expanded the Methods section to clarify that we tested the effects of snow cover and of changes in snow cover on movement distance (Lines 241-246), and we report the outcome in the Results section (Lines 348-349). We have also included the relevant plots in the Supplementary Materials. In the Discussion, we note that this approach did not reveal any significant dependence and acknowledge that this issue requires further investigation (Lines 422-459).

      ---------

      The following is the authors’ response to the previous reviews.

      Reviewer #2:

      We sincerely appreciate the time and effort you have taken to review our manuscript. 

      First of all, we apologize for publishing the preprint without incorporating certain adjustments outlined in our earlier response, particularly in the Methods section. This was due to an oversight regarding the different versions of the manuscript. We have corrected this mistake. Our response to the feedback on this section (Methods), with line numbers of the changes made, is immediately below this response. In addition, we have included the units of measurement (mean and standard deviation) in both the results and figure captions for clarity.

      Turning to the main point regarding wintering strategies, we acknowledge that in previous versions this aspect was inadequately addressed and caused some confusion. In the revised version, both the Introduction and the Discussion have been thoroughly reworked.

      As you suggested, we have removed the long introductory paragraph and all references to foxtrot migrations from the Introduction. As a result, the Introduction is now short and to the point. In the second paragraph, we explain why we propose the wintering strategies outlined (L74-81).

      In the Discussion, we've added a substantial new section at the beginning that discusses different wintering strategies. We have also updated Figure 4 accordingly. Previously, we erroneously suggested that Montagu's harrier and other African-Palaearctic migrants might adopt wintering strategies similar to those we describe. Upon further investigation, however, we found that almost all African-Palaearctic migrants exhibit an itinerant wintering strategy. Conversely, the strategy we describe is primarily observed in mid-latitude wintering species.

      We have shown that, unlike itinerant migrants, the birds in our study do not pause for 1-2 months at multiple non-breeding sites, but instead migrate significant distances, up to 1000 km, throughout the winter. Furthermore, unlike in itinerancy, the sites they reach are consistently snow-free throughout the year. Following the logic of publications on Montagu's harriers (Schlaich et al. 2023), our birds do not wait for favorable conditions at the next site, as is typical of itinerancy. Moreover, this behavior is influenced by external factors such as snow cover dynamics and occurs primarily in mid-latitudes. Researchers studying a species similar to ours, the Common buzzard, observed a similar pattern and termed it "prolonged autumn migration" rather than itinerancy. Although their transmitters stopped working in mid-winter, precluding observation of the full annual cycle, they captured the essence of continued migration at a slower pace, distinct from itinerancy. We have detailed all of these findings in a new section.

      In addition, we acknowledge the mischaracterization of the implications of our research as ‘Conservation implications’ and have corrected this to ‘Mapping ranges and assessing population trends’, as you suggested.

      Finally, we've rewritten the Conclusion, removing overly grandiose statements and simply summarizing the main findings.

      We appreciate your time and effort in reviewing our manuscript. With your invaluable input, it has become clearer, more concise, and easier to understand.

      Dataset: unclear what is the frequency of GPS transmissions. Furthermore, information on relative tag mass for the tracked individuals should be reported.

      We have included this information in our manuscript (L 115-122). We also refer to the study in which this dataset was first used and described in detail (L 123).

      Data pre-processing: more details are needed here. What data have been removed if the bird died? The entire track of the individual? Only the data classified in the last section of the track? The section also reports on an 'iterative procedure' for annotating tracks, which is only vaguely described. A piecewise regression is mentioned, but no details are provided, not even on what is the dependent variable (I assume it should be latitude?).

      Regarding the deaths, we only removed the data when the bird was already dead. We estimated the date of death and excluded tracking data corresponding to the period after the bird's death. We have corrected the text to make this clear (L 130-131).

      Regarding the piecewise regression, we have added a detailed description (lines 136-148).

      Data analysis: several potential issues here:

      (1) Unclear why sex was not included in all mixed models. I think it should be included.

      Our dataset contains 35 females and eight males (L116). This sex ratio is too unbalanced to include sex in all models and adequately assess its influence. At the same time, because adult females disperse farther than males in some raptor species, we conducted a separate analysis of the dependence of migration distance on sex (Table S8) and found no evidence for such an effect in our species. We describe this analysis in the Methods (L177-181) and its outcome in the Results (L277-278).

      (2) Unclear what is the rationale of describing habitat use during migration; is it only to show that it is a largely unsuitable habitat for the species? But is a formal analysis required then? Wouldn't be enough to simply describe this?

      Habitat use and snow cover determine the two main phases (quick and slow) of the pattern we describe. We believe that habitat analysis is appropriate in this case, and a simple description would be uninformative and not support our conclusions.

      (3) Analysis of snow cover: such a 'what if' analysis is fine but it seems to be a rather indirect assessment of the effect of snow cover on movement patterns. Can a more direct test be envisaged relating e.g. daily movement patterns to concomitant snow cover? This should be rather straightforward. The effectiveness of this method rests on among-year differences in snow cover and timing of snowfall. A further possibility would be to demonstrate habitat selection within the entire non-breeding home range of an individual in relation to snow cover. Such an analysis would imply associating presence-absence of snow to every location within the non-breeding range and testing whether the proportion of locations with snow is lower than the proportion of snow of random locations within the entire nonbreeding home range (95% KDE) for every individual (e.g. by setting a 1/10 ratio presence to random locations).

      The proposed analysis would allow us to assess whether the Rough-legged buzzard selects areas with the lowest snow cover, but it would not capture the dynamics and would therefore give a misleading overall picture. This is especially true in the spring months. In March-April, Rough-legged buzzards move northeast and occupy areas that are not the most snow-free. At this time, areas to the southwest are more snow-free (this can be seen in Figure 3b). If we performed the proposed analysis, the control points for this period would lie both to the north (where there is more snow) and to the south (where there is less snow) of the real locations, and the result would show no difference in snow cover.

      A step-selection analysis could be used, as we did in our previous work (Curk et al. 2020 Sci Rep) with the same Rough-legged buzzards (but during migration, not winter). However, this would only give us a qualitative rather than a quantitative picture: that Rough-legged buzzards move away from snow (in the fall) and follow the progression of snowmelt (in the spring).

      In contrast, our analysis gives a complete picture of snow cover dynamics in different parts of the non-breeding range. It shows that if Rough-legged buzzards remained at their fall migration endpoint without moving southwest, they would encounter 14.4 percentage points more snow cover (99.5% vs. 85.1%). Although this difference may seem small, it is significant for rodent-hunting birds, as it distinguishes between complete and patchy snow cover.

      Conversely, if Rough-legged buzzards immediately flew to the southwest and stayed there throughout the winter, they would experience 25.7 percentage points less snow cover (57.3% vs. 31.6%). Although this difference is larger than in the first case, it does not compel them to adopt this strategy, because it only distinguishes between different degrees of partial snow cover.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In an era of increasing antibiotic resistance, there is a pressing need for the development of novel sustainable therapies to tackle problematic pathogens. In this study, the authors hypothesize that pyoverdines - metal-chelating compounds produced by fluorescent pseudomonads - can act as antibacterials by locking away iron, thereby arresting pathogen growth. Using biochemical, growth, and virulence assays on 12 opportunistic pathogen strains, the authors demonstrate that pyoverdines induce iron starvation, but this effect was highly context-dependent. This same effect has been demonstrated for plant pathogens, but not for human opportunistic pathogens exposed to natural siderophores. Only those pathogens lacking (1) a matching receptor to take up pyoverdine-bound iron and/or (2) the ability to produce strong iron chelators themselves experienced strong growth arrest. This would suggest that pyoverdines might not be effective against all pathogens, thereby potentially limiting the utility of pyoverdines as global antibacterials.

      Strengths:

      The work addresses an important and timely question - can pyoverdines be used as an alternative strategy to deal with opportunistic pathogens? In general, the work is well conducted with rigorous biochemical, growth, and virulence assays. The work is clearly written and the findings are supported by high-quality figures.

      Weaknesses:

      I do not think there are any 'weaknesses' as such. However, it is well known that siderophore production is highly plastic, typically being upregulated in response to metal limitation (as well as toxic metal stress). Did the authors quantify whether pyoverdine supplementation altered siderophore production in the focal pathogens (either through phenotypic assays / transcriptomics)? Could such a phenotypic plastic response result in an increased capacity to scavenge iron from the environment? Importantly, increased expression of siderophores has been shown to enhance pathogen virulence (e.g. Lear et al 2023: increased pyoverdine production is linked with increased virulence in Pseudomonas aeruginosa). I really appreciate the amount of work the authors have put into this study, but I would suggest expanding the discussion a bit to include a few sentences on

      (1) unintentional consequences of pyoverdine treatment (e.g. changes in gene expression and non-siderophore-related mutations (e.g. biofilm formation)) on disease dynamics/pathogen virulence;

      (2) the efficacy of siderophore treatment under more natural conditions, i.e. when the pathogens have to compete with other species in the resident community (i.e. any other effects than resistance evolution through HGT of pyoverdine receptors as mentioned).

      Response 1: We would like to thank reviewer # 1 for the positive and constructive assessment. We agree that discussing the above points is important. We have added new paragraphs in the discussion, in which we elaborate on unintentional consequences (lines 532-551) and HGT of receptors (lines 599-607).

      Reviewer #1 (Recommendations For The Authors):

      I only have minor comments/suggestions for the authors, all listed below:

      • The authors' findings show that the antibacterial activity of pyoverdine is highly context-dependent. As such, I would suggest somewhat toning down the quite general statement in the Abstract: 'Thus, pyoverdines from environmental strain could become new sustainable antibacterials against human pathogens'

      Response 2: We agree that the pyoverdine treatment is especially potent against Acinetobacter baumannii and Staphylococcus aureus, but less so against Klebsiella pneumoniae. The treatment success is pathogen-dependent, and we have thus modified the phrase in the abstract (lines 32-34). The new sentence now reads: 'Thus, pyoverdines from environmental strains have the potential to become a new class of sustainable antibacterials against specific human pathogens.' Also in other parts of the manuscript (Results and Discussion), we emphasize that the pyoverdine treatment will likely be effective against specific pathogens (e.g., those with lower-iron affinity siderophores).

      • Bacteria often produce more than one type of siderophore. Do you know whether the 320 natural isolates used in this study produce any non-pyoverdine siderophores? Previous work has shown that pyochelin production is suppressed in PAO1 under a wide range of lab conditions. Do you know whether this is the case for the natural isolates used here (which would rule out a potential role of non-pyoverdines in iron starvation as observed in Figure 1)?

      Response 3: This is a valid question. Our own bioinformatic and phenotypic assays reveal that a certain fraction of strains (~ 40%) can produce secondary siderophores (unpublished data). We now mention the existence of secondary siderophores on lines 97-100 and 123. However, we do not think that their contribution to the supernatant assay results is large since the expression of pyoverdine typically suppresses the expression of the secondary siderophores (Cornelis 2010 Appl Microbiol Biotechnol; Dumas et al. 2013 Proc B) under stringent iron limitation. Furthermore, secondary siderophores have lower iron-binding affinities than pyoverdine. Finally, both the semi-pure and ultra-pure pyoverdine extracts showed strong pathogen inhibition (Fig. 3), and we are thus confident that pyoverdine is responsible for the observed growth inhibition.

      • Upon first mentioning the 'mock control' in the Results section in the main text, please state what the actual treatment is.

      Response 4: Thank you for noticing this. We now explain in more detail the actual treatment conditions used on lines 103-107 and in the caption of Figure 1. We have further removed the term 'mock' as it is confusing in this context and simply refer to the 'control treatment' in the text.

      • Please mention what the different colours mean in the legend of growth recovery in Figure 1B

      Response 5: We have clarified the colour scheme in the legend of Figure 1B.

      • Please clarify whether you used 12 or 14 strains of human pathogens (the latter number is mentioned in the results section)?

      Response 6: In the methods (lines 647-650), we now clearly specify that we used 12 strains of human pathogens in the initial supernatant screen (Figure 1). For all subsequent analyses (dose-response curves and infection experiments), we additionally included the ESKAPE pathogens K. pneumoniae and A. baumannii.

      • Please explain whether ferribactin can be used in any other way than iron chelation (e.g. can this precursor be recycled to form pyoverdine)?

      Response 7: We apologize for not having properly explained the role of ferribactin. Under natural conditions, ferribactin is not secreted. It is kept in the periplasmic space, where it matures to pyoverdine. We most likely recovered ferribactin in the supernatant because of the vigorous shaking and centrifugation involved in the pyoverdine purification protocol. We now explain this on lines 216-218. Thus, there is no ferribactin secretion and recycling.

      • Have the authors looked at whether there is a relationship between the degree of growth arrest and phylogenetic distance? Would you expect there to be one?

      Response 8: This is an interesting question. We have now constructed a phylogenetic tree to explore this relationship (new Figure S2). We found that strains with inhibitory supernatants were scattered across the phylogenetic tree (described on lines 129-135). However, we also found two branches on the tree on which strains with inhibitory supernatant effects were overrepresented. This agrees well with our previous analysis, which showed that closely related species can produce similar pyoverdine types, but that the same pyoverdine can also be produced by completely different species (Gu et al. 2024 eLife).

      • In the Methods section, please mention you used pyoverdine-only controls in the infection assay.

      Response 9: We now mention the use of pyoverdine-only controls in the Methods section (lines 788-790). Overall, we have improved the infection procedure section (starting on line 770). Thank you for pointing this out.

      • Did you confirm whether the addition of pyoverdine resulted in lower bacterial loads in Galleria? In other words, were the observed changes in mortality solely related to changes in bacterial density?

      Response 10: Thank you for this valid question. No, we did not test whether pyoverdine treatment reduces the bacterial load. However, we did this in the past in two studies with a similar set of pathogens (Weigert et al. 2017 Evol Appl; Schmitz et al. 2023 Proc B) and found strong correlations between G. mellonella survival and bacterial loads. We agree that it is important to understand how pyoverdine affects pathogen load in the host and we will address this point in future studies.

      • In your infection assay, were Galleria (n=10) for each treatment housed in the same environment/container? If so, can you treat these as independent observations or should you use some sort of grouping variable in your survival analysis?

      Response 11: Thank you for pointing this out. We forgot to clarify this in the Methods section and now do so on lines 777-779. All larvae were individually housed in separate wells of a 24-well plate. There was no physical contact between larvae and no opportunity for pathogen exchange. As such, we treat each larva as an independent observation.

      Reviewer #2 (Public Review):

      In this work, Vollenweider et al. examine the effectiveness of using natural products, specifically molecules that chelate iron, to treat infectious agents. Through the purification of 320 environmental isolates, 25 potential candidates were identified from natural products based on inhibition assays and were further screened. The structural information and chemical composition were determined.

      The paper is well-structured and thorough; targeting virulence factors in this manner is a great idea. My enthusiasm is dampened by the mediocre effects of the compounds. The lack of a dose-response curve in the survivability assays suggests a limited scope for these molecules. While it is encouraging that the best survivability occurred at the lowest toxicity level, it opens questions as to how effective such molecules can be. Either the reduction in mortality was offset by using higher concentrations, which was not observed in the compound-alone test, or there is no dose-response curve. The latter would suggest to me that the variation in survivability is not due to the addition of siderophores.

      Response 12: Thank you very much for the overall positive assessment. We understand your concern regarding the effectiveness of pyoverdines in the host. However, we wish to emphasize that hazard risks were reduced by more than 50% when treating A. baumannii and K. pneumoniae. Moreover, it was not so surprising to us that the treatment worked best at intermediate pyoverdine concentrations. We anticipated that pyoverdines could have negative effects for the host at relatively high concentrations because siderophores can interfere with host iron stocks (see discussion starting on line 552). Finally, dose-response curves need not be linear or sigmoid; they can also be hump-shaped. To better illustrate this aspect, we have now plotted the time to death for all deceased larvae against the pyoverdine concentration gradient and fitted polynomial regressions (new Fig. S6). For the above two pathogens, we found hump-shaped dose-response curves in four out of the six comparisons. We present this new analysis on lines 351-362.
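      As a hedged illustration of what "hump-shaped" means for a polynomial fit, the sketch below fits a second-order polynomial y = b0 + b1*x + b2*x^2 (for example, time to death against pyoverdine concentration) by ordinary least squares and calls the curve hump-shaped when b2 < 0 and the vertex -b1/(2*b2) lies inside the tested concentration range. The polynomial order, this criterion, and the function names `fit_quadratic` and `quadratic_hump` are assumptions for illustration; this is not the analysis behind Fig. S6.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2; returns (b0, b1, b2)."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # sums of x^0 .. x^4
    t = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Normal equations A b = t with A[i][j] = sum(x^(i + j)).
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    d = _det3(a)
    if d == 0:
        raise ValueError("need at least three distinct x values")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace column `col` of A with t.
        m = [[t[i] if j == col else a[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(_det3(m) / d)
    return tuple(coeffs)


def quadratic_hump(xs, ys):
    """Return (is_hump, vertex_x): the fitted parabola opens downward
    (b2 < 0) and its maximum lies within the observed x range."""
    b0, b1, b2 = fit_quadratic(xs, ys)
    if b2 >= 0:
        return False, None
    vertex = -b1 / (2 * b2)
    return min(xs) <= vertex <= max(xs), vertex
```

      With noisy survival data, one would also want uncertainty estimates for b2 and for the vertex position, which this sketch does not provide.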

      I would also like to see how these molecules compare to other iron-chelating molecules. Desferoxamine is a bacteria-derived siderophore that is FDA-approved. However, it is not used to treat infections. Would the author consider comparing their candidate molecules to well-studied molecules? This also raises questions about the novelty of this work; I think the authors could rephrase the discussion to better reflect that bioprospecting for iron-chelating molecules has previously occurred and been successful.

      Response 13: Thank you for the comment. The initial version of our manuscript already featured a brief discussion on other iron-chelation therapies. We have now changed the narrative to better reflect the differences of our approach to already existing iron-chelating molecules such as deferoxamine (lines 608-632).

      Finally, I am concerned about the few mutations reported in the resistance study. Looking at the SI, it appears that very few mutations were seen. It is unclear what filtering the authors used to arrive at such a low number of mutations. Even filtering against mutations that were selected by adaptation to the media, it seems low that only a handful of clones had distinct mutations.

      Response 14: We apologise for the unclear explanations and data analysis. When reanalysing the data, we indeed detected a mistake: we originally treated all genomes as being of clonal origin, although we had sequenced entire populations for the control treatments. We have now completely redone the mutational analysis using the breseq pipeline, as newly described in the Methods (lines 861-866) and presented in the Results (lines 421-451). With an improved filtering process, we indeed found many more mutations, including the loss of mobile genetic elements. However, it is important to note that it is not uncommon to find only a few beneficial mutations. Especially in cases of selective sweeps, often only a few mutations become fixed.

      This paper has a lot of strengths. The workflow is logical and well-executed; the only significant weakness is the effect of the molecules and the lack of an explanation for a dose-response curve in the survivability assay, especially when compared to the data reported in Figure 3, as the authors describe in lines 214-217.

      Response 15: Thank you for this overall positive assessment. As discussed in our response 12, the effect of the molecule in the host was not weak as it decreased hazard risks by more than 50% for A. baumannii and K. pneumoniae. Moreover, we explain that the benefit of the pyoverdine treatment (in terms of treating the infection) can be offset by adverse effects on the host, especially at high pyoverdine concentrations.

      Reviewer #2 (Recommendations For The Authors):

      • Compare these compounds to well-studied iron chelating molecules.

      Response 16: We have addressed this comment in our response 13.

      • Consider adding time to death to the survivability analysis. While the reduction in mortality was not large, perhaps the time to death increased.

      Response 17: This is an excellent suggestion. We have now analysed the time to death as a function of pyoverdine concentration (new Figure S6). Time to death was highly variable, and sample size was fairly low for A. baumannii and K. pneumoniae because many larvae survived. Nonetheless, we found hump-shaped dose-response curves in four out of six comparisons and a linear dose-response curve in one case. We now report the new analyses on lines 351-362. Finally, we would like to stress once more that the reduction in mortality was considerable (hazard risk reduction by more than 50%).

      • I would also like to see the actual growth curves of the pathogens in the SI to accompany Fig 6.

      Response 18: This is a good point. We have now included the actual growth curves of the pathogens in the Supporting Information to accompany Figure 6 (new Figures S9 and S10).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review:

      Summary:

      This study presents a strategy to efficiently isolate PcrV-specific BCRs from human donors with cystic fibrosis who have or had Pseudomonas aeruginosa (PA) infection. Isolation of mAbs that provide protection against PA may be key to developing a new strategy to treat PA infection, as PA has intrinsic and acquired resistance to most antibiotic drug classes. Hale et al. developed a fluorescently labeled antigen hook and isolated mAbs with anti-PA activity. Overall, the authors' conclusions are supported by the solid data analysis presented in the paper. Four of five recombinantly expressed PcrV-specific mAbs exhibited anti-PA activity in a murine pneumonia challenge model as potent as that of the V2L2MD mAb (equivalent to gremubamab). However, the therapeutic potential of these isolated mAbs is uncertain, as gremubamab has failed in Phase 2 trials. Clarification of this point would greatly benefit this paper.

      Strengths:

      (1) High efficiency of isolating antigen-specific BCRs using an antigenic hook.

      (2) The authors' conclusion is supported by data.

      Weaknesses:

      Although the authors state that the goal of this study was to generate novel protective mAbs for therapeutic use (P12; Para. 2), it is unclear whether the PcrV-specific mAbs isolated in this study have better therapeutic potential than gremubamab, which has failed in Phase 2 trials. Four of five PcrV-specific mAbs isolated in this study reduced bacterial burdens in mice as effectively as, but not better than, the gremubamab-equivalent mAb. Clarification of this concern, by revising the text or by providing experimental results that show better potential than gremubamab, would greatly benefit this paper.

      The authors thank the reviewer for their thoughtful positive assessment. As noted by the reviewer, the studies described here, which were performed in mice, show that our MBC-derived mAbs are as effective as V2L2MD, a mAb that is one component of the gremubamab bi-specific. However, key theoretical strengths of MBC-derived mAbs (reduced immunogenicity, full participation in effector functions) are not easily tested in mice. We have clarified and expanded our discussion of these points in our revised manuscript, particularly in the Discussion paragraph 4.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Page 8: "Using improved methods that enhanced the efficiency and depth of sequencing (manuscript in preparation...)". This method is not described in detail. The authors should provide a detailed method (as a preprint on a public database or in the Methods section).

      We thank the reviewers for their interest in the details of our single-cell B cell receptor sequencing methods. We regret that the manuscript is still in preparation. In fact, our current Methods section provides much more detail about sequencing methods than is customarily supplied by authors of mAb development papers. However, we understand the frustration and will remove the citation of our manuscript in preparation from our revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their paper, Kang et al. investigate rigidity sensing in amoeboid cells, showing that, despite their lack of proper focal adhesions, amoeboid migration of single cells is impacted by substrate rigidity. In fact, many different amoeboid cell types can durotax, meaning that they preferentially move towards the stiffer side of a rigidity gradient. 

      The authors observed that NMIIA is required for durotaxis and, building on this observation, they generated a model to explain how durotaxis could be achieved in the absence of strong adhesions. According to the model, substrate stiffness alters the diffusion rate of NMIIA, with softer substrates allowing for faster diffusion. This allows for NMIIA accumulation at the back, which, in turn, results in durotaxis. 

      The authors responded to all my comments and I have nothing to add. The evidence provided for durotaxis of non-adherent (or low-adhering) cells is strong. I am particularly impressed by the fact that amoeboid cells can durotax even when not confined. I wish to congratulate the authors for the excellent work, which will fuel discussion in the field of cell adhesion and migration.

      We thank the reviewer for critically evaluating our work and giving kind suggestions. We are glad that the reviewer found our work to be of potential interest to the broad scientific community.

      Reviewer #2 (Public Review):

      Summary:

      The authors developed an imaging-based device that provides both spatial confinement and a stiffness gradient to investigate if and how amoeboid cells, including T cells, neutrophils, and Dictyostelium, can durotax. Furthermore, the authors showed that the mechanism for the directional migration of T cells and neutrophils depends on non-muscle myosin IIA (NMIIA) polarized towards the soft-matrix side. Finally, they developed a mathematical model of an active gel that captures the behavior of the cells described in vitro.

      Strengths:

      The topic is intriguing as durotaxis is essentially thought to be a direct consequence of mechanosensing at focal adhesions. To the best of my knowledge, this is the first report on amoeboid cells that do not depend on FAs to exert durotaxis. The authors developed an imaging-based durotaxis device that provides both spatial confinement and stiffness gradient and they also utilized several techniques such as quantitative fluorescent speckle microscopy and expansion microscopy. The results of this study have well-designed control experiments and are therefore convincing.

      Weaknesses:

      Overall this study is well performed but there are still some minor issues I recommend the authors address:

      (1) When using NMIIA/NMIIB knockdown cell lines to distinguish the role of NMIIA and NMIIB in amoeboid durotaxis, it would be better if the authors took compensatory effects into account.

      We thank the reviewer for this suggestion. We have investigated the compensation of myosin in NMIIA and NMIIB KD HL-60 cells using Western blot and added this result in our updated manuscript (Fig. S4B, C). The results showed that the level of NMIIB protein in NMIIA KD cells doubled while there was no compensatory upregulation of NMIIA in NMIIB KD cells. This is consistent with our conclusion that NMIIA rather than NMIIB is responsible for amoeboid durotaxis since in NMIIA KD cells, compensatory upregulation of NMIIB did not rescue the durotaxis-deficient phenotype. 

      (2) The expansion microscopy assay is not clearly described and some details are missed such as how the assay is performed on cells under confinement.

      We thank the reviewer for this comment. We have updated the details of the expansion microscopy assay in our revised manuscript (lines 481-485), including how the assay is performed on cells under confinement:

      Briefly, CD4+ naïve T cells were seeded on a gradient PA gel, with an upper gel providing confinement. Cells were fixed with 4% PFA for 15 min at room temperature. After fixation, the upper gradient PA gel was carefully removed, and the bottom gradient PA gel with seeded cells was immersed in an anchoring solution containing 1% acrylamide and 0.7% formaldehyde (Sigma, F8775) for 5 h at 37 °C.

      (3) In this study, an active gel model was employed to capture experimental observations. Previously, some active nematic models were also considered to describe cell migration, which is controlled by filament contraction. I suggest the authors provide a short discussion on the comparison between the present theory and those prior models.

      We thank the reviewer for this suggestion. Active nematic models have been employed to recapitulate many phenomena during cell migration (Nat Commun., 2018, doi: 10.1038/s41467-018-05666-8). The active nematic model describes the motion of cells using the orientation field, Q, and the velocity field, u. The director field n, for which n and −n are equivalent, is employed to represent the nematic state, which has head-tail symmetry. However, in our experiments, actin filaments are clearly polarized: they polymerize and flow towards the direction of cell migration. Therefore, we chose an active gel model, which describes the polarized actin field during cell migration. In the Discussion, we have provided a comparison between the active gel model and the motor-clutch model. We have also added a short discussion comparing the present model with the active nematic model in the main text (lines 345-347):

      The active nematic model employs active extensile or contractile agents to push or pull the fluid along their elongation axis to simulate cells flowing (61). 

      (4) In the present model, actin flow contributes to cell migration while myosin distribution determines cell polarity. How does this model couple actin and myosin together?

      We thank the reviewer for this question. In our model, the polarization field is employed to couple actin and myosin. Actin accumulates at the front, while myosin diffuses in the opposite direction. Therefore, we propose that actin and myosin flow in opposite directions, which is captured by the convection terms of the actin and myosin density fields.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      With socioeconomic development, more and more people are obese, and obesity is an important cause of subfertility and infertility. Maternal obesity reduces oocyte quality, which may be a reason for the high risk of metabolic diseases in offspring in adulthood. Yet the underlying mechanisms are not well elucidated. Here the authors examined the effects of maternal obesity on oocyte methylation. The authors reported hypermethylation in oocytes, and the altered methylation in oocytes may be partially transmitted to F2. The authors further explored the association between the serum metabolome and the altered methylation in oocytes, and identified decreased melatonin. Melatonin is involved in regulating the hypermethylation of high-fat diet (HFD) oocytes by increasing the expression of DNMTs, which is mediated by the cAMP/PKA/CREB pathway.

      Strengths:

      This study is interesting and should have significant implications for the understanding of the transgenerational inheritance of GDM in humans.

      Thank you for your positive comments on our manuscript.

      Weaknesses:

      The link between altered DNA methylation and offspring metabolic disorders is not well elucidated; how the altered DNA methylation in oocytes escapes reprogramming in transgenerational inheritance is also unclear.

      Thanks. These are very good questions. Much work remains to fully elucidate the relationship between methylation and offspring metabolic disorders, as well as the mechanisms by which acquired methylation escapes reprogramming during development. We would like to explore these questions in the future.

      Reviewer #2 (Public Review):

      This manuscript offers significant insights into the impact of maternal obesity on oocyte methylation and its transgenerational effects. The study employs comprehensive methodologies, including transgenerational breeding experiments, whole genome bisulfite sequencing, and metabolomics analysis, to explore how high-fat diet (HFD)-induced obesity alters genomic methylation in oocytes and how these changes are inherited by subsequent generations. The findings suggest that maternal obesity induces hyper-methylation in oocytes, which is partly transmitted to F1 and F2 oocytes and livers, potentially contributing to metabolic disorders in offspring. Notably, the study identifies melatonin as a key regulator of this hyper-methylation process, mediated through the cAMP/PKA/CREB pathway.

      Strengths:

      The study employs comprehensive methodologies, including transgenerational breeding experiments, whole genome bisulfite sequencing, and metabolomics analysis, and provides convincing data.

      Thank you for your positive comments on our manuscript.

      Weaknesses:

      The description in the results section is somewhat verbose. This section (lines 126~227) utilized transgenerational breeding experiments and methylation analysis to demonstrate that maternal obesity-induced alterations in oocyte methylation (including hyper-DMRs and hypo-DMRs) can be partially transmitted to F1 and F2 oocytes and livers. The authors should consider condensing and revising this section for clarity and brevity.

      Thanks for your suggestions. We have rewritten this part in the revised manuscript.

      There is a contradiction with Reference 3, but the discrepancy is not discussed. In this study, the authors observed an increase in global methylation in oocytes from HFD mice, whereas Reference 3 indicates Stella insufficiency in oocytes from HFD mice. This Stella insufficiency should lead to decreased methylation (Reference 33). There should be a discussion of how this discrepancy can be reconciled with the authors' findings.

      Thanks for your suggestions. As reported in Reference 33, STELLA prevents hypermethylation in oocytes by sequestering UHRF1, which recruits DNMT1, away from the nucleus. Han et al. reported that obesity induced by a high-fat diet reduces the STELLA level in oocytes. Together, these findings indicate that STELLA insufficiency might induce hypermethylation in oocytes, although Han et al. did not detect significant hypermethylation in obese oocytes using immunofluorescence. This discrepancy may be caused by the limited sample size (n=14) used by Han et al. We have added a brief discussion in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Maternal obesity is a health problem for both pregnant women and their offspring. Previous work, including work from this group, has shown significant DNA methylation changes in the offspring of obese pregnancies in mice. In this manuscript, Chao et al. dissected the potential mechanisms behind the DNA methylation changes. The major observations of the work include transgenerational DNA methylation changes in the offspring of maternal obesity and the correlation of metabolites such as methionine and melatonin with these epigenetic changes. Exogenous melatonin treatment could reverse the effects of obesity. The authors further hypothesized that the linkage may be mediated by the cAMP/PKA/CREB pathway regulating the expression of DNMTs.

      Strengths:

      The transgenerational change of DNA methylation following HFD is of great interest for future research to follow. The metabolic treatment that could change the DNA methylation in oocytes is also interesting and has potential relevance to future clinical practice.

      Thank you for your positive comments on our manuscript.

      Weaknesses:

      The HFD oocytes have more 5mC signal based on staining and sequencing (Fig 1A-1F). However, the authors also identified almost equal numbers of hyper- and hypo-DMRs, which raises questions regarding where these hypo-DMRs were located and how to interpret their behaviors and functions. These questions are also critical to address in the following mechanistic dissections as the metabolic treatments may also induce bi-directional changes of DNA methylation. The authors should carefully assess these conflicts to make the conclusions solid.

      Thanks for the helpful comments and suggestions. As presented in Fig. 1F, methylation levels increase in promoter and exon regions and decrease in intron, utr3 and repeat regions. Following the suggestion, we further analyzed the distribution of DMRs and found that, compared with hyper-DMRs, hypo-DMRs were mainly distributed in utr3, intron, repeat, and tes regions (Fig. S3). This suggests that the distribution of DMRs in the genome is not random.

      The transgenerational epigenetic modifications are controversial. Even for F0 offspring under maternal obesity, there were different observations compared to this work (Hou, YJ., et al. Sci Rep, 2016). The authors should discuss the inconsistencies with previous works.

      Thanks for the suggestions. There are contradictory reports on whole-genome DNA methylation of oocytes in obese mice. Hou YJ et al. (2016) reported, using immunofluorescence, that obesity reduces whole-genome DNA methylation of NSN GV oocytes. In 2018, Han LS et al. reported, also using immunofluorescence, that whole-genome 5mC of oocytes is not significantly influenced by obesity, but found that the Stella level in oocytes is reduced by obesity. Stella is located in the cytoplasm and nucleus of oocytes and sequesters Uhrf1 from the nucleus. Stella knockout in oocytes results in an approximately twofold increase in global methylation in MII oocytes by allowing more DNMT1 to be recruited into the nucleus. These findings suggest that global methylation of oocytes in obese mice should be increased, yet Han LS et al. reported similar methylation in oocytes from obese and non-obese mice. Thus, the contradiction may arise from differences in sample size between our study and previous studies, and from the fact that Hou YJ and colleagues examined only the methylation of NSN GV oocytes. Because global methylation is normal in Stella+/- oocytes, Stella insufficiency may not be the main reason for the increased methylation of oocytes in obese mice. We have added a brief discussion in the revised manuscript.

      In addition to the above inconsistencies, the DNA methylation analysis in this work was not carefully evaluated. Several previous studies evaluating DNA methylation in mouse oocytes showed global methylation levels of around 50% (Shirane K, et al. PLoS Genet, 2013; Wang L., et al, Cell, 2014). In Figure 1E, the overall methylation level is about 23% in controls, which differs substantially from previous work. The authors should provide more details regarding the WGBS procedure, including but not limited to sequencing coverage, bisulfite conversion rate, etc.

      Thanks for the good questions. Smallwood et al. reported that the CG methylation of MII oocytes is about 33.1% using single-cell genome-wide bisulfite sequencing (Smallwood et al. Nature Methods, 2014). Shirane K et al. reported that the average methylation level of GV oocytes is 37.9%. Kobayashi H et al. reported that CG methylation in GV oocytes is about 40% (Kobayashi H et al. PLoS Genet. 2012). CG methylation in fully grown oocytes is about 38.7% (Maenohara S et al. PLoS Genet. 2017). Variation in reported oocyte methylation levels is associated with sequencing methods, sequencing depth, and mapping rates. In the present study, whole genome bisulfite sequencing (WGBS) for small samples and the methylation analysis were performed by NovoGene. The number of reads ranged from 31,613,641 to 37,359,643, the unique mapping rate was ≥32.88%, the bisulfite conversion rate was >99.44%, and the sequencing depth was 2.45 to 2.75. The related information is presented in Table S1. The sequencing depth might be a reason for the inconsistency. However, we further confirmed our sequencing results using bisulfite sequencing (BS), and the BS results were similar to the WGBS results. These findings suggest that our results are reliable.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Since the results show that melatonin may play a role in hyper-methylation, the authors need to give some basic information in the Introduction section.

      Thanks. We have added more information to the Introduction.

      (2) Many differential metabolites were identified. Besides melatonin, are other differential metabolites involved in the altered methylation in oocytes?

      This is a good question. We first filtered the differential metabolites that may be involved in methylation, and then further filtered these metabolites according to the relevant DNA methylation pathways and published papers. After that, we confirmed the concentrations of the relevant metabolites in the serum using ELISA. Certainly, we cannot completely exclude other metabolites that might be involved in regulating DNA methylation.

      (3) The altered methylation would be found in the F1 tissues. Did the authors examine other tissues besides the liver?

      Thank you. In the present study, we did not examine DNA methylation in tissues other than the liver. We agree that altered methylation should also be observed in other tissues.

      (4) Did the authors test or estimate for how many generations the maternal obesity-induced genomic methylation alterations can be transmitted?

      Thanks. This is a good question. Takahashi Y and colleagues reported, using DNA methylation-edited mice, that acquired DNA methylation at CpG islands can be transmitted across multiple generations (Takahashi Y et al. 2023, Cell). Similar inheritance has also been reported by other studies using different models.

      (5) The F2 is indirectly affected by maternal obesity, so the evidence is not enough to prove the transgenerational inheritance of the altered methylation.

      Thanks. We found that the altered DNA methylation in F2 tissue and oocytes is similar to that in F1 oocytes. This suggests that the altered DNA methylation in F2 oocytes should be at least partly transmitted to F3. A previous paper (Takahashi Y et al. 2023, Cell) confirmed that acquired DNA methylation at CpG islands can be transmitted across several generations through both paternal and maternal germ lines. Certainly, examining F3 tissues would strengthen this conclusion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure Font Size: The font sizes in the figures are quite inconsistent. Please try to uniform the font size of similar types of text.

      Thanks for your suggestions. We have re-edited the relevant figures in the revised manuscript.

      (2) Figure Clarity: Ensure that all critical information in the figures is clearly visible, such as in Figure 3C.

      Thank you. We revised this figure.

      (3) Figure 1B, C: The position of the asterisks ("**") is not centered in the corresponding columns, and the font size is too small. Please correct this and address similar issues in other figures.

      Thank you for your suggestions. We re-edited these in the revised figures.

      (4) Line 126: The current expression is confusing. It may be revised to: "Both the oocyte quality and the uterine environment can contribute to adult diseases, which may be mediated by epigenetic modifications."

      Thank you. We have revised this sentence accordingly.

      (5) Missing Panel in Figure 3: Figure 3 is missing panel 3N.

      Thank you so much. We corrected it in the revised manuscript.

      (6) Figure Panel Order: Please adjust the order of the panels in the figures to follow a logical reading sequence.

      Thank you. We have changed the panel order in the revised manuscript.

      (7) Line 493: Correct "inthe" to "in the".

      Thank you. We revised it.

      (8) Lines 102-106: Polish the wording and expression, an example as follows: "We analyzed the differentially methylated regions (DMRs) in oocytes from both HFD and CD groups and identified 4,340 DMRs. These DMRs were defined by the criteria: number of CG sites ≥ 4 and absolute methylation difference ≥ 0.2. Among these, 2,013 were hyper-DMRs (46.38%) and 2,327 were hypo-DMRs (53.62%) (Fig. 1G). These DMRs were distributed across all chromosomes (Fig. 1H)."

      Thank you! We have rewritten these parts in the revised manuscript.
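      The DMR criteria quoted above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-region record (CG-site count plus mean methylation level in each group) and treats "hyper" as higher methylation in HFD than in CD oocytes. It is not the pipeline used in the manuscript. The last line only re-checks the reported percentages (2,013 and 2,327 of 4,340 DMRs).

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical record for one candidate region: number of CG sites and
    # mean methylation level (0-1) in HFD and CD oocytes.
    n_cg: int
    meth_hfd: float
    meth_cd: float


def classify_dmrs(regions, min_cg=4, min_diff=0.2):
    """Keep regions with >= min_cg CG sites and |HFD - CD| >= min_diff,
    then split them into hyper-DMRs (HFD > CD) and hypo-DMRs (HFD < CD)."""
    hyper, hypo = [], []
    for r in regions:
        diff = r.meth_hfd - r.meth_cd
        # A tiny tolerance guards against floating-point round-off at the cut-off.
        if r.n_cg >= min_cg and abs(diff) >= min_diff - 1e-12:
            (hyper if diff > 0 else hypo).append(r)
    return hyper, hypo


def percentages(n_hyper, n_hypo):
    """Share of hyper- and hypo-DMRs among all DMRs, in percent (2 decimals)."""
    total = n_hyper + n_hypo
    return round(100 * n_hyper / total, 2), round(100 * n_hypo / total, 2)


print(percentages(2013, 2327))  # (46.38, 53.62)
```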

      Reviewer #3 (Recommendations For The Authors):

      The sample numbers should be annotated in the figure legend for all the bar plots using Image J. The lines in Figures 2B and 2C were without error bars. How many mice were used for these plots?

      Thanks for your suggestions. We have added the sample sizes to the revised manuscript. We made a mistake when preparing Figures 2B and 2C, which resulted in the missing error bars. We have corrected these figures. Thanks again!

      The authors should revise the panel arrangement of the figures (Figure 2, Figure 5, etc) to make them more clear and readable.

      Thank you! We have revised these in the revised manuscript.

      The writing should be improved since there were multiple typos and unclear expressions. AI tools like Grammarly or ChatGPT may help.

      Thank you! We have re-edited the language in the revised manuscript using AI tools.

      Please recheck the immunofluorescence images for clear interpretability. For example, in Figure 5F (H89 treated), the GV is all the way at the edge of the oocyte, and the oocyte in the DIC image appears like it is partially lysed. The DIC images and the DAPI images are not clear enough.

      Thanks for your suggestions. We have re-edited these images in the revised manuscript.

      Another concern is that the Methods describe the immunofluorescence preparation for 5mC and 5hmC staining as a simple fixation in 4% paraformaldehyde followed by permeabilization with 0.5% Triton X-100, but there is no antigen exposure step described, a step that is normally required for visualizing these DNA modifications (e.g., 4N HCl).

      Thank you, and we apologize for not describing the methods clearly. We have added more information about the methods in the revised manuscript.

      The metabolomic analysis revealed a highly significant increase in dibutylphthalate, genistein, and daidzein in the control mice. The presence of these exogenous metabolites suggests that the diets differed in many aspects, not just fat content, so it would be very difficult to interpret the results as related to a high-fat diet alone. Both daidzein and genistein are phytoestrogens and dibutylphthalate is a plasticizer, suggesting differences in the diet and/or in the materials used to collect the samples for analysis from the mice. The Methods define the high-fat diet adequately, as the formulation can be found online using the catalog number. However, the control diet is just listed as "normal diet", so one has no idea what is in it.

      Thank you for these good questions. Daidzein and genistein may come from the diets, and dibutylphthalate may come from the materials used to collect samples. If so, their levels would be expected to be similar between groups. In addition, we have added the formulation of the normal diet to the revised manuscript. The raw materials of the normal diet include corn, bean pulp, fish meal, flour, yeast powder, plant oil, salt, vitamins, and mineral elements. Following the reviewer's suggestions, we re-checked the data for these metabolites and found that their abundance was low. Moreover, their identification had low confidence because the ions of these metabolites were mapped only to ChemSpider (HMDB, KEGG, LIPID MAPS). To further verify these results, we measured these metabolites in serum using ELISA, and the results revealed that the concentrations of genistein and dibutylphthalate were similar between groups. These results suggest that these metabolites may not be involved in the altered oocyte methylation induced by obesity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work by Wang et al., the authors use single-molecule super-resolution microscopy together with biochemical assays to quantify the organization of Nipah virus fusion protein F (NiV-F) on cell and viral membranes. They find that these proteins form nanoscale clusters, which favor membrane fusion activation, and that the physical parameters of these clusters are unaffected by protein expression level and endosomal cleavage. Furthermore, they find that the cluster organization is affected by mutations in the trimer interface on the NiV-F ectodomain and the putative oligomerization motif on the transmembrane domain, and that the clusters are stabilized by interactions among NiV-F, the AP2-complex, and the clathrin coat assembly. This work improves our understanding of the NiV fusion machinery, which may have implications also for our understanding of the function of other viruses.

      Strengths:

      The conclusions of this paper are well-supported by the presented data. This study sheds light on the activation mechanisms underlying the NiV fusion machinery.

      Weaknesses:

      The authors provide limited details of the convolutional neural network they developed in this work. Even though custom code is made available, a description of the network and specifications of how it was used in this work would aid the readers in assessing its performance and applicability. The same holds for the custom-written OPTICS algorithm. Furthermore, limited details are provided for the imaging setup, oxygen scavenging buffer, and analysis of the single-molecule data, which limits reproducibility in other laboratories. The claim of 10 nm resolution is not backed up by data and seems low given the imaging conditions and fluorophores used. Fourier Ring Correlation analysis would have validated this claim. If the authors refer to localization precision rather than resolution, then this should be specified and appropriate data provided to support this claim.

      We thank reviewer 1 for these suggestions. We have described the key steps of the imaging setup, single-molecule data reconstruction, the OPTICS algorithm for cluster identification, and the 1D CNN for classification of the OPTICS data in the Materials and Methods section. We have also provided a recipe for the imaging buffer. We refer to 10 nm localization precision rather than resolution. The localization precision achieved by our SMLM system is shown in Author response image 1.

      Author response image 1.

      The localization precision of the custom-built SMLM system. Distribution of the localization error in the x (dX), y (dY), and z (dZ) directions, in nanometers, for blinks generated from Alexa Fluor 647 labeling NiV-F expressed on the plasma membrane of PK13 cells. The lateral precision is < 10 nm and the axial precision is < 20 nm.

      Reviewer #2 (Public Review): 

      Summary:

      In this manuscript, Wang and co-workers employ single molecule light microscopy (SMLM) to detect NiV fusion protein (NiV-F) on the surface of cells. They corroborate that these glycoproteins form microclusters (previously seen and characterized together with NiV-G and the Nipah matrix protein by Liu and co-workers (2018), also with super-resolution light microscopy). As also seen by Liu and co-workers, the authors show that neither the expression level of NiV-F nor endosomal cleavage alters the identity of these microclusters. Moreover, mutations in the transmembrane domain or the hexamer-of-trimer interface seem to have a mild effect on the size of the clusters that the authors quantified.

      Importantly, it has also been shown that these particles tend to cluster in Nipah VLPs.

      We thank reviewer #2 for the comments and suggestions. This paper builds on Liu et al. 1 to further characterize the nanoclusters formed by NiV-F and their role in membrane fusion activation. While Liu et al. studied the NiV glycoprotein distribution at the NiV assembly sites to inform mechanisms of NiV assembly and release, Wang et al. analyzed the nano-organization and distribution of NiV-F in the prefusion conformation, providing insights into the membrane fusion activation mechanisms.

      Strengths:

      The authors have tried to perform SMLM on single VLPs and have partially shown the importance of NiV-F clustering.

      Weaknesses:

      The labelling strategy for NiV-F is not sufficiently explained. The use of a FLAG tag in the extracellular domain should be validated and compared with the unlabelled WT NiV-F when expressed in functional pseudoviruses (for example HIV-1 based particles decorated with NiV-F). This experiment should also be carried out for both infection and fusion (including BlaM-Vpr as a readout for fusion). I would also suggest running a time-of-addition BlaM experiment to understand how this particular labelling strategy affects single virion fusion as compared to the WT.

      We thank reviewer #2 for this suggestion. We have made various efforts to validate the expression and function of FLAG-tagged NiV-F. NiV-F-FLAG shows cell surface expression levels comparable to those of untagged NiV-F and induces similar levels of cell-cell fusion in 293T cells 1. NiV-F-FLAG also showed levels of virus entry similar to untagged NiV-F when both were pseudotyped on a recombinant Vesicular Stomatitis Virus (VSV) in which the VSV glycoprotein was replaced by a Renilla luciferase reporter gene (VSV-ΔG-rLuc; Fig. S1D). We also performed a virus entry kinetics assay using NiV VLPs expressing NiV-M-β-lactamase (NiV-M-Bla), NiV-G-HA, and NiV-F-FLAG, NiV-F-AU1, or untagged NiV-F. The intracellular AU1 tag is located at the C-terminus of NiV-F (GenBank accession no. AY816748.1). However, we detected different levels of NiV-M-Bla in equal volumes of VLPs, suggesting that the tags in NiV-F affect the budding of the VLPs (Author response image 2A). Therefore, we performed the fusion kinetics assay using VLPs carrying the same levels of NiV-M-Bla. Among them, NiV-F-FLAG VLPs showed the most efficient fusion between VLP and HEK293T cell membranes (Author response image 2B), significantly more efficient than that of untagged NiV-F and NiV-F-AU1. However, we cannot attribute the enhanced fusion activity to the FLAG tag, because the readout of this assay depends on both the level of β-lactamase (introduced by NiV-M-Bla in VLPs) and the NiV-F construct. The tags in NiV-F could affect both the budding of VLPs and the stoichiometry of F and M in individual VLPs. We did not use the HIV-based pseudovirus system because the incorporation of NiV-F into HIV pseudoviruses requires a C-terminal deletion 2,3.

      In summary, the FLAG tag does not affect cell-cell fusion 1 or virus entry when pseudotyped onto the recombinant VSV-ΔG-rLuc viruses (Fig. S1D). Given that we do not observe any difference in clustering between HA- and FLAG-tagged NiV-F constructs on the PK13 cell surface (Fig. S1A-C), we conclude that the FLAG tag has minimal effect on both the fusion activity and the nanoscale distribution of NiV-F.

      Author response image 2.

      Viral entry is not affected by labeling of NiV-F. A) Western blot analysis of NiV-M-Bla in NiV VLPs generated by HEK293T cells expressing NiV-M-Bla, NiV-G-HA, and NiV-F-FLAG, untagged NiV-F, or NiV-F-AU1. Equal volumes of VLPs were separated by denaturing 10% SDS-PAGE and probed with an anti-β-lactamase antibody (SANTA CRUZ, sc-66062). B) NiV VLPs generated from NiV-M-Bla, NiV-G-HA, and NiV-F-FLAG, untagged NiV-F, or NiV-F-AU1 expression plasmids were bound to target HEK293T cells loaded with CCF2-AM dye at 4°C. The blue/green (B/G) ratio was measured at 37°C for 4 h at 3-min intervals. Results were normalized to the maximal B/G ratio of NiV-F-FLAG NiV VLPs. Results from one representative experiment out of three independent experiments are shown.
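      The normalization described in panel B can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' analysis code: the construct names and B/G values below are hypothetical placeholders. Every time course is divided by the single maximum of the reference course (NiV-F-FLAG), so the reference peaks at 1.0 and the other constructs are expressed relative to it.

```python
def normalize_to_reference_max(series, reference):
    """Divide every B/G ratio time course by the maximum B/G ratio of the
    reference construct, so that the reference curve peaks at 1.0."""
    ref_max = max(series[reference])
    return {name: [v / ref_max for v in values] for name, values in series.items()}


# Sampling grid from the legend: every 3 min from 0 to 240 min (4 h).
times_min = list(range(0, 240 + 1, 3))

# Hypothetical, truncated B/G ratios, for illustration only.
bg = {"NiV-F-FLAG": [1.0, 2.0, 4.0], "NiV-F": [1.0, 2.0, 3.0]}
normalized = normalize_to_reference_max(bg, "NiV-F-FLAG")
```

      Using one shared denominator, rather than each curve's own maximum, keeps the differences in fusion magnitude between constructs visible.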

      It would also be very important to compare the FLAG labelling approach with recent advances in the field (for instance incorporating noncanonical amino acids (ncAAs) into NiV-F by amber stop-codon suppression, followed by click chemistry).

      We are grateful for this comment from reviewer #2. Labeling noncanonical amino acids (ncAAs) with bioorthogonal click chemistry is indeed a more precise labeling strategy compared to the traditional epitope labeling approach used in this paper. We will explore the applications of ncAA labeling in single-molecule localization imaging and virus-host interactions in future projects.

      In this paper, the FLAG tag inserted in NiV-F protein seems to have minimal effect on the NiV-F-induced virus entry and cell-cell fusion 1 (Fig. S1). Although the FLAG tag labeling approach may increase the detectable size of NiV-F nanoclusters due to the use of the antibody complex, it should not affect our conclusions drawn from the relative comparisons between wt and mutant NiV-F or control and drug-treated cells. 

      The correlation between the existence of microclusters of a particular size and their functionality is missing. Only cell-cell fusion assays are shown in supplementary figures and clearly, single virus entry and fusion cannot be compared with the biophysics of cell-cell fusion. Not only is the environment completely different, but membrane curvature and the number of NiV-F molecules also vary drastically. Therefore, specific fusion assays (either single virus tracking and/or time-of-addition BlaM kinetics with functional pseudoviruses) are needed to substantiate this claim.

      We thank Reviewer 2 for the suggestion. To support the link between F clustering and virus-cell membrane fusion, we conducted pseudotyped virus entry and VLP fusion kinetics assays, as shown in revised Figure S4. The viral entry results (Fig. S4E and F) corroborate those of the cell-cell fusion assay (Fig. S4A and B) and previously published data 4. The real-time fusion kinetics confirmed that fusion was affected by mutations at the hexameric interface: the hypofusogenic mutants L53D and V108D exhibited reduced entry efficiency, while the hyperfusogenic mutant Q393L showed increased efficiency (Fig. S4G and H). The results are described in detail in the revised manuscript.

      Additionally, we performed a pseudotyped virus entry assay on the LI4A (Fig. S6F and G) and YA (Fig. S7F and G) mutants to verify the function of these mutants in viruses, as shown in the revised Supplemental Figures. Neither LI4A nor YA was incorporated into the VSV/NiV pseudotyped viruses, as shown by Western blot analyses of the pseudovirions (Fig. S6F and S7F). Thus neither mutant induced virus entry, consistent with the cell-cell fusion results (Fig. S6C, D and Fig. S7C, D). We did not perform the entry kinetics assay for these two mutants because they are not incorporated into VLPs or pseudovirions.

      The authors also claim they could not characterize the number of NiV-F particles per cluster. Another technique such as number and brightness (Digman et al., 2008) could support current SMLM data and identify the number of single molecules per cluster. Also, this technology does not require complex microscopy apparatus. I suggest they perform either confocal fluorescence fluctuation spectroscopy or TIRF-based N&B to validate the clusters and identify how many molecules are present in these clusters.

      We thank reviewer 2 for this suggestion. Determining the true copy number of NiV-F in individual clusters could verify whether the F clusters on the plasma membrane are hexamer-of-trimer assemblies. Regardless, it does not affect our conclusion that the organization of NiV-F into nanoclusters affects its ability to trigger membrane fusion. Confocal fluorescence fluctuation spectroscopy (FFS) and TIRF-based analyses are accessible tools for quantifying fluorophore copy numbers and/or stoichiometry based on fluorescence fluctuation or photobleaching. However, these methods are unable to quantify the number of proteins in individual clusters because they analyze fluorophores either in the entire cell (as in wide-field epifluorescence microscopy coupled with FFS and TIRF-coupled photobleaching) 5–7 or within a large excitation volume (confocal laser scanning microscopy-coupled FFS) 8. Both of these volumes are significantly larger than a single NiV-F cluster, which has an average diameter of 24-26 nm (Fig. 1F).
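      For context, the number and brightness (N&B) quantities suggested by the reviewer reduce to moment ratios of one pixel's photon-count time series (Digman et al., 2008). The Python sketch below is only an illustration: it assumes an ideal photon-counting detector (no offset or excess detector noise), and the counts are made up. Each estimate describes a single diffraction-limited pixel, which is the scale limitation discussed above.

```python
from statistics import fmean, pvariance


def number_and_brightness(counts):
    """Apparent and corrected N&B values for one pixel's photon-count
    time series, assuming an ideal photon-counting detector."""
    k_mean = fmean(counts)
    var = pvariance(counts, mu=k_mean)
    B = var / k_mean          # apparent brightness
    N = k_mean ** 2 / var     # apparent number
    eps = B - 1               # molecular brightness (counts per molecule per frame)
    # Molecule number n = <k>^2 / (var - <k>); undefined without excess variance.
    n = k_mean ** 2 / (var - k_mean) if var > k_mean else None
    return N, B, eps, n


# Hypothetical counts for one pixel across frames.
N, B, eps, n = number_and_brightness([0, 10, 0, 10])
```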

      The current SMLM setup is useful for characterizing protein distribution and organization. However, quantifying the true protein copy number within a nanocluster is challenging because of the stochasticity of fluorophore blinking and the unknown labeling stoichiometry 9–11. To address the challenge of fluorophore blinking, quantitative DNA-PAINT (qDNA-PAINT) may be used, because the on-off frequency of the fluorophores is tied to the well-defined kinetic constants of DNA binding and the influx rate of the imager strands, rather than to the stochasticity of fluorophore blinking. Thus, the frequency of blinks can be translated into protein counts 12. To address the challenge of unknown labeling stoichiometry, DNA origami can be used as a calibration standard 11. DNA origami displays handles at regular spacings several to tens of nanometers apart, and the handles can be conjugated with a defined number of proteins of interest. The copy number of the protein of interest in the experimental group can be determined by comparing the SMLM localization distribution of the sample to that of the DNA origami calibration standard. Given that this requires a more sophisticated SMLM setup and a high-precision calibration tool, we will explore the quantification of NiV-F copy numbers in nanoclusters in a future project.
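      The qDNA-PAINT idea above can be summarized as follows. The mean dark time between binding events scales as τ_d = 1/(ξ·n), where ξ = k_on·c is the imager influx rate and n is the number of binding sites. The Python sketch below calibrates ξ on a single-site reference, such as a DNA origami handle, and then inverts the relation for a cluster. It is a hypothetical illustration with made-up dark times. It counts docking sites rather than proteins, so the labeling stoichiometry still has to be known.

```python
from statistics import fmean


def influx_rate(reference_dark_times):
    """Calibrate xi = k_on * c on a single-binding-site reference:
    for one site, the mean dark time is tau_d = 1 / xi."""
    return 1.0 / fmean(reference_dark_times)


def qpaint_sites(cluster_dark_times, xi):
    """Estimate the number of binding sites in a cluster from
    tau_d = 1 / (xi * n), i.e. n = 1 / (xi * tau_d)."""
    return 1.0 / (xi * fmean(cluster_dark_times))


# Hypothetical dark times (s): single-site reference, then one cluster.
xi = influx_rate([10.0, 10.0])
n_sites = qpaint_sites([2.0, 3.0], xi)  # about 4 binding sites
```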

      Also, it is not clear how many cells the authors employ for their statistics (at least 30-50 cells should be employed, and the number of blinking events should not be used as the sample size). I hope the authors are not considering only a single cell to run their stats... The differences between the mutants and NiV-F are minor, even if their statistical analyses give a difference (they should average the number and size of the clusters per cell for a total of 30-50 cells, with experiments performed at least in three different cells following the same protocol). Overall, it seems that the authors have only evaluated a very low number of cells.

      We disagree with this comment from Reviewer #2. The sample size for cluster analysis in SMLM images was chosen by considering the target of the study (cells and VLPs) and the data acquisition and analysis standards in the SMLM imaging field. We also noted the sample size (# of ROIs and cells) in the figure legends.

      Below, we compare the sample sizes in our study to those in similar studies that used comparable imaging and cluster analysis methods from 2015 to 2024. The classical clustering analysis methods are categorized into global clustering (e.g. nearest neighbor analysis, Ripley’s K function, and pair correlation function) and complete clustering, such as density-based analysis (e.g. DBSCAN, Superstructure, FOCAL, ToMATo) and tessellation-based analysis (e.g. Delaunay triangulation, Voronoi tessellation). Global clustering analysis provides spatial statistics for global protein clustering or organization (e.g. clustering extent), while the complete clustering approach extracts information at the single-cluster level, such as the morphology and localization density of individual clusters. We used the density-based analyses DBSCAN and OPTICS for cluster analysis on cell plasma membranes and VLP membranes.
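      To make the density-based idea concrete, here is a minimal pure-Python DBSCAN sketch on 2D localization coordinates (in nm). It illustrates the technique only and is not the authors' custom code. The eps and min_pts values and the toy coordinates are hypothetical. OPTICS differs from DBSCAN in that it orders points by reachability distance instead of committing to a single eps.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Label 2D localizations with cluster ids (0, 1, ...) or -1 for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within eps; clusters grow outward from core points."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force O(n^2) neighbor search; fine for a small sketch.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned, or a border point: do not expand
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # only core points extend the cluster
                queue.extend(j_nbrs)
    return labels


# Two tight toy "nanoclusters" and one isolated localization (nm).
locs = [(0, 0), (0, 5), (5, 0), (100, 100), (100, 105), (105, 100), (500, 500)]
labels = dbscan(locs, eps=10, min_pts=3)
```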

      Author response table 1.

      The comparison of imaging methods, analysis methods, and sample size in the current study to other studies conducted from 2015 to 2024.

      They should also compare the level of expression (with the number of molecules per cell provided by number and brightness) with the total number of clusters. 

      We thank reviewer 2 for this suggestion. We compared the level of expression with the total number of clusters for F-WT in Figure 1I in the main text.  

      The same applies to the VLP assay. I assume the authors have only taken VLPs expressing both NiV-M and NiV-F (and NiV-G). But even if this is not clearly stated, I would urge the authors to show how many viruses were compared per condition (normally I would expect 300 particles per condition coming from three independent experiments). As a negative control to evaluate the cluster effect, I would mix the different conditions. Clearly you have clusters with all conditions and the differences in clustering depending on each condition are minimal. Therefore you need to increase the n for all experiments.

      We thank reviewer 2 for this comment. We acquired and analyzed more images of NiV VLPs bearing F-WT, Q393L, L53D, and V108D. Results are shown in the revised Figure 4 and the number of VLPs (>300) used for analysis is specified in the figure legend. An increased number of VLP images does not affect the classification result in Figure 4C. 

      As for the suggestion on “evaluating the cluster effect at different mixed conditions”, we assume that reviewer 2 would like to see how the presence of different viral structural proteins (F, M, and G) on VLPs could affect F clustering. We showed by direct visualization that the organization of NiV envelope proteins on the VLP membrane is similar in the presence or absence of NiV-M 27, suggesting that the effect of NiV-M on F-WT clustering on VLPs is minimal. We also show comparable incorporation of NiV-F among the NiV-F hexamer-of-trimer mutants (Fig. 4A). Therefore, we did not test F clustering under different F, M, and G combinations in this paper. However, this could be an interesting question to pursue in a paper focusing on NiV VLP production.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang and colleagues describes single molecule localization microscopy to quantify the distribution and organization of Nipah virus F expressed on cells and on virus-like particles. Notably, the crystal structure of F indicated hexameric assemblies of F trimers. The authors propose that F clustering favors membrane fusion.

      Strengths:

      The manuscript provides solid data on imaging of F clustering with the main findings of:

      -  F clusters are independent of expression levels

      -  Proteolytic cleavage does not affect F clustering

      -  Mutations that have been reported to affect the hexamer interface reduce clustering on cells and its distribution on VLPs

      -  F nanoclusters are stabilized by AP

      Weaknesses:

      The relationship between F clustering and fusion is per se interesting, but looking at F clusters on the plasma membrane does not exclude that F clustering occurs for budding. Many viral glycoproteins cluster at the plasma membrane to generate microdomains for budding.

      This does not exclude that these clusters include hexamer assemblies or clustering requires hexamer assemblies. 

      We thank reviewer #3 for this question. We did not focus on the role of NiV-F clusters in budding in the current manuscript, although this is an interesting topic to pursue. In this manuscript, we observed that NiV VLP budding is decreased for some cluster-disrupting mutants, such as F-YA and F-LI4A. However, F-V108D showed increased budding compared to F-WT (Fig. 4A). We also observed that VLPs and VSV/NiV pseudoviruses expressing L53D carry little NiV-G (Fig. 4A, Fig. S4F and S4H), although the incorporation level of L53D is comparable to that of WT F in both VLPs and pseudovirions (Fig. 4A and Fig. S4F). L53D is a hypofusogenic mutant with decreased clustering ability. Therefore, our current data do not show a clear link between F clustering and NiV VLP budding or glycoprotein incorporation.

      We reported that both NiV-F and -M form clusters at the plasma membrane, although NiV-F clusters are not enriched at NiV-M-positive membrane domains 1. This result indicates that NiV-M is the major driving force for assembly and budding, while NiV-F is passively incorporated into the assembly sites. The central role of NiV-M in budding is also supported by a recent study showing that NiV-M induces membrane curvature by binding to PI(4,5)P2 in the inner leaflet of the plasma membrane 28. However, the expression of NiV-F alone induces the production of vesicles bearing NiV-F 29, and NiV-F recruits vesicular trafficking and actin cytoskeleton factors to VLPs either alone or in combination with NiV-G and -M, indicating a potential autonomous role in budding 30. Additionally, several electron microscopy studies show that paramyxovirus F forms a 2D lattice interspersed above the M lattice, suggesting the participation of F in virus assembly and budding. Taken together, the evidence above suggests that NiV-F may play a role in budding, but our data cannot correlate NiV-F clustering with budding.

      Assuming that the clusters are important for entry, hexameric clusters are not unique to Nipah virus F. Similar hexameric clusters have been described for the HEF on influenza virus C particles (Halldorsson et al 2021) and Env organization on Foamy virus particles (Effantin et al 2016), both with specific interactions between trimers. What is the organization of F on Nipah virus particles? If F needs to be hexameric for entry, this should be easily imaged by EM on infectious or inactivated virus particles.

      We thank reviewer #3 for this suggestion. The hexamer-of-trimer NiV-F has been observed on the VLP surface by electron tomography 4. The NiV-F hexamer-of-trimers are arranged into a soccer ball-like structure, with one trimer being part of multiple hexamer-of-trimers. The implications of NiV-F clusters for virus entry and the potential mechanism of NiV-F higher-order structure formation are discussed in the revised manuscript.

      AP stabilization of the F clusters is curious if the clusters are solely required for entry? Virus entry does not recruit the clathrin machinery. Is it possible that F clusters are endocytosed in the absence of budding? 

      We thank reviewer #3 for this question. The evidence from the current study does not exclude a role for NiV-F clustering in virus budding. In the absence of budding, NiV-F is known to be endocytosed in virus-producing cells for cleavage by Cathepsin B or L in endocytic compartments in a pH-dependent manner 31–33. However, given that both cleaved and uncleaved NiV-F have an endocytosis signal sequence in the cytoplasmic tail and are able to interact with AP-2 for endosome assembly, and that cleaved and uncleaved F may have similar clustering patterns (Fig. 2), we do not think NiV-F clustering is specifically regulated for the cleavage of NiV-F. A plausible hypothesis is that NiV-F clusters are stabilized on the cell membrane by multiple intrinsic factors (e.g. the trimer interface) and host factors (e.g. AP-2) for cell-cell fusion and virus budding. We linked clustering to the fusion ability of NiV-F in this study, but NiV-F clustering may also be important in facilitating virus budding. Once in the viruses, higher-order assemblies of the clusters (e.g. a lattice) may form due to protein enrichment, and cellular factors may not be the major maintenance force.

      Clusters are required for budding. 

      Other points:

      Fig. 3: Some of the V108D and L53D clusters look similar in size to WT clusters. It seems that the interaction is important but not absolutely essential. Would a double mutant abrogate clustering completely?

      We thank Reviewer #3 for the suggestion. We generated a double mutant of NiV-F with L53D and V108D (NiV-F-LV) and assessed its expression and processing. Although the mutant retained processing capability, it exhibited minimal surface expression, making it unfeasible to analyze its nano-organization on the cell or viral membrane.

      Author response image 4.

      The expression and fusion activity of FLAG-tagged NiV-F and NiV-F L53D-V108D (LV). (A) Representative western blot analysis of NiV-F-WT and LV in the cell lysate of 293T cells. 293T cells were transfected with NiV-F-WT or the LV mutant. The empty vector was used as a negative control. The cell lysates were analyzed by SDS-PAGE followed by western blotting at 28 h post-transfection. F0 and F2 were probed with the M2 monoclonal mouse anti-FLAG antibody. GAPDH was probed with monoclonal mouse anti-GAPDH. (B) Representative images of 293T cell-cell fusion induced by NiV-G and NiV-F-WT or NiV-F-LV. 293T cells were co-transfected with plasmids coding for NiV-G and empty vector (NC) or NiV-F constructs. Cells were fixed at 18 h post-transfection. Arrows point to syncytia. Scale bar: 10 µm. (C) Relative cell-cell fusion levels in 293T cells in (B). Five fields per experiment were counted from three independent experiments. Data are presented as mean ± SEM. (D) The cell surface expression levels of NiV-F-WT and NiV-F-LV in 293T cells measured by flow cytometry. Mean fluorescence intensity (MFI) values were calculated by FlowJo and normalized to that of F-WT. Data are presented as mean ± SEM of three independent experiments. Statistical significance was determined by the unpaired t-test with Welch’s correction (*P<0.05, **P<0.01, ***P<0.001, ****P<0.0001). Values were compared to that of NiV-F-WT.
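      The MFI normalization and the Welch-corrected t-test named in this legend can be sketched in Python as follows. This is an illustrative sketch, not the authors' analysis: all values are hypothetical, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed. A p-value would come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import fmean, variance


def normalize_to_wt(mfi, wt_key="F-WT"):
    """Express each construct's MFI relative to the WT MFI (WT = 1.0)."""
    wt = mfi[wt_key]
    return {name: value / wt for name, value in mfi.items()}


def welch_t(a, b):
    """Welch's unequal-variance t statistic and Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)  # squared standard error of the mean of b
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical MFI values and three hypothetical replicates per group.
rel = normalize_to_wt({"F-WT": 200.0, "F-LV": 50.0})
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```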

      Fig. 4: The distribution of F on VLPs should be confirmed by cryoEM analyses. This would also confirm the symmetry of the clusters. The manuscript by Chernomordik et al. JBC 2004 showed that influenza HA outside the direct contact zone affects fusion, which could be further elaborated in the context of F clusters and the fusion mechanism.

      We thank reviewer 3 for this suggestion. The distribution of F on VLPs was resolved by electron tomography, which showed that the NiV-F hexamer-of-trimers are arranged into a soccer ball-like structure 4. The role of influenza HA outside of the contact zone in fusion activation is an interesting phenomenon. It may reflect energy transmission within and among clusters. We will pursue this topic in a future project.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      •  Please define all used abbreviations throughout the manuscript and in the SI.

      We defined the abbreviations at their first usage. 

      •  The sentence starting with "Additionally, ..." on line 155 appears to be incomplete.

      We corrected this sentence.  

      •  The statement starting with "As reported, ..." on line 181 should be supported by a reference.

      We added a reference. 

      •  In Fig. 4C, it is unclear what the x and y axes represent.  

      Fig. 4C is a t-SNE plot, which visualizes high-dimensional data in a low-dimensional space. It preserves the local data structure but does not represent exact quantitative relationships: points that are close together in Fig. 4C are also close in the high-dimensional space, meaning that the OPTICS plots (which reflect the clustering patterns) of two nearby points are similar. The x and y axes therefore do not correspond to the original quantitative data, and axis titles would carry no meaning.

      •  The reference on line 306 appears to be unformatted.

      We reformatted the reference.  

      Reviewer #2 (Recommendations For The Authors):

      The authors need to include the overall statistics for each experiment (at least 30 to 50 cells with three independent experiments are needed). 

      We highlighted the sample size (number of ROIs and number of cells) used for analysis in the figure legends. The determination of the sample size is justified in Table 1 of the response letter.

      The authors need to generate a functional pseudovirus system (for example, HIVpp/NiV-F) to run both infectivity and fusion experiments (including the Vpr-BlaM assay).

      We tested viral entry using a VSV/NiV pseudovirus system and the viral entry kinetics using VLPs expressing NiV-M-β-lactamase. The results are presented in Fig. S1, S4, S6, and S7.  

      Reviewer #3 (Recommendations For The Authors):

      Even low resolution EM data on VLPs or viruses would strengthen the conclusions.

      We thank this reviewer for the suggestion. We cited the NiV VLP images acquired by electron tomography (4), but we currently have limited resources to perform cryoEM on NiV VLPs.

      References.

      (1) Liu, Q., Chen, L., Aguilar, H. C. & Chou, K. C. A stochastic assembly model for Nipah virus revealed by super-resolution microscopy. Nature Communications 9, 3050 (2018).

      (2) Khetawat, D. & Broder, C. C. A Functional Henipavirus Envelope Glycoprotein Pseudotyped Lentivirus Assay System. Virology Journal 7, 312 (2010).

      (3) Palomares, K. et al. Nipah Virus Envelope-Pseudotyped Lentiviruses Efficiently Target ephrinB2-Positive Stem Cell Populations In Vitro and Bypass the Liver Sink When Administered In Vivo. J Virol 87, 2094–2108 (2013).

      (4) Xu, K. et al. Crystal Structure of the Pre-fusion Nipah Virus Fusion Glycoprotein Reveals a Novel Hexamer-of-Trimers Assembly. PLoS Pathog 11, e1005322 (2015).

      (5) Bakker, E. & Swain, P. S. Estimating numbers of intracellular molecules through analysing fluctuations in photobleaching. Sci Rep 9, 15238 (2019).

      (6) Nayak, C. R. & Rutenberg, A. D. Quantification of Fluorophore Copy Number from Intrinsic Fluctuations during Fluorescence Photobleaching. Biophys J 101, 2284–2293 (2011).

      (7) Salavessa, L. & Sauvonnet, N. Stoichiometry of Receptors at the Plasma Membrane During Their Endocytosis Using Total Internal Reflection Fluorescent (TIRF) Microscopy Live Imaging and Single-Molecule Tracking. in Exocytosis and Endocytosis: Methods and Protocols (eds. Niedergang, F., Vitale, N. & Gasman, S.) 3–17 (Springer US, New York, NY, 2021). doi:10.1007/978-1-0716-1044-2_1.

      (8) Slenders, E. et al. Confocal-based fluorescence fluctuation spectroscopy with a SPAD array detector. Light Sci Appl 10, 31 (2021).

      (9) Annibale, P., Vanni, S., Scarselli, M., Rothlisberger, U. & Radenovic, A. Identification of clustering artifacts in photoactivated localization microscopy. Nat Methods 8, 527–528 (2011).

      (10) Baumgart, F. et al. Varying label density allows artifact-free analysis of membrane-protein nanoclusters. Nat Methods 13, 661–664 (2016).

      (11) Zanacchi, F. C. et al. A DNA origami platform for quantifying protein copy number in super-resolution. Nat Methods 14, 789–792 (2017).

      (12) Jungmann, R. et al. Multiplexed 3D cellular super-resolution imaging with DNA-PAINT and Exchange-PAINT. Nature Methods 11, 313–318 (2014).

      (13) Rubin-Delanchy, P. et al. Bayesian cluster identification in single-molecule localization microscopy data. Nat Methods 12, 1072–1076 (2015).

      (14) Griffié, J. et al. 3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse. Sci Rep 7, 4077 (2017).

      (15) Griffié, J. et al. Dynamic Bayesian Cluster Analysis of Live-Cell Single Molecule Localization Microscopy Datasets. Small Methods (2018). doi:10.1002/smtd.201800008.

      (16) Caetano, F. A. et al. MIiSR: Molecular Interactions in Super-Resolution Imaging Enables the Analysis of Protein Interactions, Dynamics and Formation of Multi-protein Structures. PLOS Computational Biology 11, e1004634 (2015).

      (17) Malkusch, S. & Heilemann, M. Extracting quantitative information from single-molecule super-resolution imaging data with LAMA – LocAlization Microscopy Analyzer. Sci Rep 6, 34486 (2016).

      (18) Zhang, Y., Lara-Tejero, M., Bewersdorf, J. & Galán, J. E. Visualization and characterization of individual type III protein secretion machines in live bacteria. Proceedings of the National Academy of Sciences 114, 6098–6103 (2017).

      (19) Tobin, S. J. et al. Single molecule localization microscopy coupled with touch preparation for the quantification of trastuzumab-bound HER2. Sci Rep 8, 15154 (2018).

      (20) Levet, F. et al. SR-Tesseler: a method to segment and quantify localization-based super-resolution microscopy data. Nature Methods 12, 1065–1071 (2015).

      (21) Peters, R., Griffié, J., Burn, G. L., Williamson, D. J. & Owen, D. M. Quantitative fibre analysis of single-molecule localization microscopy data. Sci Rep 8, 10418 (2018).

      (22) Levet, F. et al. A tessellation-based colocalization analysis approach for single-molecule localization microscopy. Nat Commun 10, (2019).

      (23) Banerjee, C. et al. ULK1 forms distinct oligomeric states and nanoscopic structures during autophagy initiation. Science Advances 9, eadh4094 (2023).

      (24) Pageon, S. V. et al. Functional role of T-cell receptor nanoclusters in signal initiation and antigen discrimination. Proceedings of the National Academy of Sciences 113, E5454–E5463 (2016).

      (25) Cresens, C. et al. Flat clathrin lattices are linked to metastatic potential in colorectal cancer. iScience 26, 107327 (2023).

      (26) Seeling, M. et al. Immunoglobulin G-dependent inhibition of inflammatory bone remodeling requires pattern recognition receptor Dectin-1. Immunity 56, 1046-1063.e7 (2023).

      (27) Liu, Q. T. et al. The nanoscale organization of Nipah virus matrix protein revealed by super-resolution microscopy. Biophysical Journal 121, 2290–2296 (2022).

      (28) Norris, M. J. et al. Measles and Nipah virus assembly: Specific lipid binding drives matrix polymerization. Science Advances 8, eabn1440 (2022).

      (29) Patch, J. R. et al. The YPLGVG sequence of the Nipah virus matrix protein is required for budding. Virol. J. 5, 137 (2008).

      (30) Johnston, G. P. et al. Nipah Virus-Like Particle Egress Is Modulated by Cytoskeletal and Vesicular Trafficking Pathways: a Validated Particle Proteomics Analysis. mSystems 4, e00194-19 (2019).

      (31) Diederich, S. et al. Activation of the Nipah Virus Fusion Protein in MDCK Cells Is Mediated by Cathepsin B within the Endosome-Recycling Compartment. J Virol 86, 3736–3745 (2012).

      (32) Diederich, S., Thiel, L. & Maisner, A. Role of endocytosis and cathepsin-mediated activation in Nipah virus entry. Virology 375, 391–400 (2008).

      (33) Pager, C. T., Craft, W. W., Patch, J. & Dutch, R. E. A mature and fusogenic form of the Nipah virus fusion protein requires proteolytic processing by cathepsin L. Virology 346, 251–257 (2006).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      I have read the authors' response to my comments as well as to the other reviewers. Summarizing briefly, I don't think they provide substantial answers to the questions/comments by me or reviewer 3, and they generally do not quantify the results/effects data. I still remain unconvinced about the analyses and conclusions. Rather than writing another set of comments, I think it will be more useful for all (authors and readers) simply to be able to see the entire set of reviews and responses together with the paper.

      The authors disagree with the views of the referees. The authors have provided precise point-by-point responses to each of the previous comments, and find that the referee has not engaged with the responses and accompanying analyses provided in our previous response.

      The following extensive analyses were performed for our round-2 revision to address the comments raised by Reviewers 2 and 3 on the previous versions:

      (1) We calculated the distribution of multiple metrics for both the apo and holo simulations, including their secondary structure composition, and demonstrated the robustness of our findings.

      (2) We analyzed smaller 60 µs chunks from two parts of the 1.5 ms trajectory and showed how, in combination with the Markov state modeling (MSM) approach, these chunks effectively capture equilibrium properties.

      (3) We thoroughly investigated the choice of starting structures, examining parameters such as Rg, RMSD, secondary structure, and SASA, in response to Referee 3's concerns about the objectivity of our dimension reduction approach.

      (4) We conducted multiple analyses using VAMP-scores and justified the use of a Variational Autoencoder (VAE) over tICA.

      (5) We had extensively verified the choice of hyperparameters used in constructing the MSM.

      (6) To alleviate the referees' concerns, we retrained a VAE with four latent dimensions and used it to build an MSM, ensuring the robustness of our approach.

      However, we find that the referee has not considered these additional analyses in response to his/her comments on the manuscript.

      Since Referee 2 also draws on comments from Referee 3, it is worth noting that some of the comments from Referee 2 and Referee 3 in Round 1 were mutually contradictory. In particular, Referee 3's suggestion in Round 1 to use the same initial configuration for simulations of intrinsically disordered proteins (IDPs) in both apo and ligand-bound forms contradicts the fundamental principle that IDPs should not possess structural bias. This recommendation also directly conflicts with Referee 2's request for greater diversity in starting structures. Our manuscript provided robust evidence that our initial configurations are indeed diverse, with one configuration coincidentally matching that used in the ligand-bound simulations. Despite this, we addressed both sets of concerns in our Round 2 revisions. Unfortunately, it seems that these efforts were overlooked in the subsequent round of review.

      Referee 2's suggestion in the previous round of review to mix both holo and apo simulation trajectories for MSM construction is conceptually wrong and indicates a lack of understanding of how transition matrices are built in this field. Nevertheless, we addressed these comments by performing additional analyses and demonstrating the robustness of our current MSM.

      Reviewer #3 (Public Review):

      Summary:

      While the authors have provided additional information in the updated manuscript, none of the additional analyses address the fundamental flaws of the manuscript.

      The additional analyses do not convincingly demonstrate that these two extremely different simulation datasets (1500 microsecond unbiased MD for a-synuclein + fasudil, 23 separate 1-4 microsecond simulations of apo a-synuclein) are directly comparable for the purposes of building MSMs.

      The 23 unbiased 1-4 µs simulations of apo αS total ~60 µs.

      Author response image 1.

      Left: distribution of the radius of gyration (Rg) for the 23 apo simulations (colour bar) and the holo simulation (black). Right: mean and standard deviation (error bars) of Rg for the 23 apo simulations (colour bar) and the holo simulation (black).

      We have plotted the distribution of the radius of gyration (Rg) for the 23 apo simulations (colour bar) and the holo simulation (black) in the left figure, and compared the means and standard deviations (SDs, horizontal error bars) of Rg in the right figure. The apo simulations span the entire Rg range sampled by the holo simulation. The fact that the apo simulations have means and SDs comparable to those of the holo ensemble suggests that most of the apo simulations sample a conformational space similar to that observed in the ligand-bound holo form, and hence they can be used for building the MSM.
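      As a point of reference for the Rg comparison above, the radius of gyration of a frame is the (mass-weighted) root-mean-square distance of its atoms from their centre of mass. The NumPy sketch below computes Rg per frame and summarizes each run as histograms on shared bins plus mean and standard deviation. It assumes coordinates are already loaded as arrays; the atom selection, weighting and bin count are illustrative choices, not the settings used for the figure, and the function names are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of every frame.

    coords: array of shape (n_frames, n_atoms, 3)
    masses: optional array of shape (n_atoms,); equal weights if None
    (the atom selection and weighting are assumptions of this sketch)
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(coords.shape[1])
    masses = np.asarray(masses, dtype=float)
    total = masses.sum()
    # Mass-weighted centre of each frame, shape (n_frames, 3)
    centre = np.einsum("fai,a->fi", coords, masses) / total
    # Squared distance of each atom from its frame's centre, shape (n_frames, n_atoms)
    sq_dist = np.sum((coords - centre[:, None, :]) ** 2, axis=2)
    return np.sqrt(sq_dist @ masses / total)


def compare_runs(rg_per_run, bins=50):
    """Histograms on shared bin edges plus (mean, std) for each run's Rg series."""
    all_rg = np.concatenate(rg_per_run)
    edges = np.linspace(all_rg.min(), all_rg.max(), bins + 1)
    hists = [np.histogram(r, bins=edges, density=True)[0] for r in rg_per_run]
    stats = [(float(np.mean(r)), float(np.std(r))) for r in rg_per_run]
    return edges, hists, stats
```

Using the same bin edges for every run is what makes the apo and holo histograms directly comparable.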

      The additional analyses do not demonstrate that there are sufficient conformational transitions among kinetically metastable states observed in 23 separate 1-4 microsecond simulations of apo a-synuclein to build a valid MSM, or that the latent space of the VAE is kinetically meaningful.      

      We have performed the Chapman-Kolmogorov test to compare observed and predicted transition probabilities over increasing lag times and found good agreement between these probabilities, thereby suggesting that transitions between states are well-sampled for both the apo (Author response image 2) and holo simulation (Figure S9).
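      A minimal sketch of a Chapman-Kolmogorov check on discrete state trajectories: the transition matrix estimated at lag τ is propagated k times and compared with the matrix estimated directly at lag kτ. This simplified version compares per-state self-transition probabilities using a plain row-normalized count estimator. MSM packages typically run the test on metastable sets, with a reversible estimator and error bars, so this is an illustration rather than the analysis behind the figures. Function names are hypothetical.

```python
import numpy as np


def transition_matrix(dtrajs, n_states, lag):
    """Row-stochastic transition matrix at `lag` steps, pooling all trajectories.

    Counts use a sliding window within each trajectory; no transition is
    counted across the boundary between two separate trajectories.
    """
    counts = np.zeros((n_states, n_states))
    for d in dtrajs:
        d = np.asarray(d, dtype=int)
        if len(d) > lag:
            np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    # States never left at this lag keep an identity row so rows still sum to one
    return np.divide(counts, rows, out=np.eye(n_states), where=rows > 0)


def chapman_kolmogorov(dtrajs, n_states, lag, multiples=(1, 2, 3, 4)):
    """Predicted T(lag)^k versus directly estimated T(k*lag), state by state.

    Returns (k, predicted, estimated) for each k, where both arrays hold the
    probability of being found in the same state after k*lag steps.
    """
    t_lag = transition_matrix(dtrajs, n_states, lag)
    out = []
    for k in multiples:
        predicted = np.linalg.matrix_power(t_lag, k)
        estimated = transition_matrix(dtrajs, n_states, k * lag)
        out.append((k, np.diag(predicted), np.diag(estimated)))
    return out
```

Agreement between the two columns for increasing k is what "observed and predicted transition probabilities agree" means in practice.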

      Author response image 2.

      The Chapman-Kolmogorov test performed for the three-state Markov state model of the αS ensemble.

      As for the latent space of the VAE, we have compared its VAMP2 score with that of tICA. The VAE has a higher VAMP2 score than tICA, indicating that it better captures the slow modes for both the apo and holo simulations (Fig. S7 and S8).
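      For readers comparing the VAE and tICA projections, the VAMP-2 score of a feature trajectory at lag τ can be computed from instantaneous and time-lagged covariance matrices: whiten both, form K = C00^(-1/2) C0τ Cττ^(-1/2), and sum its squared singular values. The sketch adds 1 for the constant singular function, a common convention; a higher score means the projection retains more slow dynamics at that lag. This is a single-trajectory NumPy illustration with hypothetical function names; in practice the score is usually cross-validated and compared at the same lag and number of dimensions.

```python
import numpy as np


def _inv_sqrt(c, eps=1e-10):
    """Inverse square root of a symmetric PSD matrix, dropping near-zero eigenvalues."""
    w, v = np.linalg.eigh(c)
    keep = w > eps
    # Scale each kept eigenvector by w^(-1/2), then rebuild V diag(w^-1/2) V^T
    return (v[:, keep] / np.sqrt(w[keep])) @ v[:, keep].T


def vamp2_score(features, lag):
    """VAMP-2 score of a (n_frames, n_features) projection at the given lag."""
    x = np.asarray(features, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    # Mean-free instantaneous and time-lagged data
    x0 = x[:-lag] - x[:-lag].mean(axis=0)
    xt = x[lag:] - x[lag:].mean(axis=0)
    n = len(x0)
    c00 = x0.T @ x0 / n
    ctt = xt.T @ xt / n
    c0t = x0.T @ xt / n
    koopman = _inv_sqrt(c00) @ c0t @ _inv_sqrt(ctt)
    singular_values = np.linalg.svd(koopman, compute_uv=False)
    # +1 accounts for the constant singular function (convention assumed here)
    return 1.0 + float(np.sum(singular_values ** 2))
```

Calling `vamp2_score` on the two-dimensional VAE latent coordinates and on the first two tICA components at the same lag gives directly comparable numbers.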

      If one is interested in modeling the kinetics and thermodynamics of transitions between a set of conformational states, and they run a small number of MD simulations that are too short to see conformational transitions between conformational states - any kinetics and thermodynamics modeled by an MSM will be inherently meaningless. This is likely to be the case with the apo a-synuclein dataset analyzed in this investigation.

      We disagree with the referee’s view. The referee does not seem to understand the point of building Markov state models from short trajectories. The Rg distribution of the 23 apo simulations spans the entire Rg range sampled by the holo simulation, suggesting that multiple short simulations can sample structures of sizes as varied as those sampled in the 1.5 ms holo simulation (see Author response image 1).

      Simulations of 1-4 microseconds are almost certainly far too short to see a meaningful sampling of conformational transitions of a highly entangled 140-residue IDP beyond a very local relaxation of the starting structures, and the authors provide no analyses to suggest otherwise.

      Author response image 3.

      Autocorrelation of the first principal component of the backbone dihedral for the apo (colourbar) and holo (black) simulation.

      Author response image 4.

      Autocorrelation of the second principal component of the backbone dihedral for the apo (colourbar) and holo (black) simulation.

      To assess whether the 23 short simulations capture meaningful kinetics and thermodynamics, we computed the backbone dihedrals and reduced them to two principal components for both the 23 apo simulations and the holo simulation. We then calculated the autocorrelation of each component for each apo and holo simulation; the first and second components are plotted in Author response image 3 and Author response image 4, respectively.

      The autocorrelations of the holo simulation and of most apo simulations are similar, suggesting that the apo simulations sufficiently sample conformational transitions between states and can therefore represent the structural changes of the system similarly to the long simulation.
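      The sketch below illustrates the kind of analysis described above. Dihedrals are embedded as cosine/sine pairs (an assumption here, since the response does not state how angular periodicity was handled), principal axes are fitted once on reference frames so that apo and holo trajectories are projected onto the same components, and the normalized autocorrelation function is computed for each projected series. Function names are hypothetical.

```python
import numpy as np


def _embed(angles):
    """Map dihedral angles (radians) to cos/sin features to respect periodicity."""
    a = np.asarray(angles, dtype=float)
    return np.hstack([np.cos(a), np.sin(a)])


def fit_dihedral_pca(angles, n_components=2):
    """Principal axes fitted on reference frames of shape (n_frames, n_dihedrals)."""
    feats = _embed(angles)
    mean = feats.mean(axis=0)
    _, _, vt = np.linalg.svd(feats - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(angles, mean, axes):
    """Project any trajectory onto previously fitted principal axes."""
    return (_embed(angles) - mean) @ axes.T


def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a 1D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var) for k in range(max_lag + 1)])
```

Fitting the axes on one reference set (for example, pooled apo and holo frames) and projecting every run with `project` keeps the principal components identical across simulations, which is needed for their autocorrelations to be compared.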

      Without convincingly demonstrating reasonable statistics of conformational changes from the very small apo simulation dataset analyzed here, it seems highly likely the apparent validity of the apo MSM results from learning a VAE latent space that groups structurally and kinetically distinct conformations into similar states, creating the spurious appearance of transitions between states. As such, the kinetics and thermodynamics of the resulting MSM are likely to be relatively meaningless, and comparisons with an MSM for a-synuclein in the presence of fasudil are likely to be meaningless.

      We have shown above that the short simulations are able to capture the structural changes in the long simulation. In addition we have compared the VAMP2 score of the apo and holo simulation with tICA and found out that VAE is superior in capturing long timescale dynamics, for both apo and holo simulation (Fig. S7 and S8).

      In its present form, this study provides an example of how the use of black-box machine learning methods to analyze molecular simulations can lead to obtaining misleading results (such as the appearance of a valid MSM) - when more basic analyses are omitted.

      The authors disagree with the referee’s viewpoint on our manuscript. We find that the majority of the contents of the referee’s comments are cursory and lack objectivity.

      The referee’s loose characterization of machine learning as a black box reflects a lack of the basic knowledge needed to appreciate the long-proven ability of artificial deep neural networks to objectively deduce an optimal lower-dimensional representation of the conformational subspace of complex biomacromolecules. The referee’s views on the manuscript ignore the extensive hyperparameter optimization carried out by the authors in developing a suitable beta-variational autoencoder framework for deducing an optimal latent-space representation of the complex and fuzzy conformational landscape of an IDP such as alpha-synuclein. We had thoroughly investigated the choice of starting structures, examining parameters such as Rg, RMSD, secondary structure, and SASA, in response to Referee 3's concerns about the objectivity of our dimension reduction approach. However, we find that Referee 3 has ignored the analysis provided to justify our choice.

      Referee 3's advocacy for linear dimensionality reduction techniques overlooks the necessity and generality of non-linear approaches, as enabled by deep neural network frameworks and demonstrated in the present manuscript. Nevertheless, our manuscript includes evidence demonstrating the optimality of our current reduced dimensions through analyses with varied numbers of dimensions. Our extensive analysis, based on the VAMP-2 score, supports the sufficiency of the present dimensions compared to linear reduction methods.

      The referee’s view that developing Markov state models (MSMs) of the apo form of alpha-synuclein from multiple 1-4 µs simulations is misleading suggests a lack of knowledge of the fundamental purpose and motivation of MSMs, which is to derive long-timescale equilibrium properties from significantly shorter, adaptively sampled trajectories. The referee has overlooked the extensive analysis we provided demonstrating that MSMs developed from short simulation trajectories of alpha-synuclein can statistically replicate the properties derived from very long trajectories.
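      Short trajectories can be combined in an MSM because transition counts at lag τ are collected within each trajectory and pooled, and equilibrium populations are then obtained from the estimated transition matrix rather than from raw frame counts. The minimal NumPy sketch below shows this with a simple non-reversible estimator. It assumes the trajectories have already been discretized into states and that every state has observed outgoing transitions. Established packages (for example deeptime or PyEMMA) add reversible maximum-likelihood estimation, connectivity checks and uncertainty estimates, which this illustration omits. The function name is hypothetical.

```python
import numpy as np


def estimate_msm(dtrajs, n_states, lag):
    """Non-reversible MSM from many short discrete trajectories.

    Transitions at `lag` are counted inside each trajectory and pooled; the
    stationary distribution is the left eigenvector of T with eigenvalue 1.
    """
    counts = np.zeros((n_states, n_states))
    for d in dtrajs:
        d = np.asarray(d, dtype=int)
        if len(d) > lag:
            np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    if np.any(rows == 0):
        raise ValueError("every state needs at least one observed outgoing transition")
    T = counts / rows
    # Left eigenvector of T for the eigenvalue closest to 1
    evals, evecs = np.linalg.eig(T.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return T, pi / pi.sum()
```

As a toy check, 1000 trajectories of 50 steps drawn from a two-state chain with transition probabilities 0.05 (state 0 to 1) and 0.10 (state 1 to 0), all started in state 1, have raw frame fractions biased toward the starting state, whereas the pooled estimate of `pi` recovers the equilibrium population of about 2/3 for state 0. The estimate is only meaningful if the pooled runs observe transitions in both directions between the states.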

      ---

      The following is the authors’ response to the original reviews.

      The following extensive analyses were performed to address the reviewer comments:

      (1) We have calculated the distributions of the radius of gyration (Rg), end-to-end distance (Ree) and solvent accessible surface area (SASA) of the apo and holo simulations, as well as their secondary structure composition.

      (2) We have performed a similar analysis for smaller 60 µs chunks taken from two parts of the 1.5 ms trajectory.

      (3) The choice of starting structures has been thoroughly investigated in terms of Rg, RMSD, secondary structure and SASA.

      (4) We have justified the use of VAE over tICA.

      (5) We have verified the choice of hyperparameters that were used to build the MSM.

      (6) We have retrained a VAE with four latent dimensions and used it to build MSM. 

      (7) At the recommendation of Referee 1, we have updated the title of the manuscript to include the phrase ‘expansion’.

      The manuscript has been accordingly revised by updating it with additional analysis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a well-conducted study about the mechanism of binding of a small molecule (fasudil) to a disordered protein (alpha-synuclein). Since this type of interaction has puzzled researchers for the last two decades, the results presented are welcome as they offer relevant insight into the physical principles underlying this interaction.

      Strengths:

      The results show convincingly that the mechanism of entropic expansion can explain the previously reported binding of fasudil to alpha-synuclein. In this context, the analysis of the changes in the entropy of the protein and of water is highly relevant. The combined use of machine learning for dimensionality reduction and of Markov state models could become a general procedure for the analysis of other systems where a compound binds a disordered protein.

      Weaknesses:

      It would be important to underscore the computational nature of the results, since the experimental evidence that fasudil binds alpha-synuclein is not entirely clear, at least to my knowledge.

      The experimental evidence for binding of fasudil to α-synuclein, potentially preventing its aggregation, is reported in the paper “Fasudil attenuates aggregation of α-synuclein in models of Parkinson’s disease” (Tatenhorst et al., Acta Neuropathologica Communications (2016) 4:39, DOI 10.1186/s40478-016-0310-y). In that work, solution-state 15N-1H HSQC NMR experiments of α-synuclein with increasing amounts of fasudil showed large chemical shift perturbations of residues Y133 and Y136. Additionally, fasudil had no significant effect on the single and double mutants synT-Y133A and synT-Y136A (tyrosine replaced with alanine), as evident from immunochemistry, indicating that α-synuclein aggregation can be inhibited by the interaction of the C-terminal tyrosines with fasudil. These two analyses point to specific binding sites of fasudil on α-synuclein.

      In our work, we built an MSM on the latent dimensions of a deep learning model, a VAE, to address how fasudil interacts with α-synuclein. Analysis of the macrostates obtained from the MSM gives insight into how fasudil interacts with α-synuclein in terms of the transition probabilities among the states, thereby predicting which states are most favorable for binding.

      Reviewer #2 (Public Review):

      The manuscript by Menon et al. describes a set of simulations of alpha-synuclein (aSYN) and analyses of these and previous simulations in the presence of a small molecule.

      While I agree with the authors that the questions addressed are interesting, I am not sure how much we learn from the present simulations and analyses. In parts, the manuscript reads more like an attempt to apply a whole range of tools rather than with a goal of answering any specific questions.

      In this manuscript, we have employed a variational Bayesian method, the VAE, which uses variational inference to approximate the distribution of the latent variables. Unlike conventional linear dimensionality reduction methods such as tICA (as provided in the SI), this method captures slow modes better (higher VAMP2 score) and thereby facilitates the study of long-time dynamics. A Markov state model built in this lower-dimensional space indicated the presence of three and six states for the apo and holo simulations, respectively. The exclusivity of the states was justified by determining the backbone contact maps and by further mapping these states using a denoising CNN-VAE. The increase in the number of states in the presence of the small molecule was justified by calculating the entropy of the macrostates. The entropic contribution from water remained similar across all states, while for the protein in the holo ensemble, entropy was significantly modulated (either increased or decreased) compared to the apo state. In contrast, the entropy of the apo states showed much less modulation. This shows that the increase in the number of states is primarily an entropic effect caused by the small molecule. Finally, we compared the mean first passage times (MFPTs) from the other states to the most populated state, which reveal a strong correlation between transition time and the system's entropy for both the apo and holo ensembles. However, the transition times (to the most populated state) are much lower for the holo ensemble, suggesting that fasudil may trap the protein in intermediate states, slowing down the exploration of the large conformational space by αS and eventually slowing down aggregation.
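      Given an MSM transition matrix T at lag τ, mean first passage times into a target set follow from a linear system: m_i = 0 for target states and m_i = τ + Σ_j T_ij m_j for every other state i. The sketch below solves this with NumPy. Which states form the target (for example, the most populated macrostate) and the lag time are inputs; the function name is hypothetical, and the sketch does not reproduce the macrostate coarse-graining used in the manuscript.

```python
import numpy as np


def mfpt_to(T, targets, lag=1.0):
    """Mean first passage times from every state into the set `targets`.

    Solves m_i = lag + sum_j T_ij m_j for states outside the target set,
    with m_i = 0 inside it. Times come out in the units of `lag`.
    """
    T = np.asarray(T, dtype=float)
    n = T.shape[0]
    targets = set(np.atleast_1d(targets).tolist())
    others = [i for i in range(n) if i not in targets]
    m = np.zeros(n)
    if others:
        # (I - T restricted to non-target states) m = lag * 1
        a = np.eye(len(others)) - T[np.ix_(others, others)]
        m[others] = np.linalg.solve(a, np.full(len(others), float(lag)))
    return m
```

For example, with T = [[0.9, 0.1], [0.2, 0.8]] and state 1 as the target, the passage time from state 0 is 1/0.1 = 10 lag times, as expected for a geometric waiting time.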

      There's a lot going on in this paper, and I am not sure it is useful for the authors, readers or me to spell out all of my comments in detail. But here are at least some points that I found confusing/etc

      Major concerns

      p. 5 and elsewhere:

      I lack a serious discussion of convergence and the statistics of the differences between the two sets of simulations. On p. 5 it is described how the authors ran multiple simulations of the ligand-free system for a total of 62 µs; that is about 25 times less than for the ligand system. I acknowledge that running 1.5 ms is infeasible, but at a bare minimum the authors should discuss and analyse the consequences of the relatively small amount of sampling. Here it is important to say that while 62 µs may sound like a lot, it is probably not enough to sample the relevant properties of a 140-residue disordered protein.

      As to referee 2’s original comment on ‘a lot going on in the manuscript’, we believe that the complexity of the project required extensive analysis and objective machine learning approaches, rather than routine collective variables or traditional linear dimensionality reduction techniques. This is what has been accomplished in this manuscript. For the gist of the work, the last paragraph of the introduction and the first paragraph of the conclusion summarize the overall findings of the investigation. First, a VAE-based machine learning approach demonstrates the modulation of the free energy landscape of alpha-synuclein in the presence of fasudil. Next, a Markov state model elucidates distinct, competing binding states of alpha-synuclein in the presence of the small-molecule drug. The MSM-derived metastable states of the alpha-synuclein monomer are then structurally characterized in the presence of fasudil. Next, we mapped the macrostates in the apo and bound-state ensembles using a denoising convolutional variational autoencoder to ensure that they are mutually distinct. We then show that fasudil exhibits conformation-dependent interactions with individual metastable states. Finally, the investigation quantitatively brings out the entropic signatures of small-molecule binding.

      We thank the reviewer for the question. For the apo simulations, we performed 1-4 μs simulations from 23 different starting structures, amounting to an ensemble of ~62 μs. In the Supplementary figures, we show how the starting structures used for the apo simulations compare with the structure used for the holo simulation, and we compare the apo and holo ensembles in terms of structural features such as Rg, Ree, solvent accessible surface area (SASA) and secondary structure. This is updated in the manuscript on pages 3 and 31-33 and in figures S1-S6 and S25-S30.

      Also, regarding the choice of starting structures, we chose multiple distinct conformations from a previous simulation of the alpha-synuclein monomer, reported in Robustelli et al., PNAS, 115 (21), E4758-E4766. The Rg values of the starting structures span the entire Rg distribution of the holo ensemble, from compact through intermediate to extended states. Importantly, the Rg distributions of the apo and holo ensembles are highly comparable and overlapping, indicating that the apo simulations, although short, have sampled the phase space locally around each starting conformation and thus covered the protein phase space as in the holo simulation. Similarly, other structural properties such as SASA, Ree and secondary structure are comparable for the two ensembles. These analyses show that local sampling across a variety of starting conformations has ensured sufficient sampling of the IDP phase space. This is updated in the manuscript on pages 33-34 and in figures S1 and S25-S30.

      p. 7:

      The authors make it sound like a bad thing that some methods are deterministic. Why is that the case? What kind of uncertainty in the data do they mean? One can certainly have deterministic methods and still deal with uncertainty. Again, this seems like a somewhat ad hoc argument for the choice of the method used.

      We appreciate the reviewer’s comment. In this work, we have used a single VAE model to map the simulations of αS in its apo state and in the presence of fasudil into two dimensions. Had we used an autoencoder, which is a deterministic model, we would have had to train two independent models, one for the apo state and one for fasudil. It would then be questionable to compare the latent coordinates obtained from two different autoencoders, as the model parameters are not shared.

      The VAE gives us this flexibility by mapping each conformation not to a single point but to a distribution, thereby encouraging it to learn a more generalizable representation. The uncertainty is not in the data; rather, mapping a conformation (from the fasudil simulation) to a distribution provides a nearby latent point for a similar structure (from the apo simulation).

      p. 8:

      The authors should make it clear (i) what the reconstruction loss and KL is calculated over and (ii) what the RMSD is calculated over.

      (i) The reconstruction loss is calculated between the reconstructed and original pairwise distances, whereas the KL loss is calculated between the approximate posterior distribution and the prior distribution (for a VAE, a standard normal distribution).

      (ii) The RMSE is the root mean square error between the original data and the reconstructed data. 

      (i) is updated on page 34 and (ii) is updated in the revised manuscript on page 8.
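      To make (i) and (ii) concrete, the NumPy sketch below evaluates the quantities involved: pairwise-distance input features for one frame, a reparameterized latent sample from q(z|x), a batch loss of reconstruction error plus β times the closed-form KL divergence to a standard normal prior, and the RMSE used for model selection. It only evaluates the loss (no network or training loop, since the architecture is not described here); summing the reconstruction error over features and averaging over the batch is one common convention and an assumption of this sketch, as are the function names.

```python
import numpy as np


def pairwise_distances(frame):
    """Upper-triangle pairwise distances of one frame (n_sites, 3), flattened."""
    frame = np.asarray(frame, dtype=float)
    diff = frame[:, None, :] - frame[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    iu = np.triu_indices(len(frame), k=1)
    return dist[iu]


def sample_latent(mu, log_var, rng):
    """Reparameterized draw z = mu + sigma * eps from q(z|x) = N(mu, sigma^2)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(np.shape(mu))


def beta_vae_loss(x, x_recon, mu, log_var, beta):
    """Batch-averaged reconstruction error + beta * KL(q(z|x) || N(0, I))."""
    # Assumed convention: squared error summed over features, averaged over batch
    recon = np.mean(np.sum((x - x_recon) ** 2, axis=1))
    # Closed-form KL divergence between a diagonal Gaussian and the standard normal
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return recon + beta * kl, recon, kl


def rmse(x, x_recon):
    """Root-mean-square error between input and reconstruction, used to rank models."""
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(x_recon)) ** 2)))
```

Scanning β then amounts to training one model per β value and keeping the one whose held-out `rmse` is lowest.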

      p. 9/figure 1:

      The authors select a beta value that may be the minimum, but then is just below a big jump in the cross-validation error. Why does the error jump so much and isn't it slightly dangerous to pick a value close to such a large jump.

      In this work, the RMSE was chosen as the metric for selecting the best VAE model. To do so, the β parameter (the weighting factor for the KL loss) was varied, and the β value giving the minimum RMSE was chosen.

      This is updated on page 8.

      p. 10:

      Why was a 2-dimensional representation used in the VAE? What evidence do the authors have that the representation is meaningful? The authors state "The free energy landscape represents a large number of spatially close local minima representative of energetically competitive conformations inherent in αS" but they do not say what they mean by "spatially close". In the original space? If so, where is the evidence?

      We thank the reviewer for the question. Although increasing the number of latent dimensions may make the model more accurate, it can also result in overfitting: the model can simply memorize patterns in the data instead of generalizing from them. A higher-dimensional latent space is also more difficult to interpret; we therefore chose two dimensions.

      The reconstruction loss (the mean squared error between the input and the reconstructed data) is of the order of 10⁻⁴. In addition, the MSM built on the VAE latent space identifies states that are distinct in both the apo and holo simulations, which indicates that the latent-space representation is meaningful.

      We have also trained a model with four neurons in the latent space and built an MSM from it. Its implied timescales indicate the presence of six states, consistent with the model with two latent dimensions.

      This has been updated in the manuscript on page 13 and in Figures S14-S15.

      No, not spatially close in the original space, but in the reduced two-dimensional latent space.

      p. 10:

      It is not clear from the text whether the VAEs are the same for both aSYN and aSYN-Fasudil. I assume they are. Given that the Fasudil dataset is 25x larger, presumably the VAE is mostly driven by that system. Is the VAE an equally good representation of both systems?

      Yes, the same model is used for both aSYN and aSYN-Fasudil ensemble.

      The states obtained from the MSM of the aSyn ensemble are distinct when their Cα contact maps are analyzed, so we think the VAE also provides a good representation of this system.

      p. 10/11:

      Do the authors have any evidence that the latent space representation preserves relevant kinetic properties? This is a key point because the entire analysis is built on this. The choice of using z1 and z2 to build the MSM seems somewhat ad hoc. What do the autocorrelation functions of z1 and z2 look like? Are they related to the dynamics of key structural properties such as Rg or transient helical structure?

      Autocorrelation of the VAE latent coordinates z1 and z2 and of the radius of gyration (Rg) for the αS-fasudil simulation.

      Author response image 5.

      We find that the autocorrelation of z1 decays much more slowly than that of Rg, indicating that z1 captures long-timescale dynamics much better than Rg does.
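      The comparison of decay rates can be illustrated with a normalized autocorrelation function. The sketch below is a minimal NumPy version under stated assumptions: the two AR(1) series are synthetic, hypothetical stand-ins for a slowly and a quickly decorrelating coordinate (they are not z1 or Rg data), and `autocorrelation` and `ar1` are illustrative helper names.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Normalized autocorrelation C(t) = <dx(0) dx(t)> / <dx^2> for lags 0..max_lag-1."""
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    n = len(dx)
    var = np.dot(dx, dx) / n
    acf = np.empty(max_lag)
    for lag in range(max_lag):
        acf[lag] = np.dot(dx[: n - lag], dx[lag:]) / (n - lag) / var
    return acf


def ar1(phi, n_steps, rng):
    """Synthetic AR(1) series; larger phi means slower decorrelation."""
    x = np.zeros(n_steps)
    noise = rng.normal(size=n_steps)
    for t in range(1, n_steps):
        x[t] = phi * x[t - 1] + noise[t]
    return x


rng = np.random.default_rng(1)
slow = ar1(0.99, 20_000, rng)   # hypothetical slow coordinate
fast = ar1(0.5, 20_000, rng)    # hypothetical fast coordinate
acf_slow = autocorrelation(slow, 50)
acf_fast = autocorrelation(fast, 50)
```

      A coordinate whose autocorrelation stays high at long lags, like `acf_slow`, retains memory of slow processes, which is the property argued for z1 relative to Rg.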

      p. 11:

      What's the argument for not building an MSM with states shared between aSYN with and without Fasudil?

      We built two separate Markov state models, one for the apo aSYN simulations and one for the simulation in the presence of the ligand. Pooling the latent-space data of the two systems into a single MSM would give incorrect transition timescales between the states, as these are independent simulations.

      p. 12:

      Fig. 3b/c show quite clearly that the implied timescales are not converged at the chosen lag time (incidentally, it would have been useful to show the timescales in physical time). The CK test is stated to be validated with "reasonable accuracy", though it is unclear what that means.

      We have reported the physical timescales in the main manuscript (page 38): 36 and 32 ns for the apo and holo simulations, respectively. We used “reasonable accuracy” in the context of the Chapman-Kolmogorov test. We note that for the ligand simulation, the estimated and predicted models are in excellent agreement, whereas the agreement is weaker for some of the transitions in the apo state. This good agreement indicates that the model has reached Markovianity and that the timescales have converged.

      The CK test is updated in the manuscript on page 12.
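      The implied timescales and the Chapman-Kolmogorov test discussed here follow from the eigenvalues of the transition matrix, t_i = -τ / ln|λ_i(τ)|, and from the requirement T(kτ) ≈ T(τ)^k. The NumPy sketch below illustrates both on a synthetic, hypothetical three-state chain; it is not the authors' pipeline. Lags are in frames, so multiplying by the frame interval converts them to physical time. MSM packages usually run the CK test on macrostate probabilities rather than comparing full matrices as done here for brevity.

```python
import numpy as np


def simulate_chain(P, n_steps, rng, start=0):
    """Sample a discrete trajectory from a row-stochastic transition matrix P."""
    cum = np.cumsum(P, axis=1)
    cum[:, -1] = 1.0                      # guard against round-off in the last bin
    u = rng.random(n_steps)
    states = np.empty(n_steps, dtype=int)
    states[0] = start
    for t in range(1, n_steps):
        states[t] = np.searchsorted(cum[states[t - 1]], u[t])
    return states


def transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts between frames separated by `lag` steps."""
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)


def implied_timescales(T, lag_time):
    """t_i = -lag_time / ln|lambda_i| for the non-stationary eigenvalues, slowest first."""
    eigvals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1][1:]
    return -lag_time / np.log(eigvals)


def ck_max_error(dtraj, n_states, lag, k):
    """Chapman-Kolmogorov check: compare the prediction T(lag)^k with a direct estimate of T(k*lag)."""
    predicted = np.linalg.matrix_power(transition_matrix(dtraj, n_states, lag), k)
    estimated = transition_matrix(dtraj, n_states, k * lag)
    return np.max(np.abs(predicted - estimated))


# Hypothetical symmetric 3-state chain with eigenvalues 1, 0.96 and 0.94.
P = np.array([[0.97, 0.02, 0.01],
              [0.02, 0.96, 0.02],
              [0.01, 0.02, 0.97]])
dtraj = simulate_chain(P, 100_000, np.random.default_rng(0))
its = implied_timescales(transition_matrix(dtraj, 3, lag=1), lag_time=1.0)
```

      For a truly Markovian process the estimated timescales do not depend on the lag and the CK error stays at the level of statistical noise; a systematic drift of either with increasing lag signals non-Markovian behavior at that lag.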

      p. 12:

      In Fig. 3d, what are the authors bootstrapping over? What are the errors if the authors analyse sampling noise (e.g. bootstrap over simulation blocks)?

      For bootstrapping, we randomly deleted a part of the simulation (simulation block) and rebuilt the MSM with this reduced dataset. We repeated this 10 times and reported the average value of the population and the transition timescales over the 10 iterations.  
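      The block-deletion procedure described above can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' code: it assumes equal-size blocks, deletes one randomly chosen block per iteration, and reports only stationary populations (timescales could be collected in the same loop). Because blocks are deleted rather than resampled with replacement, it is closer to a delete-one jackknife than to a classical bootstrap. The remaining blocks are treated as separate trajectories so that no artificial transition is counted across the gap.

```python
import numpy as np


def transition_matrix(dtrajs, n_states, lag):
    """Row-normalized lagged transition counts pooled over several discrete trajectories."""
    counts = np.zeros((n_states, n_states))
    for d in dtrajs:
        if len(d) > lag:
            np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    return counts / np.where(rows > 0, rows, 1.0)


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue closest to 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()


def block_deletion_bootstrap(dtraj, n_states, lag, n_blocks=10, n_iter=10, seed=0):
    """Drop one randomly chosen block per iteration, rebuild the model, and collect populations.

    The remaining blocks are kept as separate trajectories so that no artificial
    transition is counted across the gap left by the deleted block.
    """
    rng = np.random.default_rng(seed)
    blocks = np.array_split(np.asarray(dtraj), n_blocks)
    populations = []
    for _ in range(n_iter):
        drop = rng.integers(n_blocks)
        kept = [b for i, b in enumerate(blocks) if i != drop]
        populations.append(stationary_distribution(transition_matrix(kept, n_states, lag)))
    populations = np.array(populations)
    return populations.mean(axis=0), populations.std(axis=0)


# Toy usage on a random 3-state discrete trajectory (not simulation data).
dtraj = np.random.default_rng(2).integers(0, 3, size=30_000)
mean_pop, std_pop = block_deletion_bootstrap(dtraj, n_states=3, lag=1)
```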

      p. 13:

      I appreciate that the authors build an MSM using only a subset of the fasudil simulations. Here, it would be important that this analysis includes the entire workflow so that the VAE is also rebuilt from scratch. Is that the case?

      The VAE model was trained on frames of the ligand simulation sampled every 9 ns from t = 0 over the entire 1.5 ms. We did not retrain it on the subset of the fasudil simulation; instead, we used the trained VAE model to project the 60 μs segment of the fasudil simulation into the latent space and built the MSM from that projection. Additionally, we compared the Rg distribution of this simulation block with that of the apo ensemble and found good agreement between them.

      The Rg distributions have been added to the manuscript on page 13 and in Figures S10-S11.

      p. 18:

      I don't understand the goal of building the CVAE and DCVAE. Am I correct that the authors are building a complex ML model using only 3/6 input images? What is the goal of this analysis? As it stands, it reads a bit like simply wanting to apply some ML method to the data. Incidentally, the table in Fig. 6C is somewhat opaque.

      We appreciate the reviewer’s valid question. The ensemble-averaged contact maps of the macrostates of aSyn in the apo state and in the presence of the ligand made it challenging to find contacts exclusive to each state. Since VAEs are well suited to finding patterns, we employed a convolutional VAE (typically used for images). However, owing to the small number of contact maps, the model overfitted; to prevent this, we added noise to the data. Visual inspection of ensemble-averaged contact maps is difficult, especially for IDPs, and this lower-dimensional space gives a preliminary idea of how each macrostate differs from the others. The table in Fig. 6C provides SSIM and PSNR scores for the denoised contact maps. An SSIM above 0.9 and a PSNR between 20 and 48 dB indicate that the contact maps are reconstructed with good quality.
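      For readers unfamiliar with the two image-quality scores, here is a minimal NumPy sketch. It assumes contact probabilities in [0, 1] (so the data range is 1), and the `global_ssim` function is a deliberate simplification: the standard SSIM of Wang et al. averages the same expression over local windows, as done for example by `skimage.metrics.structural_similarity`. The random "contact map" is hypothetical and the function names are illustrative; the manuscript's scores were not necessarily computed this way.

```python
import numpy as np


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB; contact probabilities are assumed to lie in [0, 1]."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(data_range**2 / mse)


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM computed over the whole map (a simplification of windowed SSIM)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    )


rng = np.random.default_rng(3)
contact_map = rng.random((140, 140))                     # hypothetical 140 x 140 map
noisy_map = contact_map + 0.05 * rng.normal(size=contact_map.shape)
```

      Gaussian noise with a standard deviation of 0.05 gives a mean squared error of about 0.0025 and hence a PSNR of roughly 26 dB, inside the 20-48 dB band quoted above, while identical maps give an SSIM of 1.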

      p. 22:

      "Our results indicate that the interaction of fasudil with αS residues governs the structural features of the protein."

      What results indicate this?

      By building Markov state models and comparing them across the apo and holo ensembles, we showed that the interaction of fasudil with aSyn leads to the population of more states than in the apo ensemble. In these states, fasudil interacts with different regions of aSyn, as shown by the protein-ligand contact maps in figure 7. The contact maps and the extent of secondary structure also differ across the six states; the location and extent of helical and sheet-like character in the ensembles of the six macrostates are shown in figures S16-S17. Based on these observations, we state that the interaction of the small molecule favors the population of new aSyn states with distinct structural features.

      p. 23:

      The authors should add some (realistic) errors to the entropy values quoted. Fig. 8 has some error bars, though they seem unrealistically small. Also, is the water value quoted from the same force field and conditions as the simulations?

      The error values are the standard deviations provided by the PDB2ENTROPY package. Yes, the water value was obtained with the same force field, and the simulation conditions are the same, as reported in the section “Entropy of water”.

      p. 23:

      Has PDB2ENTROPY been validated for use with disordered proteins?

      Yes, it has been used in the following paper studying liquid-liquid phase separation of an IDP. 

      This paper has also been cited in the manuscript (reference 66).

      Mukherjee, S. & Schäfer, L. V. “Thermodynamic forces from protein and water govern condensate formation of an intrinsically disordered protein domain.” Nature Communications 14, 5892 (2023). https://doi.org/10.1038/s41467-023-41586-y

      p. 23/24:

      It would be useful to compare (i) the free energies of the states (from their populations), (ii) the entropies (as calculated) and (iii) the enthalpies (as calculated e.g. as the average force field energy). Do they match up?

      Our analysis is motivated by previous studies in which enthalpy-driven design has not led to significant advances in drug discovery, particularly for IDPs. In the presence of the drug/ligand, the protein may be able to explore a larger conformational space and hence access more states, which we found by building a Markov state model on the VAE latent space. The entropy of the protein is calculated from the torsional degrees of freedom relative to a random distribution (the protein in its most random configuration).

      p. 31:

      It is unclear which previous simulation the new aSYN simulations were launched from. What is the size of the box used?

      The starting conformations for the new aSYN simulations were randomly chosen from a previously reported 73 μs simulation in Robustelli et al. (PNAS, 115 (21), E4758-E4766).

      Box sizes for the 23 simulations have been added to Table S1 in the supplemental information.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Menon, Adhikari, and Mondal analyze explicit-solvent molecular dynamics (MD) computer simulations of the intrinsically disordered protein (IDP) alpha-synuclein in the presence and absence of a small-molecule ligand, Fasudil, previously demonstrated by NMR spectroscopy to bind alpha-synuclein without inducing folding into more ordered structures. In order to provide insight into the binding mechanism of Fasudil, the authors analyze an unbiased 1500 μs MD simulation of alpha-synuclein in the presence of Fasudil previously reported by Robustelli et al. (Journal of the American Chemical Society, 144(6), pp. 2501-2510). The authors compare this simulation to a very different set of apo simulations: 23 separate 1-4 μs simulations of alpha-synuclein seeded from different apo conformations taken from another simulation previously reported by Robustelli et al. (PNAS, 115 (21), E4758-E4766), for a total of ~62 μs.

      To analyze the conformational space of alpha-synuclein, the authors employ a variational autoencoder (VAE) to reduce the dimensionality of Cα-Cα pairwise distances to 2 dimensions and use the latent-space projection of the VAE to build Markov state models. The authors use k-means clustering to cluster the sampled states of alpha-synuclein in each condition into 180 microstates on the VAE latent space. They then coarse-grain these 180 microstates into a 3-macrostate model for apo alpha-synuclein and a 6-macrostate model for alpha-synuclein in the presence of fasudil using the PCCA+ coarse-graining method. Few details are provided to explain the hyperparameters used for PCCA+ coarse-graining and the rationale for selecting the final number of macrostates.

      The authors analyze the properties of each of the alpha-synuclein macrostates from their final MSMs, examining intramolecular contacts, secondary structure propensities and, in the case of the alpha-synuclein:Fasudil holo simulations, the contact probabilities between Fasudil and alpha-synuclein residues.

      The authors use an additional variational autoencoder (a denoising convolutional VAE) to compare the denoised contact maps of each macrostate by projecting them onto an additional latent space. The authors conclude that their apo and holo simulations sample distinct regions of the conformational space of alpha-synuclein projected on the denoising convolutional VAE latent space.

      Finally, the authors calculate water entropy and protein conformational entropy for each macrostate. To facilitate the water entropy calculations, the authors took a single structure from each macrostate and ran a 20 ps simulation at a finer timestep (4 femtoseconds), analyzed with a previously published method (DoSPT) that computes thermodynamic properties of water from MD simulations using autocorrelation functions of water velocities. The authors report that the water entropies calculated from these individual 20 ps simulations are very similar.

      For each macrostate, the authors compute the protein conformational entropy using a previously published Maximum Information Spanning Tree approach based on torsion angle distributions, and observe that the estimated protein conformational entropy is substantially more negative for the macrostates of the holo ensemble.

      The authors calculate mean first passage times from their Markov state models and report a strong correlation between the protein conformational entropy of each state and the mean first passage time from each state to the highest populated state.
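      For reference, mean first passage times (MFPTs) to a single target state follow from the MSM transition matrix by solving a small linear system: m_target = 0 and m_i = τ + Σ_{j≠target} T_ij m_j for every other state i. The NumPy sketch below assumes a single target state and a lag time τ in physical units; the 3-state matrix is hypothetical and the function name is illustrative, not the authors' code.

```python
import numpy as np


def mean_first_passage_times(T, target, lag_time=1.0):
    """MFPT from every state to `target` for a Markov chain with transition matrix T.

    Solves m_i = lag_time + sum_{j != target} T_ij m_j for i != target, with m_target = 0.
    """
    n = len(T)
    others = np.array([i for i in range(n) if i != target])
    A = np.eye(len(others)) - T[np.ix_(others, others)]
    m = np.zeros(n)
    m[others] = np.linalg.solve(A, np.full(len(others), lag_time))
    return m


# Hypothetical 3-state transition matrix (lag_time in arbitrary units).
T = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.90, 0.05],
              [0.01, 0.04, 0.95]])
mfpt_to_state2 = mean_first_passage_times(T, target=2, lag_time=1.0)
```

      In a two-state chain that leaves state 0 with probability 0.1 per lag, the MFPT from state 0 to state 1 is 1/0.1 = 10 lag times, which gives a simple check of the formulation.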

      As the authors observe that the conformational entropy estimated from the macrostates of holo alpha-synuclein:Fasudil is greater than that estimated from the apo alpha-synuclein macrostates, they suggest that the driving force of Fasudil binding is an increase in the conformational entropy of alpha-synuclein. No consideration or quantification of the enthalpy of alpha-synuclein:Fasudil binding is presented.

      Strengths:

      The authors use MD simulations run with a force field appropriate for IDPs (a99SB-disp with the a99SB-disp water model; Robustelli et al., PNAS, 115 (21), E4758-E4766), which has previously been used to perform MD simulations of alpha-synuclein that were validated against extensive NMR data.

      The contact probability between Fasudil and each alpha-synuclein residue observed in the previously performed 1500 μs MD simulation of alpha-synuclein in the presence of Fasudil (Robustelli et al., Journal of the American Chemical Society, 144(6), pp. 2501-2510) was previously found to be in good agreement with experimental NMR chemical shift perturbations upon Fasudil binding, suggesting that this simulation is a reasonable choice for understanding IDP:small-molecule interactions.

      Weaknesses:

      Major Weakness 1: Simulations of apo alpha-synuclein and holo simulations of alpha-synuclein and fasudil are not comparable.

      The most robust way to determine how the presence of Fasudil affects the conformational ensemble of alpha-synuclein is to run apo and holo simulations of the same length from the same starting structures using the same simulation parameters.

      The 23 independent 1-4 μs simulations of apo alpha-synuclein and the long unbiased 1500 μs simulation of alpha-synuclein in the presence of fasudil are not directly comparable. The starting structures of the simulations used to build a Markov state model of apo alpha-synuclein were taken from a previously reported 73 μs MD simulation of alpha-synuclein run with the a99SB-disp force field and water model with 100 mM NaCl (Robustelli et al., PNAS, 115 (21), E4758-E4766). As the holo simulation of alpha-synuclein and Fasudil was run in 50 mM NaCl, snapshots from the original apo alpha-synuclein simulation were resolvated with 50 mM NaCl and new simulations were run.

      No justification is offered for how the starting structures were selected. We have no sense of the conformational variability of the selected starting structures and no sense of how these conformations compare to the alpha-synuclein conformations sampled in the holo simulation in terms of standard structural descriptors such as tertiary contacts, secondary structure, radius of gyration (Rg), solvent-accessible surface area, etc. (we only see a comparison of projections on an uninterpretable nonlinear latent space and average contact maps). Additionally, 1-4 μs is a relatively short timescale for a simulation of a 140-residue IDP, and one is unlikely to see substantial evolution of many structural properties of interest (i.e., secondary structure, radius of gyration, tertiary contacts) in simulations this short. Without any information about the conformational space sampled in the 23 apo simulations (aside from a projection on an uninterpretable latent space), we have no way to determine whether we observe transitions between distinct states in these short simulations, and therefore whether it is possible to construct a meaningful MSM from these simulations.

      If the structures used for the apo simulations are on average more compact or contain more tertiary contacts, then it is unsurprising that in short independent simulations they sample a smaller region of conformational space. Similarly, if the starting structures have similar dimensions but we observe only extremely local sampling around the starting structures in the short apo simulations, it would also not be surprising that they sample a smaller amount of conformational space. By presenting comparisons of conformational states only on an uninformative VAE latent space, the authors make it impossible for a reader to ask simple questions about how the conformational ensembles compare.

      It is noted that the authors attempt to address questions about sampling by building an MSM of a single contiguous 60 μs portion of the holo simulation of alpha-synuclein and Fasudil, noting that:

      "the MSM built using lesser data (and same amount of data as in water) also indicated the presence of six states of alphaS in presence of fasudil, as was observed in the MSM of the full trajectory. Together, this exercise invalidates the sampling argument and suggests that the increase in the number of metastable macrostates of alphaS in fasudil solution relative to that in water is a direct outcome of the interaction of alphaS with the small molecule."

      However, the authors present no data to support this assertion, and readers have no sense of how the conformational space sampled in this portion of the trajectory compares to that sampled in the independent apo simulations or the full holo simulation. As the analyzed 60 μs portion of the holo trajectory may have no overlap with the conformational space sampled in the independent apo simulations, it is unclear whether this control provides any information. There is no quantification of the conformational entropy of the 6 states obtained from this portion of the holo trajectory or of the full conformational space sampled. No information is presented to determine whether we observe similar states in the shorter portion of the holo trajectory. Furthermore, as the authors provide almost no justification for the criteria used to select the final number of macrostates for any of the MSMs reported in this work, and the number of macrostates is effectively a free parameter in the PCCA+ method, arriving at an MSM with 6 macrostates does not convey any information about the conformational entropy of alpha-synuclein in the presence or absence of ligands. Indeed, the implied timescale plot for the 60 μs holo MSM (Figure S2) shows that at least 10 processes are resolved in the 120-microstate model, and no information is provided explaining or justifying how a final 6-macrostate model was determined. The authors also do not project the conformations sampled in this sub-trajectory onto the latent space of the final VAE.

      One certainly expects that an MSM built with 1/20th of the simulation data should differ substantially from an MSM built from the full trajectory, so, failing additional information and hyperparameter justification, one wonders whether the emergence of a 6-state model could be the direct result of hardcoded VAE and MSM construction hyperparameter choices.

      Required Controls For Supporting the Conclusions of the Study: The authors should initiate apo and holo simulations from the same starting structures, using the same simulation software and parameters. This could be done by adding a Fasudil ligand to the apo structures or by removing the Fasudil ligand from a subset of holo structures. This would enable them to make apples-to-apples comparisons about the effect of Fasudil on alpha-synuclein conformational space.

      Failing to add direct apples-to-apples comparisons, which would be required to truly support the study's conclusions, the authors should at least compare the conformational space sampled in the independent apo simulations and holo simulations using standard interpretable IDP order parameters (i.e., Rg, end-to-end distance, secondary structure order parameters) and/or principal components from PCA or tICA obtained from the holo simulation. The authors should quantify the number of transitions observed between conformational states in their apo simulations. The authors could also perform more appropriate holo controls, without additional calculations, by taking batches of short 1-4 μs segments of the holo simulation, similar in number to the simulations used to compute the apo MSMs, and examining how the parameters/macrostates of the holo MSMs vary with random selections of the input.

      In the case of IDPs, one should not bias the simulations by starting from identical structures, as an IDP does not have a defined structure and the starting configuration has little significance; it is the microenvironment that matters most. As for the choice of simulation software and parameters, we used the same force field as in the holo simulation, at the same temperature and salt concentration. We performed multiple independent simulations with varying structural signatures such as Rg, SASA and secondary structure content. In fact, the starting structures for the apo simulations cover the entire span of the Rg distribution of the holo simulation, including that of the holo starting structure. The simulations are therefore unbiased with respect to the starting structure. Although the fasudil simulation was run for 1.5 ms, it is difficult to run a millisecond-scale simulation in reasonable time from a single starting structure. It is exactly for this reason that we started from different structures, so that we do not bias ourselves and sample the conformational space as broadly as possible.

      We have updated the manuscript on pages 33-34 and in Figures S1 and S25-S30.

      Considering the computational expense of simulating a 140-residue IDP on a 1.5 ms timescale, we generated an ensemble from multiple short runs amounting to ~60 µs. The premise of this investigation is a widely used method, Markov state models (MSMs), which can estimate long-timescale kinetics and stationary populations of metastable states from ensembles of short simulations. We have also shown that when we build an MSM for αS-fasudil (holo) from a 60 µs simulation block, comparable in size to the apo data, the implied timescales (ITS) plot shows the same number of metastable states as for the full 1.5 ms data.

      An intrinsically disordered protein (IDP) is not represented by a fixed structure. It is therefore most appropriate to run multiple simulations starting from different initial structures and simulate the local environment around those structures, thus generating an ensemble that effectively samples the phase space. Accordingly, for initiating the apo simulations, instead of biasing the initial structure (by using the starting structure of the simulation with fasudil), we randomly chose 23 different conformations from the 73 µs simulation of the α-synuclein monomer reported in Robustelli et al., PNAS, 115 (21), E4758-E4766. In response to the reviewer’s request for a justification of the choice of starting structures for the apo simulations, we provide below a compilation of figures comparing standard conformational properties of the initial structures chosen for the apo simulations with the starting structure of the long holo simulation; we also provide comparative analyses of the apo (~60 µs) and holo (1.5 ms) ensemble properties.

      Figure S1 compares the Rg distributions of the apo (~60 μs) and holo (1.5 ms) ensembles. The distributions largely overlap, indicating that the apo ensemble is comparable to the holo ensemble in terms of the compaction of the conformations. In Figure S1, we have also marked the Rg values of the starting structures used to seed the apo simulations. The 23 chosen starting conformations span the whole range of Rg sampled in the holo ensemble. Therefore, while the apo simulations are relatively short (1-4 μs), local sampling around these multiple starting conformations of variable compaction (Rg) ensures that the phase space is efficiently sampled and that the resulting ensemble is comparable to the holo ensemble. Furthermore, an MSM built on such an ensemble can be used to identify metastable states and the long-timescale transitions between them.

      Another property that correlates with Rg is the end-to-end distance (Ree) of the protein conformations. Figure S2 shows that the distributions of this property in the apo and holo ensembles are highly similar.
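      For concreteness, the two size descriptors and one way of quantifying how much two distributions overlap can be written as below. This is a hypothetical NumPy sketch: it uses unweighted Cα coordinates (MD analysis tools often mass-weight over all atoms instead), and `histogram_overlap` is an illustrative overlap coefficient that the comparison above did not necessarily use.

```python
import numpy as np


def radius_of_gyration(coords):
    """Unweighted radius of gyration of one frame; coords: (n_atoms, 3)."""
    centered = coords - coords.mean(axis=0)
    return np.sqrt(np.mean(np.sum(centered**2, axis=1)))


def end_to_end_distance(coords):
    """Distance between the first and last C-alpha atoms."""
    return np.linalg.norm(coords[-1] - coords[0])


def histogram_overlap(a, b, bins=50):
    """Overlap coefficient (1 = identical histograms, 0 = disjoint) on shared bins."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    pa, _ = np.histogram(a, bins=edges, density=True)
    pb, _ = np.histogram(b, bins=edges, density=True)
    return np.sum(np.minimum(pa, pb) * np.diff(edges))


# Fully extended toy chain: 140 C-alpha atoms spaced 0.38 nm along x.
chain = np.zeros((140, 3))
chain[:, 0] = 0.38 * np.arange(140)
```

      For the straight toy chain, Ree is 139 × 0.38 nm while Rg is 0.38 × sqrt((140² - 1)/12) nm, which shows why the two quantities are correlated but not proportional in general: Rg depends on the whole mass distribution, Ree only on the chain ends.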

      Figure S3 shows another fundamental structural descriptor, the solvent-accessible surface area (SASA), which indicates the extent of folding and the exposure of the residues. The apo ensemble shows only a minimal shift of the distribution towards higher SASA values, and the distributions of the two ensembles largely overlap.

      In Figure S25, we provide the root mean square deviation (RMSD) of each starting structure used in the apo simulations relative to the structure used to start the long simulation with fasudil. The RMSD values range from 1.6 to 3 nm, indicating that the starting structures are highly variable. This is justifiable for IDPs, since they are not characterized by a single, fixed structure but rather by an array of different conformations.
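      The alignment procedure behind these RMSD values is not specified here; a common choice is the RMSD after optimal rigid-body superposition (the Kabsch algorithm). The NumPy sketch below assumes Cα-only coordinates with equal weights and returns values in the units of the input; the function name and the test coordinates are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) coordinate sets after optimal superposition (Kabsch)."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                   # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T       # rotation mapping P onto Q
    diff = P @ R.T - Q
    return np.sqrt(np.mean(np.sum(diff**2, axis=1)))


# Toy check: a rigidly rotated and translated copy should give an RMSD of ~0.
rng = np.random.default_rng(5)
P = rng.normal(size=(140, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
```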

      Figures S26-S28 show the fractions of secondary structure elements (helix, beta and coil) in the starting structures of the apo and holo simulations. All the conformations are mostly disordered, with coil being the most abundant element. The helix content ranges from 3 to 10% and the sheet content from 3 to 15% in the initial structures.

      Figures S4-S6 show the residue-wise percentage of secondary structure elements (helix, beta and coil) in the apo and holo ensembles. The extent of secondary structure is comparable in the two ensembles.

      The above analyses, comparing distributions of several structural features, indicate that the apo simulations we performed from different starting structures have sampled the phase space as effectively as the single long simulation of the holo system.

      We have discussed the above in the manuscript: Computational Methods section, Page 33-34.

      The above VAMP score analyses (Figures S7 and S8) have now been presented in the manuscript: Results and Discussion (page 8).

      Building the MSM

      While building the MSM, we iteratively varied the hyperparameters to obtain a reasonable model. In this process, we explored different values of the number of clusters, maximum number of iterations, tolerance, stride, metric, seed, chunk size and initialization method. These hyperparameters cannot be optimized with gradient-descent methods, as convergence would not be guaranteed. The parameters were therefore tuned carefully to obtain the best possible implied timescales for the system. The quality of the MSM was further validated with the Chapman-Kolmogorov (CK) test on a state-by-state basis, i.e., by considering the transitions between each pair of metastable states. In addition, we computed contact maps showing that the states are mutually exclusive, which is also supported by the latent space of the denoising convolutional variational autoencoder.
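      The basic construction steps can be sketched as follows: cluster the latent-space frames with k-means, assign each frame to its nearest center, and count lagged transitions within each trajectory only, so that many short independent runs can be combined without artificial jumps between them. This hypothetical NumPy sketch does not reproduce the authors' hyperparameters (180 microstates, stride, tolerance, initialization) or software. It uses plain Lloyd's k-means and a naive count symmetrization in place of the maximum-likelihood reversible estimators that MSM packages provide, and it omits PCCA+ lumping.

```python
import numpy as np


def assign(X, centers):
    """Index of the nearest center for every row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, tol=1e-8, seed=0):
    """Plain Lloyd's k-means; returns cluster centers of shape (k, n_features)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def build_msm(latent_trajs, n_clusters, lag, seed=0):
    """Discretize each latent-space trajectory, count lagged transitions, and normalize.

    Counts are collected within each trajectory only, so independent short runs
    contribute transitions without any artificial jump between them.
    """
    centers = kmeans(np.concatenate(latent_trajs), n_clusters, seed=seed)
    dtrajs = [assign(z, centers) for z in latent_trajs]
    counts = np.zeros((n_clusters, n_clusters))
    for d in dtrajs:
        if len(d) > lag:
            np.add.at(counts, (d[:-lag], d[lag:]), 1.0)
    counts = 0.5 * (counts + counts.T)   # naive symmetrization enforces detailed balance
    rows = counts.sum(axis=1)
    T = counts / np.where(rows > 0, rows, 1.0)[:, None]
    pi = rows / rows.sum()               # stationary distribution of the symmetrized model
    return centers, dtrajs, T, pi


# Toy usage: 23 short random-walk "trajectories" in a 2-D latent space.
rng = np.random.default_rng(6)
latent_trajs = [np.cumsum(rng.normal(scale=0.1, size=(200, 2)), axis=0) + rng.normal(size=2)
                for _ in range(23)]
centers, dtrajs, T, pi = build_msm(latent_trajs, n_clusters=20, lag=5)
```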

      We have compared the conformational space sampled in the independent apo and holo simulations in terms of Rg, Ree, SASA and secondary structure. As for PCA/TICA, we computed the VAMP-2 score for TICA and found it to be lower than that of the VAE. Indeed, neural networks have previously been shown to be better dimensionality-reduction techniques than linear methods such as PCA or TICA, owing to their nonlinearity.

      Author response image 6.

      Distributions of (a) Rg, (b) Ree and (c) SASA for the apo ensemble and a 60 μs slice of the holo simulation trajectory. (d) ITS plot of the 60 μs slice.

      First, we note that the basic philosophy of MSMs does not require long simulation trajectories; requiring them would defeat the purpose of the method. Rather, as motivated by Noé and coworkers in their seminal PNAS paper (vol. 106, p. 9011, 2009), MSMs allow long-timescale equilibrium properties to be inferred from much shorter non-equilibrium trajectories.

      Considering the difference in the size of the apo and holo ensembles, we examined how an MSM built from a 60 μs slice of the 1.5 ms holo simulation differs from the full model in terms of the number of metastable states identified. For this, we considered the 60 μs of data from 966 μs to 1026 μs. First, we compared the gross structural properties of these datasets: Author response image 6a-c compares the distributions of Rg, Ree and SASA. The distributions show that the apo and holo simulations are very similar with respect to these standard properties of protein conformations.

      We built the MSM for this 60 μs segment of the holo ensemble from its projection onto the latent space of the same VAE model. We would like to clarify that the hyperparameters of the model are not hardcoded but carefully fine-tuned to obtain a model that provides a good kinetic discretization of the underlying macrostates. The implied timescale plot of this new MSM (Author response image 6d) shows distinct timescales corresponding to six macrostates. This led us to conclude that the six-state model is robust despite the differences in ensemble size.

      The above analyses in Author response image 6 are presented in Results and Discussion, Page 13. 

      Major Weakness 2: There is little justification of how the MSM hyperparameters were selected. It is unclear whether the results of the study depend on arbitrary hyperparameter selections such as the final number of macrostates in each model.

      It is unclear what criteria were used to determine the appropriate number of microstates and macrostates for each MSM. Most importantly, as all analyses of water entropy and conformational entropy are restricted to the final macrostates, the criteria used to select the final number of macrostates with PCCA+ are extremely important to the conclusions of the study. From examining the ITS plots in Figure 3, it seems that both MSMs show the same number of resolved processes (at least 11), suggesting that a 10-state model could be appropriate for both systems. If one were simply to select a large number of macrostates for the 20x longer holo simulation, would these states converge to the same conformational entropy as the states seen in the short apo simulations? Is there some MSM quality metric used to determine what number of macrostates is more appropriate?

      Required Controls For Supporting the Conclusions of the Study: The authors should specify the criteria used to determine the appropriate number of microstates and macrostates for their MSMs and present controls that demonstrate that the conformational entropies calculated for their final states are not simply a function of the ratio of the number of macrostates chosen to represent very disparate amounts of conformational sampling.

      The VAMP-2 score was used to determine the number of microstates. We calculated the VAMP-2 score while varying the number of microstates from 10 to 220, and found that the score saturates at higher numbers of microstates for both the apo and holo simulations.
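      For readers unfamiliar with the metric, the sketch below computes one common definition of the VAMP-2 score for a discretized trajectory: the squared Frobenius norm of C00^(-1/2) C0τ Cττ^(-1/2), using one-hot microstate indicators as features. It is a minimal illustration assuming a single discrete trajectory and no cross-validation; it is not necessarily the implementation used in the study, and the function names are hypothetical.

```python
import numpy as np


def _inv_sqrt(c, eps=1e-10):
    """Pseudo-inverse square root of a symmetric PSD matrix (near-zero eigenvalues dropped)."""
    w, v = np.linalg.eigh(c)
    keep = w > eps
    return v[:, keep] @ np.diag(w[keep] ** -0.5) @ v[:, keep].T


def vamp2_score(dtraj, n_states, lag):
    """VAMP-2 score of a discrete trajectory using one-hot microstate indicators as features."""
    assert lag >= 1, "lag must be a positive number of frames"
    dtraj = np.asarray(dtraj)
    x = np.eye(n_states)[dtraj[:-lag]]  # features at time t
    y = np.eye(n_states)[dtraj[lag:]]   # features at time t + lag
    n = len(x)
    c00 = x.T @ x / n                   # instantaneous covariance at t
    c11 = y.T @ y / n                   # instantaneous covariance at t + lag
    c01 = x.T @ y / n                   # time-lagged cross-covariance
    k = _inv_sqrt(c00) @ c01 @ _inv_sqrt(c11)
    # Sum of squared singular values = squared Frobenius norm of the whitened Koopman matrix
    return float(np.linalg.norm(k, "fro") ** 2)
```

      Scanning `n_states` (e.g. from 10 to 220 clusters of the latent coordinates) and plotting the score, a plateau suggests that finer discretization no longer captures additional slow dynamics. In practice, the score is usually evaluated on held-out data to avoid rewarding overfitting.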

      The number of macrostates was determined from the gaps between the curves in the implied timescale plot, followed by a Chapman-Kolmogorov (CK) test (shown in Figure S1). Because we plotted the 10 slowest timescales, the plot necessarily shows 10 timescales; this number is not an indicator of the number of macrostates. Metastable macrostates are indicated by slow timescales that are separated by distinct gaps and do not merge, unlike the timescales beyond the fifth in the plot. Timescales that level off with increasing lag time and remain distinct indicate that the system has well-defined metastable states and that the MSM accurately identifies the macrostates. From the corresponding implied timescale plots, we find three and six macrostates for the apo and holo simulations, respectively.
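      The implied timescales discussed above follow from the eigenvalues of the MSM transition matrix at lag time τ, via t_i = -τ / ln|λ_i|. The sketch below computes them and applies a simple largest-gap heuristic for counting metastable states. The function names are hypothetical, and the gap heuristic is only an illustrative assumption; the authors state that they combined visual gap inspection with a CK test.

```python
import numpy as np


def implied_timescales(T, lag, k=None):
    """Implied timescales t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues of T."""
    eigvals = np.linalg.eigvals(np.asarray(T, dtype=float))
    mags = np.sort(np.abs(eigvals))[::-1]   # descending; mags[0] is the stationary eigenvalue 1
    mags = mags[1:] if k is None else mags[1:k + 1]
    return -lag / np.log(mags)              # same time unit as `lag`


def macrostates_from_gap(timescales):
    """Heuristic: n slow processes before the largest ratio gap imply n + 1 metastable states."""
    ts = np.asarray(timescales, dtype=float)
    ratios = ts[:-1] / ts[1:]
    return int(np.argmax(ratios)) + 2
```

      For example, timescales of [100, 90, 5, 4] (in units of the lag time) show the largest gap after the second process, so the heuristic returns three macrostates. Timescales are only meaningful once they are approximately independent of the chosen lag time.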

      The above is discussed in Computational Methods, pages 37-38.

      Major Weakness 3: The use of variational autoencoders (VAEs) obscures insights into the underlying conformational ensembles of apo and holo alpha-synuclein rather than providing new ones

      No rationale is offered for the selection of the VAE architecture or hyperparameters used to reduce the dimensionality of alpha-synuclein conformational space.

      It is not clear whether the VAEs employed in this study provide any new insight into the conformational ensembles and binding mechanisms of Fasudil to alpha-synuclein, or whether the underlying latent spaces of the VAEs are more informative or kinetically meaningful than standard linear dimensionality reduction techniques like PCA and tICA. The initial VAE is used to reduce the dimensionality of alpha-synuclein conformational ensembles to 2 degrees of freedom, but it is unclear if this projection is structurally or kinetically meaningful. It is not clear why the authors chose to use a 2-dimensional projection instead of a higher number of dimensions to build their MSMs. Can they produce a more kinetically and structurally meaningful model using a higher-dimensional VAE latent space?

      Additionally - it is not clear what insights are provided by the Denoising Convolutional Variational Autoencoder. The authors appear to be noising-and-denoising the contact maps of each macrostate, and then projecting the denoised values onto a new latent space - and commenting that they are different. Does this provide additional insight that looking at the contact maps in Figures 4&5 does not? Is this more informative than examining the distribution of the Radii of gyration or the secondary structure propensities of each ensemble? It is not clear what insight this analysis adds to the manuscript.

      Suggested controls to improve the study: The authors should project interpretable IDP structural descriptors (e.g. secondary structure, radius of gyration, secondary structure content, # of intramolecular contacts, # of intermolecular contacts between alpha-synuclein and Fasudil) onto this latent space to illustrate whether any of these properties are meaningfully separated by the VAE projection. The authors should compare these projections, and MSMs built from them, to projections and MSMs built using standard linear dimensionality reduction techniques like PCA and tICA.

      We have already addressed the IDP structural parameters in our response to the first question.

      In the case of the VAE, the latent space captures the underlying pattern of the high-dimensional data. A non-linear projection using a VAE has been shown to have a higher VAMP-2 score than linear dimensionality reduction methods such as tICA. The latent space of the VAE was then used to build the MSM, in order to obtain the macrostates and the transition timescales among them. We could project the data onto a higher-dimensional latent space, but the goal is to reduce it to a low dimension that is easier to interpret. A higher number of dimensions would also risk overfitting, in which the model memorizes the data instead of learning its underlying pattern. The training and validation losses of the VAE reach the order of 10^-4, indicating good reconstruction of the original data.
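      To make the quantities in this reply concrete, the sketch below shows the standard VAE training objective: a reconstruction error plus a Kullback-Leibler (KL) term pulling the encoder's Gaussian posterior toward N(0, I), together with the reparameterization step used to sample latent coordinates such as z1 and z2. This is a generic numpy illustration; the manuscript's exact loss terms, weighting (here a hypothetical `beta`), and feature scaling are assumptions, so the magnitude of the loss quoted above cannot be reproduced from this sketch alone.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Return (total, reconstruction, KL), each averaged over the batch.

    Assumed form: mean-squared reconstruction error per sample plus
    beta * KL(N(mu, diag(exp(log_var))) || N(0, I)).
    """
    recon = np.mean((x - x_recon) ** 2, axis=-1)
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return float(np.mean(recon + beta * kl)), float(np.mean(recon)), float(np.mean(kl))
```

      With a 2-dimensional latent space, `mu` and `log_var` have shape (batch, 2). Increasing the latent dimension adds capacity, which is the overfitting trade-off described in the reply.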

      As for dimensionality reduction using tICA, the VAMP-2 score confirms that our VAE model performs better than tICA. This manuscript uses deep neural networks to understand the structural and kinetic processes of IDP-small molecule interaction. Dimensionality reduction using tICA would give different reaction coordinates, and an MSM built from the tICA-projected data would not be one-to-one comparable with one built from the VAE projection.

      We added noise because we had only 9 contact maps (three apo and six holo macrostates), which led to overfitting of the CVAE model. To overcome this problem, we introduced white noise into the data to prevent the model from overfitting. The objective of the DCVAE model was to assess how distinct these contact maps are based on their locations in a lower-dimensional space. Visual inspection of ensemble-averaged contact maps is much more difficult for IDPs than for folded proteins. So, even before computing Rg, Ree, SASA, or secondary structure, this lower-dimensional space gives a preliminary idea of how each macrostate differs from every other.
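      The noise-based augmentation described above can be sketched as follows: each of the few macrostate contact maps is replicated, and independent Gaussian white noise is added to every copy, while the clean map is kept as the denoising target. The number of copies, the noise level `sigma`, and the clipping to [0, 1] (treating the maps as contact probabilities) are hypothetical choices for illustration; the reply does not specify them.

```python
import numpy as np


def augment_with_white_noise(maps, copies, sigma, seed=0):
    """Replicate contact maps and add Gaussian white noise for denoising-autoencoder training.

    Returns (noisy_inputs, clean_targets, labels), where labels give the source map index.
    """
    rng = np.random.default_rng(seed)
    maps = np.asarray(maps, dtype=float)            # shape (n_maps, n_res, n_res)
    clean = np.repeat(maps, copies, axis=0)         # each map repeated `copies` times
    noisy = clean + rng.normal(0.0, sigma, size=clean.shape)
    noisy = np.clip(noisy, 0.0, 1.0)                # keep values in the probability range
    labels = np.repeat(np.arange(len(maps)), copies)
    return noisy, clean, labels
```

      Keeping `labels` allows the latent projections of all noisy copies derived from the same macrostate to be colored together, which is how one would judge whether the macrostates occupy distinct regions of the latent space.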

      As for the distribution of Rg, we have plotted it in Author response image 7. The residue-wise secondary-structure percentages for the holo and apo simulations are plotted in Figures S4-S6.

      Author response image 7.

      Distribution of the radius of gyration for the three and six macrostates in the apo and holo simulations, respectively.

      As for training a model with a higher number of latent dimensions, we retrained a VAE model with four dimensions in the latent space. The loss was of the order of 10^-4. We built an MSM with the appropriate number of microstates and found six macrostates, as evident from the implied timescale (ITS) plots shown in Figures S14 and S15.

      This data is presented in Results and Discussion, Page 13

      Major Weakness 4: The MSMs produced in this study have large discrepancies with MSMs previously produced on the same dataset by the same authors that are not discussed.

      Previously, two of the authors of this manuscript (Menon and Mondal) authored a preprint titled "Small molecule modulates α-synuclein conformation and its oligomerization via Entropy Expansion" (https://www.biorxiv.org/content/10.1101/2022.10.20.513005v1.full) that analyzed the same 1500 μs holo simulation of alpha-synuclein binding Fasudil. In that study, they utilized the variational approach to Markov processes (VAMP) to build an MSM using a 1D order parameter as input (the radius of gyration), first discretizing the conformational space into 300 microstates before similarly building a 6-macrostate model. From examining the contact maps and secondary structure propensities of the holo MSMs from the current study and the previous study, some of the macrostates appear similar; however, there appear to be orders-of-magnitude differences in the timescales of conformational transitions between the two models. The timescales of conformational transitions in the previous MSM are on the order of 10s of microseconds, while the timescales of transitions in this manuscript are 100s-1000s of microseconds. In the previous manuscript, a 3-state MSM is built from apo α-synuclein obtained from a continuous 73 μs unbiased MD simulation of alpha-synuclein run at a different salt concentration (100 mM) and an additional 33 μs of shorter simulations. The apo MSM from the previous study similarly reports very fast timescales of transitions between apo states (on the order of ~1 μs), while the timescales of the MSM reported in the current study (Figure 9) are on the order of 10s-100s of microseconds.

      These discrepancies raise further concerns that the properties of the MSMs built on these systems are extremely sensitive to the chosen projection methods, MSM modeling choices, and hyperparameters, and that neither model may be an accurate description of the true underlying dynamics.

      Suggestions to improve the study: The authors should discuss the discrepancies with the MSMs reported in their previous studies.

      In the previous preprint, the radius of gyration was used as the collective variable to build the MSM. In this manuscript, we have used a much more general collective variable: the VAE-reduced representation of the inter-residue pairwise distances. First, the collective variables used to build the models in the two works are different. Second, for the 73 μs apo simulation in the previous manuscript, the salt concentration was 100 mM, whereas in this work we used a salt concentration of 50 mM, the same as in the holo simulations. Since the two simulation conditions differ in salt concentration, the conformational space sampled will differ, and this will be reflected in the features of the metastable states and the associated transition kinetics. Third, the MSM lag time was 3.6 ns in the previous manuscript, whereas in this work we used 32 ns, a difference of nearly an order of magnitude (32/3.6 ≈ 9), so the order of the timescales has also changed. Thus, both the collective variable and the lag time at which the system reaches Markovianity differ between the two studies, and hence the timescales of transition among the macrostates also differ. Because of these differences, a direct comparison of the results of the two investigations would not be appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To highlight the role of the entropic expansion mechanism, I would suggest modifying the title to capture this result, for example: "An Integrated Machine Learning Approach Delineates an Entropic Expansion Mechanism for the Binding of a Small Molecule to α-Synuclein".

      We have changed the title as suggested by the reviewer.

      To my knowledge the binding of fasudil to alpha-synuclein has been shown in the simulations by Robustelli et al (JACS 2022), but the experimental evidence is less clear cut. If an experimental binding affinity and the effect on alpha-synuclein aggregation have been measured, they should be reported.

      Reviewer #2 (Recommendations For The Authors):

      We thank the reviewer for carefully evaluating our manuscript and for providing comments and questions, which we have attempted to address and incorporate.

      Minor

      Abstract:

      In "which is able to statistically distinguish fuzzy ensemble", what does the word "statistically" mean in this context? Do the authors present evidence that the two ensembles are statistically different, and if so in what ways?

      We have analyzed the apo and holo ensembles of aSyn using the framework of Markov State Models, which provides the stationary populations of the states that the model identifies. For this reason, we used the phrase ‘which is able to statistically distinguish fuzzy ensemble’, as we compare and contrast the metastable states that we resolve using the MSM. The MSM identifies metastable states through statistical analysis of the transitions between states (the transition probability matrix). We then characterize their structural features to distinguish them, which gives a meaningful interpretation of the fuzzy ensemble.
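      The stationary populations mentioned above are the left eigenvector of the row-stochastic transition matrix T with eigenvalue 1, normalized to sum to one (π T = π). A minimal numpy sketch, assuming a connected (ergodic) transition matrix; the function name is hypothetical:

```python
import numpy as np


def stationary_distribution(T):
    """Stationary populations pi with pi @ T = pi for a connected row-stochastic matrix T."""
    T = np.asarray(T, dtype=float)
    w, v = np.linalg.eig(T.T)            # left eigenvectors of T = right eigenvectors of T.T
    idx = np.argmin(np.abs(w - 1.0))     # eigenvalue closest to 1
    pi = np.real(v[:, idx])
    return pi / pi.sum()                 # normalization also fixes the arbitrary sign
```

      For a two-state matrix [[0.9, 0.1], [0.2, 0.8]], the populations are 2/3 and 1/3: the first state is entered twice as often as it is left per unit population. Macrostate populations follow by summing the microstate populations assigned to each macrostate.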

      Abstract:

      What does "entropic ordering" mean?

      We thank the reviewer for pointing this out. Here, we mean that the presence of the small molecule only affects the protein backbone entropy, while the entropy of water is not affected in the simulations with fasudil. We have rewritten this more clearly in the abstract.

      The changed sentence is as follows: 

      “A thermodynamic analysis indicates that small-molecule modulates the structural repertoire of αS by tuning protein backbone entropy, however the entropy of the water remains unperturbed.”

      Abstract:

      What does "offering insights into entropic modulation" mean?

      In this investigation, we first discretized the ensemble of a small molecule binding/interacting with disordered aSyn into the underlying metastable states, and then characterized these states. As small-molecule interactions can affect the overall entropy of the IDP, we estimated this effect of fasudil binding on aSyn. We find that the effect of small-molecule binding manifests in the protein backbone entropy, whereas the solvent entropy is unaffected. Through this work, we highlight these insights into the modulatory effect of fasudil on the entropy of the system (entropic modulation).

      p. 3/4:

      When the authors write "However, a routine comparison of monomeric αS ensemble... ensemble", it is unclear whether they are referring to previous work (they only cite a paper with simulations of "apo" aSYN), and if so, which. Do they mean Ref 32? Also, the word "routine" sounds odd in this context.

      We thank the reviewer for pointing this out. We compared the ensemble properties (such as the distributions of the radius of gyration, end-to-end distance, solvent accessible surface area, and secondary structure) of the ɑ-synuclein monomer that we generated in neat water with the ensemble of ɑ-synuclein in the presence of the small molecule fasudil reported in Robustelli et al. (Journal of the American Chemical Society, 144(6), pp. 2501-2510). We have now modified this sentence in the main manuscript as follows (page 3):

      “However, comparison of the global and local structural features of the αS ensemble in neat water and that in the presence of fasudil [32] (see Figure S1-S6) did not indicate a significant difference that is a customary signature of the dynamic IDP ensemble.”

      p. 4:

      Regarding "Integrative approaches are therefore gaining importance in IDP studies", these kinds of integrative approaches have been used for 20 years for studies of IDPs (with increasing sophistication and success), so I think "gaining" is somewhat of a stretch.

      We thank the reviewer for this comment. We agree with the reviewer and have now changed this sentence as follows:

      “Integrative approaches have been exploited in studying IDPs as well as small-molecule binding to IDPs.”

      p. 5:

      What does "large scale" mean in "This study showed no large-scale differences between the bound and unbound states of αS"? Do the authors mean substantially/significantly different, or differences on a large (length) scale?

      Here, we refer to the study of small-molecule (fasudil) binding to α-synuclein reported in Robustelli et al. (Journal of the American Chemical Society, 144(6), pp. 2501-2510). In that study, the authors report no substantial (“large scale”) differences between the conformational ensembles of α-synuclein in the fasudil-bound and unbound states, for example in the backbone conformation distributions.

      p. 6:

      The authors write "In a clear departure from the classical view of ligand binding to a folded globular protein, the visual change in αS ensemble due to the presence of small molecule is not so strikingly apparent." I don't understand this. Normally, there is very little difference between apo and holo protein structures for folded proteins, so I don't understand the "in a clear departure" part. This seems like a strawman. Of course, for folded proteins one can generally see the ligand bound, but here the authors are talking about the protein.

      In the case of folded proteins, the overall tertiary structure of the protein remains mostly the same upon ligand binding; structural changes are localized, primarily around the binding site. In the case of ⍺Syn, however, binding of fasudil is transient and not as strong as is typical for ligands of folded proteins. “Clear departure” refers to the fact that for ⍺Syn, the effect of fasudil binding is subtle and dispersed across the ensemble of conformations, rather than consisting of localized changes as in folded proteins.

      p. 6:

      I don't think the term "data-agnostic" makes sense since these methods are based on data and also make some assumptions about how the data can/should be used.

      We have replaced this term with “model-agnostic”.

      p. 16:

      How are contacts defined; please add to caption.

      Two residues are considered to be in contact if their Cα atoms are within 8 Å of each other. We have updated the captions of Figures 4 and 5 with this information.
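      Under this definition, an ensemble-averaged contact map is the fraction of frames in which each residue pair has a Cα-Cα distance within the 8 Å cutoff. A minimal numpy sketch, assuming Cα coordinates in Å are already extracted as an array of shape (n_frames, n_res, 3); whether the cutoff is inclusive and whether near-diagonal pairs are excluded are not stated in the reply, so this sketch treats the cutoff as inclusive and keeps all pairs (the diagonal is trivially 1). The function name is hypothetical.

```python
import numpy as np


def contact_map(ca_xyz, cutoff=8.0):
    """Fraction of frames in which each residue pair's C-alpha atoms are within `cutoff` Å."""
    xyz = np.asarray(ca_xyz, dtype=float)
    if xyz.ndim == 2:                                  # single frame (n_res, 3)
        xyz = xyz[None]
    diff = xyz[:, :, None, :] - xyz[:, None, :, :]     # (n_frames, n_res, n_res, 3)
    dist = np.linalg.norm(diff, axis=-1)               # pairwise C-alpha distances per frame
    return (dist <= cutoff).mean(axis=0)               # contact probability per residue pair
```

      For long trajectories, the frames would typically be processed in chunks and the counts accumulated, since the intermediate array grows with n_frames × n_res².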

      p. 20:

      What do the authors mean by "non-specific interactions" in this context?

      The interactions of fasudil are predominantly with the negatively charged residues in the C-terminal region of ⍺Syn, via charge-charge and π-stacking interactions (Robustelli et al., Journal of the American Chemical Society, 144(6), pp. 2501-2510).

      In addition, in some metastable states that we identify, we also observe transient interactions with residues in the hydrophobic NAC region and N-terminal region. We refer to these transient interactions as “non-specific” interactions.

      p. 27:

      Are the axes of Fig. 9c/d z1 and z2?

      Yes, the axes are z1 and z2.

      Smaller than minor

      Abstract:

      Rephrase "In particular, the presence of fasudil in milieu"

      We have rephrased the sentence as follows: 

      “In particular, the presence of fasudil in the solvent…”

      p. 4:

      What does the word "potentially" do in "ensemble of conformations potentially sampled"?

      By “potentially”, we mean the range of conformations that the protein can adopt, depending on the environmental conditions.

      p. 10:

      "we trained a large array of inter-residue pairwise distances"

      The distances were not trained; please reformulate

      We have corrected this sentence as follows:  

      “We trained a VAE model using a large array of inter-residue pairwise distances.”
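      As an illustration of the VAE input described in this corrected sentence, the sketch below flattens each frame into the vector of all unique inter-residue Cα-Cα distances. Using Cα atoms and the full upper triangle is an assumption for illustration; the reply does not say which atoms or residue pairs were used. For the 140 residues of α-synuclein, this gives 140 × 139 / 2 = 9730 features per frame. The function name is hypothetical.

```python
import numpy as np


def pairwise_distance_features(ca_xyz):
    """Unique inter-residue distances per frame, ordered as the upper triangle (i < j)."""
    xyz = np.asarray(ca_xyz, dtype=float)
    if xyz.ndim == 2:                        # single frame (n_res, 3)
        xyz = xyz[None]
    n_res = xyz.shape[1]
    i, j = np.triu_indices(n_res, k=1)       # all residue pairs with i < j
    return np.linalg.norm(xyz[:, i, :] - xyz[:, j, :], axis=-1)   # (n_frames, n_pairs)
```

      These high-dimensional vectors are what the encoder compresses into the low-dimensional latent coordinates used for MSM construction.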

      p. 13:

      N/C-terminal -> terminus (or in the C-terminal region)

      We have made the changes in the manuscript at the required places. 

      p. 20:

      Precedent -> previous (?)

      We have made the change in the manuscript. 

      p. 30:

      As far as I understand, Anton does not use GPUs and does not run Desmond.

      We thank the reviewer for providing this information. We referred to the original paper on the ⍺Syn-fasudil simulations (Robustelli et al., Journal of the American Chemical Society, 144(6), pp. 2501-2510). The authors performed equilibration with GPU/Desmond and used Anton for the production runs. We have modified this sentence as follows:

      “A 1500 μs long all-atom MD simulation trajectory of αS monomer in aqueous fasudil solution was simulated by D. E. Shaw Research with the Anton supercomputer that is specially purposed for running long-time-scale simulations.” on page 31

      References : 

      (1) Schütte C, Fischer A, Huisinga W, Deuflhard P (1999) A direct approach to conformational dynamics based on hybrid Monte Carlo. J Comput Phys 151:146–168.

      (2) Chodera JD, Swope WC, Pitera JW, Dill KA (2006) Long-time protein folding dynamics from short-time molecular dynamics simulations. Multiscale Model Simul 5:1214–1226.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Using UGGT1-KO cells and a number of different ERAD substrates, the authors show that UGGTs are involved in preventing the premature degradation of misfolded glycoproteins. They propose a concept in which the fate of glycoproteins is determined by a tug-of-war between UGGTs and EDEMs.

      Strengths: 

      The authors provide a wealth of data indicating that UGGT1 competes with EDEMs, which promote glycoprotein degradation.

      Weaknesses: 

      NA 

      We appreciate your comment.

      Reviewer #2 (Public review): 

      In this study, Ninagawa et al. shed light on UGGT's role in ER quality control of glycoproteins. Using UGGT1/UGGT2 DKO cells, they demonstrate that several model misfolded glycoproteins undergo early degradation. One such substrate is ATF6alpha, whose premature degradation hampers the cell's ability to mount an ER stress response.

      This study convincingly demonstrates that many unstable misfolded glycoproteins undergo accelerated degradation without UGGTs. Also, this study provides evidence of a "tug of war" model involving UGGTs (pulling glycoproteins to being refolded) and EDEMs (pulling glycoproteins to ERAD). 

      The study explores the physiological role of UGGT, particularly examining the impact of ATF6α in UGGT knockout cells' stress response. The authors further investigate the physiological consequences of accelerated ATF6α degradation, convincingly demonstrating that cells are sensitive to ER stress in the absence of UGGTs and unable to mount an adequate ER stress response. 

      These findings offer significant new insights into the ERAD field, highlighting UGGT1 as a crucial component in maintaining ER protein homeostasis. This represents a major advancement in our understanding of the field. 

      Thank you very much for your comment.

      Reviewer #3 (Public review): 

      This valuable manuscript demonstrates the long-held prediction that the glycosyltransferase UGGT slows degradation of endoplasmic reticulum (ER)-associated degradation substrates through a mechanism involving re-glucosylation of asparagine-linked glycans following release from the calnexin/calreticulin lectins. The evidence supporting this conclusion is solid, based on genetically deficient cell models and well-established biochemical methods for monitoring the degradation of trafficking-incompetent ER-associated degradation substrates, although it could be improved by better defining the importance of UGGT in the secretion of trafficking-competent substrates. This work will be of specific interest to those interested in mechanistic aspects of ER protein quality control and protein secretion.

      The authors have attempted to address my comments from the previous round of review, although some issues still remain. For example, the authors indicate that it is difficult to assess how UGGT1 influences degradation of secretion competent proteins, but this is not the case. This can be easily followed using metabolic labeling experiments, where you would get both the population of protein secreted and degraded under different conditions. Thus, I still feel that addressing the impact of UGGT1 depletion on the ER quality control for secretion competent protein remains an important point that could be better addressed in this work. 

      We mainly focused on the impact of UGGT1 depletion on ERAD in this paper and intend to determine the impact of UGGT1 depletion on the ER quality control for secretion competent protein in the near future.

      Further, in the previous submission, the authors showed that UGGT2 depletion causes a reduction of ATF6 activation similar to that observed upon UGGT1 depletion, although UGGT2 depletion does not reduce ATF6 protein levels as UGGT1 depletion does. In the revised manuscript, they largely remove the UGGT2 data and only highlight the UGGT1 depletion data. While they are somewhat careful in their discussion, the implication is that UGGT1 regulates ATF6 activity by controlling its stability. The fact that UGGT2 has a similar effect on activity, but not on stability, indicates that these enzymes may have other roles not directly linked to ATF6 stability. It is important to include the UGGT2 data and explicitly highlight this point in the discussion. It's fine to state that figuring out this other function is outside the scope of this work, but removing the data does not seem appropriate.

      We have added the data of UGGT2-KO and UGGT-DKO cells to Figure 4 and discussed appropriately.

      As I mentioned in my previous review, I think that this work is interesting and addresses an important gap in experimental evidence supporting a previously asserted dogma in the field. I do think that the authors would do well to highlight the limitations of the study, as discussed above. Ultimately, though, this is an important addition to the literature.

      We appreciate your comments. Thank you very much.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors): 

      I have carefully gone through the revised manuscript and responses to the reviewers' comments; I believe that the authors did a great job on revisions, and I do think that now this manuscript has been much improved (far easier to read through). Now I have only minor comments as follows; 

      Page 9: Lines 8-9; Comparison between WT and EDEM-TKO cells indicates that ATF6alpha is still degraded via gpERAD requiring mannose trimming even in the presence of DNJ (Fig. 1D). (it would be better to indicate which figure to look) 

      We have fixed it.

      Page 10: Lines 9-11; "as multiple higher molecular weight bands (representing a mixture of G3M9, G2M9 and GM9 etc.) in WT cells treated with CST" -> I am NOT AT ALL convinced by this statement on Figure 1-figure supplement 6A. How can the subtle difference in glycan structure cause a ladder of bands? And if it is indeed the case (which I frankly doubt, by the way), will endo-alpha-mannosidase treatment result in a single band for CST? And would PNGase F digestion cancel all size differences between samples (control, +DNJ and +CST)?

      CD3d-DTM-HA is a small protein (~20 kDa) possessing three N-glycans. A clear increase in the level of GM9 in WT cells treated with DNJ (Figure 1-Figure supplement 5A) caused an upward band shift (Figure 1-Figure supplement 6A). Similarly, a clear increase in the levels of GM9, G2M9, and G3M9 in WT cells treated with CST (Figure 1-Figure supplement 6B) produced the ladder of bands (Figure 1-Figure supplement 6A).

      Crystal violet assay (new Fig 4G; Page 33); It said that, after treating cells with drug (Tg) for 4 hours, cells were spread on 24 well plates and cultured without Tg for 5 days. If incubated that long, I wonder that any compromised viability may have been canceled by growing cells (cells become confluent no matter what?). Am I missing something? Please clarify. 

      We employed a previously published method to determine ER stress sensitivity (Yamamoto et al., Dev. Cell, 2007). Although any compromised viability may have been canceled by growing cells, as suggested, we were able to detect the difference between WT and UGGT-KO cells.

      Figure 5D; why one of the three N-glycans is missing on the last protein?? 

      We have fixed it.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study on the role of FGF signaling in the induction of primitive streak-like cells (PS-LC) in human 2D-gastruloids. The authors use a previously characterized standard culture that generates a ring of PS-LCs (TBXT+) and correlate this with pERK staining. A requirement for FGF signaling in TBXT induction is demonstrated via pharmacological inhibition of MEK and FGFR activity. A second set of culture conditions (with no exogenous FGFs) suggests that endogenous FGFs are required for pERK and TBXT induction. The authors then characterize, via scRNA-seq, various components of the FGF pathway (genes for ligands, receptors, ERK regulators, and HSPG regulation). They go on to characterize the pFGFR1, receptor isoforms, and polarized localization of this receptor. Finally, they perform FGF4 inhibition and use a cell line with a limited FGF17 inactivation (heterozygous null) and show that loss of these FGFs reduces PS-LC and derivative cell types.

      Strengths:

      (1) As the authors point out, the role of FGF signaling in gastrulation is less well understood than other signaling pathways. Hence this is a valuable contribution to that field.

      (2) The FGF4 and FGF17 loss-of-function experiments in Figure 5 are very intriguing. This is especially so given the intriguing observation that these FGFs appear to be dominating in this model of human gastrulation, in contrast to what FGFs dominate in mice, chicks, and frogs.

      (3) In general, this paper is valuable as a further development of the human gastruloid system and of our understanding of the role of FGF signaling in the induction of PS-LCs. The wide net that the authors cast in characterizing the FGF ligand genes, receptor isoforms, and downstream components provides a foundation for future work. As the authors write near the beginning of the Discussion, "Many questions remain."

      We thank the reviewer for these positive comments.

      Weaknesses:

      (1) FGFs are cell survival factors in various aspects of development. The authors fail to address cell death due to loss of FGF signaling in their experiments. For example, in Figure 1E (which requires statistical analysis) and 1G (the bottom FGFRi row), there appears to be a significant amount of cell loss. Is this due to cell death? The authors should address the question of whether the role of FGF/ERK signaling is to keep the cells alive.

      Indeed, FGF also strongly affects cell number, and it is an interesting question to what extent this depends on ERK. Our manuscript focuses instead on the role of FGF/ERK signaling in cell fate patterning. However, as mentioned in our discussion, Figure 1d,e shows that doxycycline-induced pERK leads to more TBXT+ cells than in the control without restoring cell number, suggesting that the role of FGF in controlling cell number is independent of the requirement for FGF/ERK in PS-LC differentiation. The unpublished data below, showing a MEK inhibitor dose response, further support this: low doses of MEKi are sufficient to inhibit differentiation without affecting cell number. To address the reviewer’s question, we will include these data in the revised manuscript and perform several additional experiments to determine in more detail how cell death and proliferation depend on FGF.

      Author response image 1.

      MEK affects differentiation and cell number at different doses. a-c) Control and MEKi (0.3 μM) treated colonies with similar cell numbers but different TBXT expression. d-f) Quantification of cell number per colony (d), percentage of TBXT-positive cells per colony (e), and the distribution of pERK intensities for different doses of MEK inhibitor (f). N>6 colonies per condition. MEKi = PD0325901. Scale bar = 50 μm.
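      Panels d and e of this caption report cell number per colony and the percentage of TBXT-positive cells per colony. A minimal sketch of that bookkeeping, assuming cells have already been segmented and assigned colony IDs, and that positivity is called with a single intensity threshold; the threshold and function name are hypothetical, since the caption does not state how TBXT positivity was determined.

```python
import numpy as np


def per_colony_stats(colony_ids, tbxt_intensity, threshold):
    """Per colony: number of cells and percentage of cells above a TBXT intensity threshold."""
    ids = np.asarray(colony_ids)
    positive = np.asarray(tbxt_intensity, dtype=float) > threshold   # hypothetical positivity call
    colonies = np.unique(ids)
    n_cells = np.array([np.sum(ids == c) for c in colonies])
    pct_positive = np.array([100.0 * positive[ids == c].mean() for c in colonies])
    return colonies, n_cells, pct_positive
```

      Comparing `n_cells` and `pct_positive` across MEKi doses separates the two readouts, which is the distinction the reply draws between effects on differentiation and effects on cell number.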

      (2) Regarding the sparse cells in 1G, is there a reduction in cell number only with FGFRi and not with MEKi? Is this reproducible? Gattiglio et al. (Development, 2023, PMID: 37530863) present data supporting a "community effect" in the FGF-induced mesoderm differentiation of mouse embryonic stem cells. Could a community effect be at play in this human system (especially given the images in the bottom row of 1G)? If the authors don't address this experimentally, they should at least address the ideas in Gattiglio et al.

      Indeed, FGFRi reproducibly affects cell number more than MEKi, in line with the fact that pathways downstream of FGF other than MAPK/ERK (e.g. PI3K) play important roles in cell survival and growth. We think the lack of differentiation with MEKi and FGFRi in Fig. 1g cannot be attributed to a loss of cells combined with a community effect. This is because, without FGFRi or MEKi, cells also differentiate to primitive streak at much lower densities than those shown, consistent with the data we show above in response to (1), which argue against a primarily indirect effect of FGF on PS-LC differentiation through cell density. In the context of directed differentiation (rather than 2D gastruloids), we will show this in a controlled manner by repeating the experiment in Fig. 1g while adjusting cell seeding densities to obtain similar final cell densities in all three conditions. We will also include Gattiglio et al. in our revised discussion.

      (3) Do the FGF4 and FGF17 LOF experiments in Figure 5 affect cell numbers like FGFRi in Figure 1?

      It seems the effect on cell number is small, but we will analyze this carefully and include it in the revised manuscript. A small effect would be consistent with our unpublished data below showing a near-uniform proliferation rate. This in turn suggests that low levels of pERK in the center are sufficient to maintain proliferation there, while the much higher pERK levels in the PS-LC ring (which we think depend on FGF4 and FGF17) do not significantly increase the proliferation rate (see Fig. 1 in the manuscript for the pERK pattern). Thus, loss of high pERK in the PS-LC ring while maintaining low pERK throughout would not be expected to have a major impact on cell number but would impact differentiation. In contrast, loss of all FGF signaling through FGFRi does dramatically affect cell number. This is again consistent with the data provided in response to (1) showing that ERK levels can be reduced to a point where PS-LC differentiation is lost without significantly affecting cell number. We will include the data below in the revised manuscript.

      Author response image 2.

      Why examine PS-LC induction only in FGF17 heterozygous cells and not homozygous FGF17 nulls?

      We were unable to obtain homozygous FGF17 nulls; it is not clear whether there is a reason for this. We will try again and otherwise attempt to corroborate our findings with further knockdown data.

      (4) The idea that FGF8 plays a dominant role during gastrulation of other species but not humans is so intriguing it warrants deeper testing. The authors dismiss FGF8 because its mRNA "...levels always remained low." (line 363) as well as the data published in Zhai et al (PMID: 36517595) and Tyser et al (PMID: 34789876). But there are cases in mouse development where a gene was expressed at levels so low, that it might be dismissed, and yet LOF experiments revealed it played a role or even was required in a developmental process. The authors should consider FGF8 inhibition or inactivation to explore its potential role, despite its low levels of expression.

      We agree with the reviewer that FGF8 is worth investigating further and we will now pursue this.

      (5) Redundancy is a common feature in FGF genetics. What is the effect of inhibiting FGF4 in FGF17 LOF cells?

      We will attempt to do the experiment the reviewer suggests.

      (6) I suggest that the authors take more caution in describing FGF gradients. For example, in one Results heading they write "Endogenous FGF4 and FGF17 gradients underly the ERK activity pattern.", implying an FGF protein gradient. However, they only present data for FGF mRNA, not protein. This issue would be clarified if they used proper nomenclature for gene, mRNA (italics), and protein (no italics) throughout the paper.

      We will edit the paper to more clearly distinguish protein and mRNA.

      Reviewer #2 (Public review):

      Summary:

      The role of FGFs in embryonic development and stem cell differentiation has remained unclear due to its complexity. In this study, the authors utilized a 2D human stem cell-based gastrulation model to investigate the functions of FGFs. They discovered that FGF-dependent ERK activity is closely linked to the emergence of primitive streak cells. Importantly, this 2D model effectively illustrates the spatial distribution of key signaling effectors and receptors by correlating these markers with cell fate markers, such as T and ISL1. Through inhibition and loss-of-function studies, they further corroborated the requirement for FGF ligands. Their data show that FGFR1 is the primary receptor, and FGF2/4/17 are the key ligands for primitive streak development, which aligns with observations in primate embryos. Additional experiments revealed that the reduction of FGF4 and FGF17 decreases ERK activity.

      Strengths:

      This study provides comprehensive data and improves our understanding of the role of FGF signaling in primate primitive streak formation. The authors provide new insights related to the spatial localization of the key components of FGF signaling and attempt to reveal the temporal dynamics of the signal propagation and cell fate decision, which has been challenging.

      Weaknesses:

      Although the data are solid, the work only partially clarifies the complex picture of FGF signaling, and details remain somewhat elusive. The findings lack a strong punchline, which may limit their broader impact.

      We thank this reviewer for their valuable feedback and the compliment on the solidity of our data. The punchline of our work is that FGF4- and FGF17-dependent ERK signaling plays a key role in human PS-LC differentiation, and that these are different FGFs from those thought to drive mouse gastrulation. A second key point is that, like BMP and TGFβ signaling, FGF signaling is restricted to the basolateral sides of pluripotent stem cell colonies due to polarized receptor expression, which is crucial for understanding the response to exogenous ligands added to the cell medium. Indeed, many facets of FGF signaling remain to be investigated, such as how FGF regulates and is regulated by other signals, which we will address in a separate manuscript.

      Reviewer #3 (Public review):

      Jo and colleagues set out to investigate the origins and functions of localized FGF/ERK signaling for the differentiation and spatial patterning of primitive streak fates of human embryonic stem cells in a well-established micropattern system. They demonstrate that endogenous FGF signaling is required for ERK activation in a ring-domain in the micropatterns, and that this localized signaling is directly required for differentiation and spatial patterning of specific cell types. Through high-resolution microscopy and transwell assays, they show that cells receive FGF signals through basally localized receptors. Finally, the authors find that there is a requirement for exogenous FGF2 to initiate primitive streak-like differentiation, but endogenous FGFs, especially FGF4 and FGF17, fully take over at later stages.

      Even though some of the authors' findings - such as the localized expression of FGF ligands during gastrulation and the importance of FGF/ERK signaling for cell differentiation in the primitive streak - have been reported in model organisms before, this is one of the first studies to investigate the role of FGF signaling during primitive streak-like differentiation of human cells. In doing so, the paper reports a number of interesting and valuable observations, namely the basal localization of FGF receptors which mirrors that of BMP and Nodal receptors, as well as the existence of a positive feedback loop centered on FGF signaling that drives primitive-streak differentiation. The authors also perform a comparison of the role of different FGFs across species and try to assign specific functions to individual FGFs. In the absence of clean genetic loss-of-function cell lines, this part of the work remains less strong.

      We thank the reviewer for emphasizing the value of our findings in a human model for gastrulation. We agree more loss-of-function experiments would provide further insight into the role of different FGFs, and we plan to provide additional data along these lines in the revised manuscript.

    1. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      Walton et al. set out to isolate new phages targeting the opportunistic pathogen Pseudomonas aeruginosa. Using a double ∆fliF ∆pilA mutant strain, they were able to isolate 4 new phages, CLEW-1, -3, -6, and -10, which were unable to infect the parental PAO1F Wt strain. Further experiments showed that the 4 phages were only able to infect a ∆fliF strain, indicating a role of the MS-protein in the flagellum complex. Through further mutational analysis of the flagellum apparatus, the authors were able to identify the involvement of c-di-GMP in phage infection. Depletion of c-di-GMP levels by an inducible phosphodiesterase renders the bacteria resistant to phage infection, while elevation of c-di-GMP through the Wsp system made the cells sensitive to infection by CLEW-1. Using TnSeq, the authors were able not only to reaffirm the involvement of c-di-GMP in phage infection but also to identify the exopolysaccharide PSL as a downstream target for CLEW-1. C-di-GMP is a known regulator of PSL biosynthesis. The authors show that CLEW-1 binds directly to PSL on the cell surface and that deletion of the pslC gene resulted in complete phage resistance. The authors also provide evidence that the phage-PSL interaction happens during the biofilm mode of growth and that the addition of the CLEW-1 phage specifically resulted in a significant loss of biofilm biomass. Lastly, the authors set out to test if CLEW-1 could be used to resolve a biofilm infection using a mouse keratitis model. Unfortunately, while the authors noted a reduction in bacterial load assessed by GFP fluorescence, the keratitis did not resolve under the tested parameters. 

      Strengths: 

      The experiments carried out in this manuscript are thoughtful and rational, and sufficient explanation is provided for why the authors chose each specific set of experiments. The data presented strongly support their conclusions, and they present compelling explanations for any deviation. The authors have not only developed a new technique for screening for phages targeting P. aeruginosa, but also highlight the importance of looking for phages during the biofilm mode of growth, as opposed to the more standard techniques involving planktonic cultures. 

      Weaknesses: 

      While the paper is strong, I do feel that more discussion could have gone into the decision to focus on CLEW-1 for the majority of the paper. The paper also doesn't provide any detailed information on the genetic composition of the phages. It is unclear if the phages isolated are temperate or virulent. Many temperate phages enter the lytic cycle in response to QS signalling, and while the data as they are don't suggest that this is the case, perhaps the paper would be strengthened by further elimination of this possibility. At the very least it might be worth mentioning in the discussion section. 

      Thank you for your review. We will upload the genomes of all Clew phages and Ocp-2 before resubmission. It turns out that the Clew phage are highly related, which we wanted to express with the genomic comparison in the supplementary figure (rather unsuccessfully). It therefore made sense to focus our in-depth analysis on one of the phage. We will include a supplementary figure demonstrating that all Clew-1 phage require an intact psl locus for infection, to make that logic clearer. The phage are virulent (there is apparently a bit of a debate about this with regard to Bruynogheviruses, but we have not been able to isolate lysogens). This will be explained in the revised version of the manuscript as well.

      Reviewer #2 (Public review): 

      This manuscript by Walton et al. suggests that they have identified a new bacteriophage that uses the exopolysaccharide Psl from Pseudomonas aeruginosa (PA) as a receptor. As Psl is an important component in biofilms, the authors suggest that this phage (and others similarly isolated) may be able to specifically target biofilm-growing bacteria. While an interesting suggestion, the manner in which this paper is written makes it difficult to draw this conclusion. Also, some of the results do not directly follow from the data as presented and some relevant controls seem to be missing. 

      Thank you for your review. We would argue that the combination of demonstrating Psl-dependent binding of Clew-1 to P. aeruginosa, as well as demonstration of direct binding of Clew-1 to affinity-purified Psl, indicates that the phage binds directly to Psl and uses it as a receptor. In looking at the recommendations, it appears that the remark about controls refers to not using the ∆pslC mutant alone (as opposed to the ∆fliF2 ∆pslC double mutant) as a control for some of the binding experiments. However, since the ∆fliF2 mutant is more permissive for phage infection, analyzing the effect of deleting pslC in the context of the ∆fliF2 mutant background is the more stringent test.

    1. Author response:

      We sincerely thank all the reviewers for their enthusiasm and positive feedback, which has encouraged us to delve deeper into this research. As this is the first report of POLK in the brain using a longitudinal normative aging model, our primary aim was to establish the observational and phenomenological aspects. We agree with the reviewers that more detailed molecular, biochemical, and cellular studies are essential to elucidate underlying mechanisms. However, as noted by some reviewers, these investigations, while they will raise the impact, may fall outside the scope of the current report. Indeed, many of these lines of investigation are currently ongoing. Below, we provide our provisional responses to individual reviewer comments.

      Response to Reviewer #1:

      a) Concern over POLK antibody characterization in mice:

      We knocked down POLK by siRNA in mouse cortical primary neuronal cultures (Fig. S1C). In the revised version, we will provide a more detailed characterization of POLK antibodies in mouse cells.

      b) More mechanistic investigation is needed before POLK could be considered as a brain aging clock:

      We sincerely appreciate the valuable suggestion. In our ongoing work exploring the mechanisms of POLK in postmitotic neurons, preliminary findings using siPOLK indicate an upregulation of senescence markers along with a reduction in DNA repair synthesis (manuscript in preparation). We will reference this companion manuscript in the revised version and are pleased to share these data with the reviewers for their consideration.

      Response to Reviewer #2:

      a) Concern regarding a more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol:

      We sincerely appreciate the reviewer’s enthusiasm and valuable guidance in helping us better understand the mechanism of nuclear-cytoplasmic POLK dynamics. Previously, we developed a modified aniPOND (accelerated native isolation of proteins on nascent DNA) protocol, which we termed iPoKD-MS (isolation of proteins on Pol kappa-synthesized DNA followed by mass spectrometry), to capture proteins bound to nascent DNA synthesized by POLK in human cell lines (bioRxiv https://www.biorxiv.org/content/10.1101/2022.10.27.513845v3). In this dataset, we identified potential candidates that may regulate nuclear/cytoplasmic POLK dynamics. These candidates are currently undergoing validation in human cell lines, and we are preparing a manuscript on these findings. Among these, some candidates, including previously identified proteins such as exportin and importin (Temprine et al., 2020, PMID: 32345725), are being explored further as potential POLK nuclear/cytoplasmic shuttles. We are also testing these candidates in mouse cortical primary neurons to assess their role in POLK dynamics. In the revised version of the manuscript, we will include a discussion of our current understanding and outline our planned studies.

      b) Question on “… what is POLK doing in the cytosol, and what is it interacting with …”:

      Our data so far indicate that POLK accumulates in stress granules and lysosomes. We are very grateful for the reviewer’s insightful suggestions and will make every effort to incorporate them in the revised manuscript. Currently, we are characterizing POLK accumulation in the cytoplasm using additional lysosomal markers, as recommended by the reviewer. If these experiments prove challenging in mouse brain tissues, we plan to investigate them in primary neuron cultures. We hope to include these findings in the revised version. Additionally, we have optimized the POLK antibody for immunoprecipitation from nuclear and cytoplasmic fractions of mouse brain tissue. These findings, which are beyond the scope of the current study, will be reported in a separate manuscript.

      Response to Reviewer #3:

      We highly appreciate the reviewer bringing up the context of biomolecular condensates. Our iPoKD-MS data referenced above suggest candidates from various biomolecular condensates. We are currently using subcellular fractionation to investigate the presence of POLK in different biomolecular condensates, which will be fully reported in future publications. We thank the reviewer for providing important literature, which will be cited, and potential biomolecular condensates will be discussed in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript from Mukherjee et al examines potential connections between telomere length and tumor immune responses. This examination is based on the premise that telomeres and tumor immunity have each been shown to play separate, but important, roles in cancer progression and prognosis as well as prior correlative findings between telomere length and immunity. In keeping with a potential connection between telomere length and tumor immunity, the authors find that long telomere length is associated with reduced expression of the cytokine receptor IL1R1. Long telomere length is also associated with reduced TRF2 occupancy at the putative IL1R1 promoter. These observations lead the authors towards a model in which reduced telomere occupancy of TRF2 - due to telomere shortening - promotes IL1R1 transcription via recruitment of the p300 histone acetyltransferase. This model is based on earlier studies from this group (i.e. Mukherjee et al., 2019) which first proposed that telomere length can influence gene expression by enabling TRF2 binding and gene transactivation at telomere-distal sites. Further mechanistic work suggests that G-quadruplexes are important for TRF2 binding to IL1R1 promoter and that TRF2 acetylation is necessary for p300 recruitment. Complementary studies in human triple-negative breast cancer cells add potential clinical relevance but do not possess a direct connection to the proposed model. Overall, the article presents several interesting observations, but disconnection across central elements of the model and the marginal degree of the data leave open significant uncertainty regarding the conclusions.

      Strengths:

      Many of the key results are examined across multiple cell models.

      The authors propose a highly innovative model to explain their results.

      Weaknesses:

      Although the authors attempt to replicate most key results across multiple models, the results are often marginal or appear to lack statistical significance. For example, the reduction in IL1R1 protein levels observed in HT1080 cells that possess long telomeres relative to HT1080 short telomere cells appears to be modest (Supplementary Figure 1I). Associated changes in IL1R1 mRNA levels are similarly modest.

      Related to the point above, a lack of strong functional studies leaves an open question as to whether observed changes in IL1R1 expression across telomere short/long cancer cells are biologically meaningful.

      Statistical significance is described sporadically throughout the paper. Most major trends hold, but the statistical significance of the results is often unclear. For example, Figure 1A uses a statistical test to show statistically significant increases in TRF2 occupancy at the IL1R1 promoter in short telomere HT1080 relative to long telomere HT1080. However, similar experiments (i.e. Figure 2B, Figure 4A - D) lack statistical tests.

      TRF2 overexpression resulted in a ~5-fold or greater change in IL1R1 expression. Compared to this, telomere length-dependent alterations in IL1R1 expression, although about 2-fold, appear modest (~50% reduction in cells with long telomeres across the different model systems used). Notably, this was consistent and significant across cell-based model systems and xenograft tumors (see Figure 1). Unlike TRF2 induction, telomere elongation or shortening varies within the permissible physiological limits of cells. This is likely to result in the observed variation in IL1R1 levels.

      To establish biological relevance, we used multiple models in which telomere length either differed (patient tissue, organoids) or was altered (cell lines, xenograft models). In TNBC tissue, tumor organoids, and cells/xenografts, IL1 signalling was shown to impact M2 macrophage infiltration in a telomere length-sensitive fashion. We used the tumor organoids to test M2 macrophage infiltration under IL1RA and small-molecule IL1R1 inhibition.

      We have now included statistical tests in all the relevant figures and incorporated the necessary details about the tests performed in the figure legends for the clarity of readers. Additionally, all data points, p values and details of statistical tests have been included in figure-wise Excel sheets for both main and supplementary figures.

      Reviewer #1 (Recommendations For The Authors):

      There are typos throughout the manuscript. The word 'expression' is incorrectly spelled on y-axis labels throughout the manuscript (for example see Figure 1B). The word 'telomere' is incorrectly spelled in Supplementary Figure 1 legend panel A. Most errors, such as these, do not interfere with my comprehension of the manuscript. However, others made the manuscript difficult to follow. For example, I think that MDAMB231, MDAMD231, and MDAM231 are frequently used interchangeably to refer to the same cell line. This makes it very difficult to understand certain experiments.

      I often found it difficult to understand which statistical test was used for a specific experiment. I suggest changing the style in the legends to more clearly connect statistical tests with specific data points.

      We thank the reviewer for pointing out the typographical errors. We have now made the relevant corrections to both figures and text.

      As stated above, we have now provided details of the statistical tests performed in the figure legends for the clarity of readers. Additionally, all data points, p values and details of statistical tests have been included in figure-wise Excel sheets for both main and supplementary figures.

      Reviewer #2 (Public Review):

      This study highlights the role of telomeres in modulating IL-1 signaling and tumor immunity. The authors demonstrate a strong correlation between telomere length and IL-1 signaling by analyzing TNBC patient samples and tumor-derived organoids. Mechanistic insights revealed non-telomeric TRF2 binding at the IL-1R1 promoter. The observed effects on NF-kB signaling and subsequent alterations in cytokine expression contribute significantly to our understanding of the complex interplay between telomeres and the tumor microenvironment. Furthermore, the study reports that telomere length and IL-1R1 expression are associated with TAM enrichment. However, the manuscript lacks in-depth mechanistic insights into how telomere length affects IL-1R1 expression. Overall, this work broadens our understanding of telomere biology.

      The mechanism of how telomere length affects IL1R1 expression involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, the IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). We have described this in the manuscript along with references citing the previous works. A scheme explaining the model was provided as Additional Supplementary Figure 1, along with a description of the mechanistic model.

      Main Figures 1-4 describe the molecular mechanism of telomere-dependent IL1R1 activation. This includes ChIP data for TRF2 on the IL1R1 promoter in long/short telomeres, as well as TRF2-mediated histone/p300 recruitment and IL1R1 gene expression. We further show how specific acetylation on TRF2 is crucial for TRF2-mediated IL1R1 regulation (Figure 5).

      Reviewer #2 (Recommendations For The Authors):

      The study primarily provides a snapshot of cytokine expression and telomere length at a single time point. Longitudinal studies or dynamic analyses could provide a more comprehensive understanding of the temporal relationship between telomere length and cytokine expression.

      Tumor heterogeneity is a significant problem for the various therapies. The study notes significant heterogeneity in telomere length but does not investigate the implications of this heterogeneity. Understanding the role of telomere length variation in different tumor cell populations is essential for a comprehensive interpretation of the results.

      The study only mentions a correlation between IL1R1 and relative telomere length but does not provide any potential clinical correlations with patient outcomes or survival. Addressing the clinical relevance of these molecular changes would improve the translational impact.

      The importance of IL1R1 for prognosis and clinical outcomes in TNBC has been studied by multiple groups. The overall consensus is that higher IL1R1 leads to poor prognosis, aiding both cancer progression and metastasis. Using publicly available TCGA data, we found that IL1R1-high samples had significantly lower survival in breast cancer (BRCA) datasets. The results have now been included in the manuscript as Supplementary Figure 7G.

      Addition in text:

      “We next used publicly available TCGA gene expression data of breast cancer samples (BRCA) (Supplementary file 4) to assess the effect of IL1R1 expression on cancer prognosis. We categorized samples based on IL1R1 expression: IL1R1 high (N=254) and IL1R1 low (N=709). Overall patient survival was significantly lower in IL1R1 high samples (log-rank p = 0.0149) (Supplementary Figure 7G). We also checked the frequency of occurrence of various breast cancer sub-types in IL1R1 high and low samples (Supplementary Figure 7H). While invasive mixed mucinous carcinoma (the most abundant sub-type) was predominantly seen in IL1R1 low samples, metaplastic breast cancer was only found within the IL1R1 high samples. Interestingly, metaplastic breast cancer has frequently been found to be ‘triple negative’, i.e., ER-, PR- and HER2- (Reddy et al., 2020).”

      However, we could not access a TNBC (or any breast cancer) dataset that has been characterized for telomere length. Unfortunately, the clinical TNBC samples that we had access to did not have paired short-term/long-term survival data. We could, in principle, use TERT/TERC expression as a proxy for telomere length; however, in our experiments, telomerase activity did not positively correlate with telomere length as expected (Supplementary Figure 7C, Supplementary Figure 8D). Therefore, a transcriptional signature (of telomere-associated genes) may not be a reliable indicator of telomere length.

      The study lacks in-depth mechanistic insights into how telomere length affects IL1R1 expression and subsequently influences TAM infiltration. Further molecular studies or pathway analyses are necessary to elucidate the underlying mechanisms.

      The mechanism involves sequestration and reallocation of TRF2 between telomeres and gene promoters (in this case, IL1R1 promoter). We have previously shown this across multiple genomic sites (Mukherjee et al, 2018). We have appropriately discussed this in the manuscript.

      A schematic explaining the model has been provided as Additional Supplementary Figure 1.

      We have provided ChIP data for TRF2 on the IL1R1 promoter in long/short telomeres in the manuscript, as well as histone/p300 ChIP and gene expression (main Figures 1-4 deal exclusively with the molecular mechanism of telomere-dependent IL1R1 activation). We further show how specific acetylation on TRF2 might be crucial for TRF2-mediated IL1R1 regulation (Figure 5). One of the key findings herein is that TRF2 can directly regulate IL1R1 expression through promoter occupancy, tested in telomere-altered cell lines (HT1080, MDAMB231) and tumor xenografts (Figure 1A, F, I for TRF2 promoter occupancy).

      Pathway analysis of the HT1080 (short vs long telomere) transcriptome shows that cytokine-cytokine receptor interaction is one of the key pathways among upregulated genes.

      While we have focused on TRF2 mediated IL1R1 regulation, it is quite possible that there are other telomere sensitive pathways/mechanisms by which IL1R1 is regulated. This has been duly acknowledged in the discussion.

      The manuscript title suggests modulation of immune signaling in the tumor microenvironment, yet the authors exclusively focus on CD206+ TAMs, limiting the scope. It is recommended to investigate other immune cell types for a more comprehensive understanding of changes in the immune tumor microenvironment.

      As stated above, we approached the manuscript from the purview of TRF2-mediated IL1R1 regulation. In our assessment of TCGA data for breast cancer, we found that CD206 (MRC1) had the highest enrichment in IL1R1 high samples among key TAM and TIL markers- now added as Figure 8A (Details in Supplementary file 5). It also had the highest correlation with IL1R1 among the tested markers. Therefore, we proceeded to check CD206+ve TAMs.

      Now the following section has been added to text:

      “We further found that the total proportion of immune cells (% of CD45+ve cells) did not vary significantly between short and long telomere TNBC samples (Supplementary Figure 8C). However, TNBC-ST samples had a higher percentage of myeloid cells (CD11B+ve) within the CD45+ve immune cell population. We checked three TNBC-ST and three TNBC-LT samples and found that the percentage of M1 macrophages (CD86 high CD206 low) in the myeloid population was lower than that of M2 macrophages (CD206 high CD86 low) and, unlike the latter, did not vary significantly between the TNBC-ST and TNBC-LT samples (Supplementary Figure 8C).”

      Unfortunately, due to sample limitations we are unable to test this on a larger cohort of samples.

      A single cell transcriptome experiment may have been a good way to have a more comprehensive immune profiling. However, with our TNBC samples, isolated nuclei for downstream processing had low viability as per 10X genomics specifications.

      Does IL1R1 influence TAM recruitment or polarization within the tumor microenvironment? To assess the impact, the authors should use a marker indicative of M1-like macrophages, such as CD80 or CD86.

      To address the issue of TAM recruitment vs polarization meaningfully, we would need to characterize tissue-resident macrophages as well as macrophages in circulation. We did not have access to patient blood. A murine breast cancer in vivo model might be a more appropriate model to test this, which would take considerable time for us to develop. It is something that we hope to address in a follow-up study.

      Did the authors analyze other breast cancer subtypes for telomere length?

      Unfortunately, other breast cancer sub-types besides TNBC were not available to us for experimentation.

      Figure legends are very briefly written and need to be elaborated. Scale bars are also missing in images.

      Add a gating strategy for flow cytometry results in Figure 8A.

      Figure legends have been expanded for clarity. More prominent scale bars have been added for better visibility and reference. A relevant gating strategy has been added as Supplementary Figure 8B.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, entitled "Telomere length sensitive regulation of Interleukin Receptor 1 type 1 (IL1R1) by the shelterin protein TRF2 modulates immune signalling in the tumour microenvironment", Dr. Mukherjee and colleagues aimed to clarify the extra-telomeric role of TRF2 in regulating IL1R1 expression and its consequent impact on TAM infiltration of tumors.

      Strengths:

      Upon careful manuscript evaluation, I feel that the presented story is undoubtedly well conceived. At the technical level, experiments have been properly performed and the obtained results support the authors' conclusions.

      Weaknesses:

      Unfortunately, the covered topic is not particularly novel. Specifically, the ability of TRF2 to bind extratelomeric foci in cells with short telomeres has been well demonstrated in a previous study published by the same research group. The ability of TRF2 to regulate gene expression is well known, its interaction with p300 has already been demonstrated and, finally, the ability of TRF2 to regulate TAM infiltration (which is the real novelty of the manuscript) appears to be an obvious consequence of IL1R1 modulation (this is probably due to the current organization of the manuscript).

      Here we studied the TRF2-IL1R1 regulatory axis (not reported earlier by us or others) as a case of the telomere sequestration model that we described earlier (Mukherjee et al., 2018; reviewed in J. Biol. Chem. 2020, Trends in Genetics 2023). This manuscript demonstrates the effect of TRF2-IL1R1 regulation on telomere-sensitive recruitment of tumor macrophages. To the best of our knowledge, no previous study has mechanistically connected the telomeres of tumor cells to the tumor immune microenvironment. Here we focused on the IL1R1 promoter and provided mechanistic evidence that acetylated TRF2 engages the histone acetyltransferase (HAT) p300 to epigenetically alter the promoter. This mechanism of TRF2-mediated activation has not been previously reported. Further, the function of a specific post-translational modification of TRF2, acetylation of lysine 293 (K293), in IL1R1 regulation is described for the first time. Additional experiments showed that TRF2 acetylation mutants, when targeted to the IL1R1 promoter, significantly alter its transcriptional state. To our knowledge, the function of any TRF2 residue in transcriptional activation had not been previously described. Taken together, these results provide novel insights into a mechanism of TRF2-mediated gene regulation that is telomere-sensitive and affects the tumor immune microenvironment.

      We considered the reviewer’s suggestion to reorganize the Results section. In our opinion, describing the TAM-related results first would shift focus away from the new findings on non-telomeric TRF2-mediated IL1R1 regulation and the novelty of its mechanism (as described in the response above and in our responses to the other reviewers' comments). We have tried to bring out the novelty, implications and importance of the TAM-related observations in the Discussion.

      Reviewer #3 (Recommendations For The Authors):

      Based on the comments reported above, I would encourage the authors to reorganize the manuscript text. I would suggest starting with the ability of TRF2 to modulate macrophage infiltration. Data on IL1R1 expression could then be used to explain the mechanism through which TRF2 exerts its immunomodulatory role. In my view, this would dramatically strengthen the story presented.

      Concerning the text, the Results section should be dramatically streamlined, and background information should be limited to the Introduction.

      The manuscript should be carefully revised for grammar. A number of incomplete sentences and some typos are present in the text.

      We thank the reviewer for the appreciation of our work for its technical strengths.

      At the outset, we agree that we have explored the TRF2-IL1R1 regulatory axis. This underscores the significance of the telomere sequestration model that we proposed earlier (Mukherjee et al., 2018). Herein, however, we significantly extend our previous work (which was more general and intended to put forward the idea of telomere-dependent distal gene expression) by studying TRF2-mediated regulation of IL1 signalling (which was previously unreported). In addition, the mechanistic details of how telomeres are connected to IL1 signalling through non-telomeric TRF2 are entirely new and have not been reported before by us or others.

      We have removed some text descriptions from the result section to streamline the section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank all the reviewers for their positive evaluation of our paper, as described in the Strengths section. We are also grateful for their helpful comments and suggestions, which we have addressed below. We believe that the manuscript has been significantly improved as a result of these suggestions. In addition to these changes, we also corrected some inconsistencies (statistical values in the last sentence of a Figure 5 caption) and sentences in the main text (lines 155, 452, 522) (these corrections did not affect the results).

      Fig. 5e: R=0.599, P<0.001 -> R=0.601, P=0.007

      L150: "the angle of stick tilt angle" -> "the angle of stick tilt"

      L437: "no such" -> "such"

      L522: "?" -> "."

      Reviewer #1 (Public Review):

      Summary/Strengths:

      This manuscript describes a stimulating contribution to the field of human motor control. The complexity of control and learning is studied with a new task offering a myriad of possible coordination patterns. Findings are original and exemplify how baseline relationships determine learning.

      Weaknesses:

      A new task is presented: it is a thoughtful one, but because it is a new one, the manuscript section is filled with relatively new terms and acronyms that are not necessarily easy to rapidly understand.

      First, some more thought may be devoted to the take-home message. In the title, I am not sure that manipulating a stick with both hands is a key piece of information. Also, the authors appear to insist on the term ‘implicit’, and I wonder whether it is essential to this manuscript and whether this study provides all the evidence needed to show that control and adaptation are exclusively implicit. As there is no clear comparison between gradual and abrupt sessions, the authors may consider removing the words ‘implicit’ and ‘implicitly’, at least from the title and abstract. Most importantly, the authors may consider modifying the last sentence of the abstract to clearly state the most substantial theoretical advance of this study.

      Thank you for your positive comment on our paper. We agree with the reviewer that our paper used a lot of acronyms that might confuse the readers. As we have addressed below (in the rebuttal to the Results section), we have reduced the number of acronyms.

      Regarding the comment on the use of the word “implicit” in the title and the abstract, we believe that its use in this paper is very important and indispensable. One of our main findings was that the pattern of adaptation between the tip-movement direction and the stick-tilt angle largely followed that in the baseline condition when aiming at different target directions. This adaptation was largely implicit because participants were not aware of the perturbation, as its magnitude was gradually increased. This implicitness suggests that the pattern of how the movement should be corrected is embedded in the motor learning system. On the other hand, if this adaptation pattern had been achieved through an explicit strategy of changing the tip-movement direction, an adaptation pattern that follows the baseline pattern would not be at all surprising. For these reasons, we will continue to use the word "implicit".

      It seems that a substantial finding is the ‘constraint’ imposed by baseline control laws on sensorimotor adaptation. This seems to echo and extend previous work of Wu, Smith et al. (Nat Neurosci, 2014): their findings, which were not always replicated, suggested that the more variable participants were at baseline, the better they adapted to a systematic perturbation. The authors may study whether residual errors are smaller or adaptation is faster for individuals with larger motor variability at baseline. Unfortunately, the authors do not present the classic time course of sensorimotor adaptation in any experiment. The adaptation is not described as is typically done: the authors should thus show the changes in tip-movement direction and stick-tilt angle across trials, and highlight any significant difference between baseline, early adaptation, and late adaptation, for instance. I also wonder why the authors did not include a few no-perturbation trials after the exposure phase to study after-effects: it looks like a missed opportunity. Overall, I think that showing the time course of adaptation is necessary for the present study to provide a more comprehensive understanding of this new task, and to re-explore the role of baseline motor variability in sensorimotor adaptation.

      We appreciate the reviewer for raising these important issues.

      Regarding the learning curve, because the amount of perturbation was gradually increased except for Exp.1B, we were not able to obtain typical learning curves (i.e., the curve showing errors decaying exponentially with trials). However, it may still be useful to show how the movement changed with trials during adaptation. Therefore, following the reviewer's suggestion, we have added the figures of the time course of adaptation in the supplementary data (Figures S1, S2, S4, and S5).

      There are two reasons why our experiments did not include trials for quantifying aftereffects (i.e., probe trials). First, in the case of adaptation to a visual perturbation (e.g., visual rotation), probe trials are not necessary because the degree of adaptation can be easily quantified by the amount of compensation in the perturbation trials (in the case of dynamic perturbations such as force fields, however, probe trials are necessary). Second, including probe trials could make participants aware of the perturbation, which we wanted to avoid.

      We also appreciate the interesting additional questions regarding the relevance of our work to the relationship between baseline motor variability and adaptation performance. Although this topic is interesting, it is outside the scope of this paper, so we decided not to address it in the manuscript. In fact, the experiments were not ideal for quantifying motor variability in the baseline phase because participants had to aim at different targets, which could change the characteristics of motor variability. In addition, we gradually increased the size of the perturbation except in Exp.1B (see Author response image 1, upper panel), which could make it difficult to assess the speed of adaptation. Nevertheless, we think it is worth addressing this point in this rebuttal. Specifically, we examined the correlation between baseline motor variability when aiming at the 0-deg target (tip-movement direction or stick-tilt angle) and adaptation speed in Exp 1A and Exp 1B (Author response image 1 and Author response image 2). To assess adaptation speed in Exp.1A, we quantified the slope of the tip-movement direction in response to the gradually increasing perturbation (Author response image 1, upper panel). The adaptation speed in Exp.1B was obtained by fitting an exponential function to the data (Author response image 2, upper panel). Although the statistical results were not completely consistent, we found that participants with greater motor variability at baseline tended to show faster adaptation, as shown in a previous study (Wu et al., Nat Neurosci, 2014).

      Author response image 1.

      Correlation between the baseline variability and learning speed (Experiment 1A). In Exp 1A, the rotation of the tip-movement direction was gradually increased by 1 degree per trial up to 30 degrees. The learning speed was quantified by calculating how quickly the direction of movement followed the perturbation (upper panel). The lower left panel shows the variability of the tip-movement direction versus learning speed, while the lower right panel shows the variability of the stick-tilt angle versus learning speed. Baseline variability was calculated as a standard deviation across trials (trials in which a target appeared in a 0-degree direction).
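      The per-participant quantities in this caption and in the preceding paragraph (baseline variability as a standard deviation, learning speed in Exp 1A as the slope with which the movement follows a 1-degree-per-trial ramp, and the correlation between the two) can be sketched as follows. This is a minimal illustration, not the analysis code used for the figure: all function names are hypothetical, compensation is assumed to be expressed as a positive angle opposing the rotation, and the slope is assumed to be an ordinary least-squares fit of compensation on perturbation size during the ramp.

```python
import math

def baseline_sd(angles_deg):
    """Sample standard deviation (deg) across baseline trials to the 0-deg target."""
    n = len(angles_deg)
    mean = sum(angles_deg) / n
    return math.sqrt(sum((a - mean) ** 2 for a in angles_deg) / (n - 1))

def ramp_schedule(trial, step_deg=1.0, max_deg=30.0):
    """Gradual rotation (deg) on a 1-based trial: +1 deg per trial, capped at 30 deg."""
    return min(trial * step_deg, max_deg)

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (assumed definition of learning speed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical participant who compensates half of the imposed rotation on every ramp trial.
perturbation = [ramp_schedule(t) for t in range(1, 31)]
compensation = [0.5 * p for p in perturbation]
speed = ols_slope(perturbation, compensation)  # slope of 0.5 for this synthetic learner
```

      Applied across participants, pearson_r(baseline_variabilities, learning_speeds) gives the kind of correlation plotted in the lower panels; with this sign convention a larger slope means the movement follows the perturbation more closely.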

      Author response image 2.

      Correlation between the baseline variability and learning speed (Experiment 1B). In Exp 1B, the rotation of the tip-movement direction was abruptly applied from the first trial (30 degrees). The learning speed was calculated as a time constant obtained by exponential curve fitting. The lower left panel shows the variability of the tip-movement direction versus learning speed, while the lower right panel shows the variability of the stick-tilt angle versus learning speed. Baseline variability was calculated as a standard deviation across trials (trials in which a target appeared in a 0-degree direction).
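      For the abrupt condition, one way to obtain a time constant is sketched below. It assumes a two-parameter saturating exponential, compensation(t) = A * (1 - exp(-t / tau)), where t is the trial index after the 30-degree rotation is switched on. The response does not state the exact model (for example, whether an offset term was included) or the fitting routine, so the grid search over tau, with A solved in closed form for each candidate, is a stand-in rather than the procedure behind the figure. A smaller tau means faster adaptation.

```python
import math

def fit_time_constant(trials, compensation, taus):
    """Fit compensation ~ A * (1 - exp(-t / tau)) by grid search over tau.

    For a fixed tau the model is linear in A, so the least-squares amplitude is
    A = sum(g * y) / sum(g * g) with g = 1 - exp(-t / tau). Returns (tau, A) for
    the candidate with the smallest sum of squared errors.
    Trials are assumed to start at 1 so that g is never all zeros.
    """
    best = None
    for tau in taus:
        g = [1.0 - math.exp(-t / tau) for t in trials]
        amp = sum(gi * yi for gi, yi in zip(g, compensation)) / sum(gi * gi for gi in g)
        sse = sum((yi - amp * gi) ** 2 for gi, yi in zip(g, compensation))
        if best is None or sse < best[0]:
            best = (sse, tau, amp)
    return best[1], best[2]

# Hypothetical noise-free learner: full 30-deg compensation with tau = 8 trials.
trials = range(1, 101)
observed = [30.0 * (1.0 - math.exp(-t / 8.0)) for t in trials]
candidate_taus = [0.5 * k for k in range(1, 101)]  # 0.5 to 50 trials in 0.5-trial steps
tau_hat, amp_hat = fit_time_constant(trials, observed, candidate_taus)
```

      With real, noisy data a finer grid or a nonlinear least-squares optimizer would normally replace the coarse grid used here.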

      The distance between hands was fixed at 15 cm with the Kinarm instead of a mechanical constraint. I wonder how much this distance varied and more importantly whether from that analysis or a force analysis, the authors could determine whether one hand led the other one in the adaptation.

      Thank you very much for this important comment. Because the distance between the two hands was maintained by a stiff virtual spring (2000 N/m), it was kept almost constant throughout the experiments, as shown in Author response image 3 (the average distance during a movement). The distance was also maintained during reaching movements (Author response image 4).
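      To give a sense of scale for the 2000 N/m spring, the sketch below computes the restoring forces of an ideal linear spring with a 15 cm rest length acting along the line between the two handles. Damping and any other details of the actual Kinarm implementation are not described in this response and are ignored; the function name is hypothetical. Under this assumption, each newton of force that stretches or compresses the connection changes the inter-hand distance by only 0.5 mm.

```python
import math

K_SPRING = 2000.0   # N/m, stiffness stated in the response
REST_LENGTH = 0.15  # m, inter-hand distance the spring maintains

def spring_forces(left, right, k=K_SPRING, rest=REST_LENGTH):
    """Return (force_on_left, force_on_right) in N for 2-D hand positions in m.

    Assumes an ideal, undamped linear spring acting along the line between the hands.
    """
    dx, dy = right[0] - left[0], right[1] - left[1]
    d = math.hypot(dx, dy)
    ux, uy = dx / d, dy / d     # unit vector from the left hand to the right hand
    tension = k * (d - rest)    # positive when the hands are farther apart than 15 cm
    # A stretched spring pulls each hand toward the other (equal and opposite forces).
    return (tension * ux, tension * uy), (-tension * ux, -tension * uy)

# Hands 15.1 cm apart: a 1 mm stretch produces about 2 N on each handle.
force_left, force_right = spring_forces((0.0, 0.0), (0.151, 0.0))
```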

      We also thank the reviewer for the suggestion regarding force analysis. As shown in Author response image 5, the handle force data did not indicate that a specific hand led the motor adaptation. Specifically, Author response image 5 shows the force applied to each handle along and orthogonal to the stick. If one hand led the other in adaptation, we would have expected a phase shift between the hands as adaptation progressed. However, no such hand-specific phase shift was observed. It should be noted, however, that it is theoretically difficult to determine from the force sensors which hand produced the force first, because the force exerted on the right handle was transmitted to the left handle, and vice versa, through the stiff spring connection.

      Author response image 3.

      The distance between hands during the task. We show the average distance between hands for each trial. The shaded area indicates the standard deviation across participants.

      Author response image 4.

      Time course of the distance between hands during the movement. Colors indicate the trial epoch shown in the legend on the right.

      Author response image 5.

      The force profile during the movement (Exp 1A). We decomposed the force of each handle into the components along (upper panels) and orthogonal to the stick (lower panels). Changes in the force profiles in the adaptation phase are shown (left: left-hand force, right: right-hand force). The colors (magenta to cyan) indicate the trial epoch shown in the legend on the right.
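      The decomposition in this caption amounts to projecting each 2-D handle force onto the stick axis and onto its normal. A minimal sketch, assuming the stick axis runs from the left to the right handle and that the orthogonal axis is that direction rotated 90 degrees counterclockwise (the sign convention used in the figure is not stated; the function name is hypothetical):

```python
import math

def decompose_force(force, left, right):
    """Split a 2-D handle force into components along and orthogonal to the stick.

    The stick axis is the unit vector from the left to the right handle; the orthogonal
    axis is that vector rotated 90 degrees counterclockwise (an assumed convention).
    Returns (along, orthogonal) in the same units as the force.
    """
    dx, dy = right[0] - left[0], right[1] - left[1]
    length = math.hypot(dx, dy)
    ux, uy = dx / length, dy / length   # along-stick unit vector
    nx, ny = -uy, ux                    # orthogonal unit vector
    along = force[0] * ux + force[1] * uy
    orthogonal = force[0] * nx + force[1] * ny
    return along, orthogonal

# Horizontal stick: x maps to the along component, y to the orthogonal one.
flat = decompose_force((1.0, 2.0), (0.0, 0.0), (0.15, 0.0))
# Stick tilted 45 degrees: a force pointing along the tilt has no orthogonal component.
tilted = decompose_force((1.0, 1.0), (0.0, 0.0), (0.1, 0.1))
```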

      I understand the distinction between task- and end-effector irrelevant perturbation, and at the same time results show that the nervous system reacts to both types of perturbation, indicating that they both seem relevant or important. In line 32, the errors mentioned at the end of the sentence suggest that adaptation is in fact maladaptive. I think the authors may extend the Discussion on why adaptation was found in the experiments with end-effector irrelevant and especially how an internal (forward) model or a pair of internal (forward) models may be used to predict both the visual and the somatosensory consequences of the motor commands.

      Thank you very much for your comment. As we already described in the Discussion of the original manuscript (Lines 519-538 in the revised manuscript), there are two potential explanations for the motor system’s response to the end-effector irrelevant perturbation (i.e., stick rotation). First, the motor system predicts the sensory information associated with the action and attempts to correct any discrepancies between the prediction and the actual sensory consequences, regardless of whether the error information is end-effector relevant or end-effector irrelevant. Second, given the close coupling between the tip-movement direction and stick-tilt angle, the motor system can infer the presence of an end-effector relevant error (i.e., in tip-movement direction) from the presence of an end-effector irrelevant error (i.e., in stick-tilt angle). This inference should lead to a change in the tip-movement direction. As the reviewer pointed out, a mismatch between visual and proprioceptive information is another possibility; we have added a description of this point to the Discussion (Lines 523-526).

      Reviewer #1 (Recommendations For The Authors):

      Minor

      Line 16: “it remains poorly understood” is quite subjective and I would suggest reformulating this statement.

      We have reformulated this statement as “This limitation prevents the study of how….”  (Line 16).

      Introduction

      Line 49: the authors may be more specific than just saying ‘this task’. In particular, they need to clarify that there is no redundancy in studies where the shoulder is fixed and all movement is limited to a plane ... which turns out to truly happen in a limited set of experimental setups (for example: Kinarm exoskeleton, but not endpoint; Kinereach system...).

      We have changed this to “such a planar arm-reaching task” (Line 49).

      Line 61: large, not infinite because of biomechanical constraints.

      We have changed “an infinite” to “a large” (Line 61) and “infinite” to “a large number of” (legend in Fig. 1f).

      Lines 67-69: consider clarifying.

      We have tried to clarify the sentence (Lines 67-69).

      Results

      TMD and STA, and TMD-STA plane, are new terms with new acronyms that are not easy to immediately understand. Consider avoiding acronyms.

      We have reduced the use of these acronyms as much as possible. 

      “visual TMD–STA plane” -> “plane representing visual movement patterns” (Lines 179-180)

      “TMD axis” -> “x-axis” (Line 181, Line 190)

      “physical TMD–STA plane” -> “plane representing physical movement patterns” (Lines 182-187)

      “physical TMD–STA plane” -> “physical plane” (Line 191, Line 201, Lines 216-217, Line 254, Line 301, Line 315, Line 422, Line 511, and captions of Figures 4-9, S3)

      “visual TMD–STA plane” -> “visual plane” (Line 193, Line 241, Line 248, Line 300, Lines 313-314, and captions of Figures 4-9, S3)

      “STA axis” -> “y-axis” (Line 241)

      Line 169: please clarify the mismatch(es) that are created when the tip-movement direction is visually rotated in the CCW direction around the starting position (tip perturbation), whereas the stick-tilt angle remains unchanged.

      Thank you for pointing this out. We have clarified that the stick-tilt angle remains identical to the tilt of both hands (Lines 171-172).
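      The tip perturbation discussed in this comment can be written as a rotation of the displayed tip position about the starting position, with the displayed stick-tilt angle passed through unchanged. The sketch below assumes 2-D coordinates and angles in degrees, counterclockwise positive; the function names are hypothetical and do not describe the actual display code.

```python
import math

def rotate_about(point, center, angle_deg):
    """Rotate a 2-D point counterclockwise by angle_deg around center."""
    a = math.radians(angle_deg)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

def tip_perturbation(tip, start, tilt_deg, rotation_deg):
    """Displayed (tip position, stick-tilt angle): only the tip is rotated about the start."""
    return rotate_about(tip, start, rotation_deg), tilt_deg

def movement_direction_deg(tip, start):
    """Tip-movement direction (deg) of the displacement from the starting position."""
    return math.degrees(math.atan2(tip[1] - start[1], tip[0] - start[0]))

# A physical movement straight to the right, shown rotated 30 deg CCW; the tilt is unchanged.
shown_tip, shown_tilt = tip_perturbation((0.1, 0.0), (0.0, 0.0), tilt_deg=10.0, rotation_deg=30.0)
```

      Because the displayed tilt still matches the tilt of the hands while the displayed tip direction does not, the visual stick no longer matches the physical configuration, which is the mismatch the reviewer asks about.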

      Discussion

      I understand the physical constraint imposed between the 2 hands with the robotic device, but I am not sure I understand the physical constraint imposed by the TMD-STA relationship.

      The phrase “physical constraint” meant the constraint of the movement on the physical space. However, as the reviewer pointed out, this phrase could confuse the constraint between the two hands. Therefore, we have avoided using the phrase “physical constraint” throughout the manuscript.

      Some work looking at 3-D movements should be used for Discussion (e.g. Lacquaniti & Soechting 1982; work by d’Avella A or Jarrasse N).

      Thank you for sharing this important information. We have cited these studies in Discussion (Lines 380-382). 

      Reviewer #2 (Public Review):

      Summary:

      The authors have developed a novel bimanual task that allows them to study how the sensorimotor control system deals with redundancy within our body. Specifically, the two hands control two robot handles that control the position and orientation of a virtual stick, where the end of the stick is moved into a target. This task has infinite solutions to any movement, where the two hands influence both tip-movement direction and stick-tilt angle. When moving to different targets in the baseline phase, participants change the tilt angle of the stick in a specific pattern that produces close to the minimum movement of the two hands to produce the task. In a series of experiments, the authors then apply perturbations to the stick angle and stick movement direction to examine how either tip-movement (task-relevant) or stick-angle (task-irrelevant) perturbations affect adaptation. Both types of perturbations affect adaptation, but this adaptation follows the baseline pattern of tip-movement and stick angle relation such that even task-irrelevant perturbations drive adaptation in a manner that results in task-relevant errors. Overall, the authors suggest that these baseline relations affect how we adapt to changes in our tasks. This work provides an important demonstration that underlying solutions/relations can affect the manner in which we adapt. I think one major contribution of this work will also be the task itself, which provides a very fruitful and important framework for studying more complex motor control tasks.

      Strengths:

      Overall, I find this a very interesting and well-written paper. Beyond providing a new motor task that could be influential in the field, I think it also contributes to studying a very important question - how we can solve redundancy in the sensorimotor control system, as there are many possible mechanisms or methods that could be used - each of which produces different solutions and might affect the manner in which we adapt.

      Weaknesses:

      I would like to see further discussion of what the particular chosen solution implies in terms of optimality.

      The underlying baseline strategy used by the participants appears to match the path of minimum movement of the two hands. This suggests that participants are simultaneously optimizing accuracy and minimizing some metabolic cost or effort to solve the redundancy problem. However, once the perturbations are applied, participants still use this strategy for driving adaptation. I assume that this means that the solution that participants end up with after adaptation actually produces larger movements of the two hands than required. That is - they no longer fall onto the minimum hand movement strategy - which was used to solve the problem. Can the authors demonstrate that this is either the case or not clearly? These two possibilities produce very different implications in terms of the results.

      If my interpretation is correct, such a result (using a previously found solution that no longer is optimal) reminds me of the work of Selinger et al., 2015 (Current Biology), where participants continue to walk at a non-optimal speed after perturbations unless they get trained on multiple conditions to learn the new landscape of solutions. Perhaps the authors could discuss their work within this kind of interpretation. Do the authors predict that this relation would change with extensive practice either within the current conditions or with further exploration of the new task landscape? For example, if more than one target was used in the adaptation phase of the experiment?

      On the other hand, if the adaptation follows the solution of minimum hand movement and therefore potentially effort, this provides a completely different interpretation.

      Overall, I would find the results even more compelling if the same perturbations applied to movements to all of the targets and produced similar adaptation profiles. The question is to what degree the results derive from only providing a small subset of the environment to explore.

      Thank you very much for pointing out this significant issue. As the reviewer correctly interprets, the physical movement patterns deviated from the baseline relationship as exemplified in Exp.2. However, this deviation is not surprising for the following reason. Under the perturbation that creates the dissociation between the hands and the stick, the motor system cannot simultaneously return both the visual stick motion and physical hands motion to the original motions: When the motor system tries to return the visual stick motion to the original visual motion, then the physical hands motion inevitably deviates from the original physical hands motion, and vice versa.  

      Our interpretation of this result is that the motor system corrects the movement to reduce the visual dissociation of the visual stick motion from the baseline motion (i.e., sensory prediction error), but this movement correction is biased by the baseline physical hands motion. In other words, the motor system attempts to balance the minimization of sensory prediction error and the minimization of motor cost. Thus, our results do not indicate that the final adaptation pattern is non-optimal, but rather reflect the attempts for optimization.

      In the revised manuscript, we have added the description of this interpretation (Lines 515-517).

      Reviewer #2 (Recommendations For The Authors):

      The authors have suggested that the only study (line 472) that has also examined an end-effector irrelevant perturbation is the bimanual study of Omrani et al., 2013, which only examined reflex activity rather than adaptation. To clarify this issue - exactly what is considered end-effector irrelevant perturbations - I was wondering about the bimanual perturbations in Dimitriou et al., 2012 (J Neurophysiol) and the simultaneous equal perturbations in Franklin et al., 2016 (J Neurosci), as well as other recent papers studying task-irrelevant disturbances which aren’t discussed. I would consider these both to also be end-effector irrelevant perturbations, although again they only used these to study reflex activity and not adaptation as in the current paper. Regardless, further explanation of exactly what is the difference between task-irrelevant and end-effector irrelevant would be useful to clarify the exact difference between the current manuscript and previous work.

      Thank you for your helpful comments. We have included as references the studies by Dimitriou et al. (Line 490) and Franklin et al. (Lines 486-487), which use an end-effector irrelevant perturbation and a task-irrelevant perturbation condition, respectively. We have also added further explanation of the difference between task-irrelevant and end-effector irrelevant perturbations (Lines 344-352).

      Line 575: I assume that you mean peak movement speed

      We have added “peak”. (Line 597).

      Reviewer #3 (Public Review):

      Summary:

      This study explored how the motor system adapts to new environments by modifying redundant body movements. Using a novel bimanual stick manipulation task, participants manipulated a virtual stick to reach targets, focusing on how tip-movement direction perturbations affected both tip movement and stick-tilt adaptation. The findings indicated a consistent strategy among participants who flexibly adjusted the tilt angle of the stick in response to errors. The adaptation patterns are influenced by physical space relationships, guiding the motor system’s choice of movement patterns. Overall, this study highlights the adaptability of the motor system through changes in redundant body movement patterns.

      Strengths:

      This paper introduces a novel bimanual stick manipulation task to investigate how the motor system adapts to novel environments by altering the movement patterns of our redundant body.

      Weaknesses:

      The generalizability of the findings is quite limited. It would have been interesting to see if the same relationships held for different stick lengths (i.e., the hands positioned at different start locations along the virtual stick) or when reaching targets to the left and right of a start position, not just at varying angles along one side. Alternatively, this study would have benefited from a more thorough investigation of the existing literature on redundant systems instead of primarily focusing on the lack of redundancy in endpoint-reaching tasks. Although the novel task expands the use of endpoint robots in motor control studies, the utility of this task for exploring motor control and learning may be limited.

      Thank you very much for the important comment. Given that there are many parameters (e.g., stick length, locations of hands, target position etc), one may wonder how the findings obtained from only one combination can be generalized to other configurations. In the revised manuscript, we have explicitly described this point (Lines 356-359). 

      Thus, generalizability needs to be investigated in future studies, but we believe that the main results also apply to other configurations. Regarding the baseline stick movement pattern, control by tilting the stick was observed regardless of the stick-tip position (Author response image 6). Regarding the finding that the adapted stick movement patterns follow the baseline movement patterns, we confirmed the same results even when other targets were used for adaptation (Author response image 7).

      Author response image 6.

      Stick-tip manipulation patterns when the length of the stick varied. Top: 10 naïve participants moved the stick with different lengths. A target appeared on one of five directions represented by a color of each tip position. Regardless of the length of the stick and laterality, a similar relationship between tip-movement direction and stick-tilt angle was observed. (middle: at peak velocity, bottom: at movement offset).

      Author response image 7.

      Patterns of adaptation when using the other targets. In the baseline phase, 40 naïve participants moved a stick tip to a peripheral target (24 directions). They showed a stereotypical relationship between the tip-movement direction and the stick-tilt angle (a bold gray curve). In the adaptation phase, participants were divided into four groups, each with a different training target direction (lower left, lower right, upper right, or upper left), and visual rotation was gradually imposed on the tip-movement direction. Irrespective of the target direction, the adaptation pattern of the tip-movement direction and stick-tilt angle followed the baseline relationship.

      We also thank you for your comment about studying existing redundant systems. We understand the reviewer's concern about the usefulness of our task, but we believe that we have proposed a novel framework for studying motor adaptation in redundant systems. Future studies will be able to clarify how the knowledge gained from our task can be applied more generally to understand control and learning in redundant systems.

      Reviewer #3 (Recommendations For The Authors):

      Line 49: replace “uniquely” with primarily. A number of features of the task setup could affect the joint angles, from if/how the arm is supported, whether the wrist is fixed, alignment of the target in relation to the midline of the participant, duration of the task, and whether fatigue is an issue, etc. Your statement relates to fixed limb lengths of a participant, rather than standard reaching tasks as a whole. Not to mention the degree of inter- and intra-subject variability that does exist in point-to-point reaching tasks.

      Thank you for your helpful point. We have replaced “uniquely” with “primarily”. (Line 49).

      Line 72: the cursor is not an end-effector - it represents the end-effector.

      We have changed the expression to “the perturbation to the cursor representing the position of the end-effector” (Line 72).

      Lines 73 – 78: it would benefit the authors to consider the role of intersegmental dynamics.

      Thank you for your suggestion. We are not sure that we have understood it correctly, but we interpret it to mean that the end-effector perturbation could be implemented as a perturbation that takes intersegmental dynamics into account. However, such an implementation is not straightforward, and the panels in Figure 1j,k are only a conceptual illustration of the end-effector-irrelevant perturbation. Therefore, we have not described the contribution of intersegmental dynamics here.

      Lines 90 – 92: “cannot” should be “did not”, as the studies being referenced are already completed. This statement should be further unpacked to explain what they did do, and how that does not meet the requirement of redundancy in movement patterns.

      We have changed “cannot” to “did not” (Line 91). We have also added a description of what the previous studies had demonstrated (Lines 88-90).

      Figure text could be enlarged for easier viewing.

      We have enlarged the text in all figures.

      Lines 41 - 47: Interesting selection of supporting references. For the introduction of a novel environment, I would recommend adding the support of Shadmehr and Mussa-Ivaldi 1994.

      Thank you for your suggestion. We have added Shadmehr and Mussa-Ivaldi 1994 as a reference (Line 45).

      Line 49: “this task” is vague - the above references relate to a number of different tasks. For example, the authors could replace it with a reaching task involving an end-point robot.

      Thank you very much for your suggestion. As per the suggestion by Reviewer #1, we have changed this to “such a planar arm-reaching task” (Line 49).

      Line 60: “hypothetical limb with three joints” - in Figure 1a, the human subject, holding the handle of a robotic manipulandum does have flexibility around the wrist.

      Previous studies using planar arm-reaching tasks have constrained the wrist joint (e.g., Flash & Hogan, 1985; Gordon et al., 1994; Nozaki et al., 2006). We tried to emphasize this point as “participants manipulate a visual cursor with their hands primarily by moving their shoulder and elbow joints” (Line 42). In the revised manuscript, we have also emphasized this point in the legend of Figure 1a.

      Lines 93-108: this paragraph could be cleaned up more clearly stating that while the use of task-irrelevant perturbations has been used in the domain of reaching tasks, the focus of these tasks has not been specifically to address “In our task, we aim to exploit this feature by doing”

      Thank you very much for your helpful comments. To make this paragraph clear, we have modified some sentences (Lines 100-104).

      Line 109: “coordinates to adapt” is redundant.

      We have changed this to “adapts” (Line 110).

      Lines 109-112: these sentences could be combined to have better flow.

      Thank you very much for your valuable suggestion. We have combined these two sentences for better flow (Lines 110-112).

      Lines 113-114: consider rewording “This is a redundant task because ...” to something like “Redundancy in the task is achieved by acknowledging that ....”.

      We have changed the expression according to the reviewer’s suggestion (Line 114).

      Line 118: Consider changing “changes” to “makes use of”.

      We have changed the expression (Line 119).

      Lines 346 - 348: grammar and clarity - “This redundant motor task enables the investigation of adaptation patterns in the redundant system following the introduction of perturbations that are either end-effector relevant, end-effector irrelevant, or both.”

      Thank you again for your helpful suggestion on wording. We have adopted the sentence you suggested (Lines 354-356).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript reveals important insights into the role of ipsilateral descending pathways in locomotion, especially following unilateral spinal cord injury. The study provides solid evidence that this method improves the injured side's ability to support weight, and as such the findings may lead to new treatments for stroke, spinal cord injuries, or unilateral cerebral injuries. However, the methods and results need to be better detailed, and some of the statistical analysis enhanced.

      Thank you for your assessment. We incorporated various text improvements in the final version of the manuscript to address the weaknesses you have pointed out. The specific improvements are outlined below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides potentially important new information about ipsilateral cortical impact on locomotion. A number of issues need to be addressed.

      Strengths:

      The primary appeal and contribution of this manuscript are that it provides a range of different measures of ipsilateral cortical impact on locomotion in the setting of impaired contralateral control. While the pathways and mechanisms underlying these various measures are not fully defined and their functional impacts remain uncertain, they comprise a rich body of results that can inform and guide future efforts to understand cortical control of locomotion and to develop more effective rehabilitation protocols.

      Weaknesses:

      (1) The authors state that they used a cortical stimulation location that produced the largest ankle flexion response (lines 102-104). Did other stimulation locations always produce similar, but smaller responses (aside from the two rats that showed ipsilateral neuromodulation)? Was there any site-specific difference in response to stimulation location?

      We derived motor maps in each rat, akin to the representation depicted in Fig. 6. In each rat, alternative cortical sites did, indeed, produce distal or proximal contralateral leg flexion responses. Distal responses were more likely to be evoked in the rostral portion of the array, similarly to proximal responses early after injury. This distribution of responses across cortical sites is reported in this study (Fig. 6) and is consistent with our prior work. The Results section has been revised to clarify the passage you indicated and to provide context for the data presented in Figure 6:

      On page 4, we have clarified: “Stimulation through these channels produced a strong whole-leg flexion movement, with an evident distal component. From visual inspection, all responding electrodes in the array produced contralateral leg flexion, although with different strengths of contraction for a fixed stimulation intensity (100μA). Moreover, some sites did not produce a distal movement component, failing to elicit ankle flexion and resulting in a generally weaker proximal flexion.”

      On page 12, we have further noted: “By visually inspecting the responses elicited by stimulation delivered through each of the array electrodes, we categorized movements as proximal or distal. This classification was based on whether the ankle participated in the evoked response or if the movement was restricted to the proximal hindlimb. Each leg was scored independently.”

      (2) Figure 2: There does not appear to be a strong relationship between the percentage of spared tissue and the ladder score. For example, the animal with the mild injury (based on its ladder score) in the lower left corner of Figure 2A has less than 50% spared tissue, which is less spared tissue than in any animal other than the two severe injuries with the most tissue loss. Is it possible that the ladder test does not capture the deficits produced by this spinal cord injury? Have the authors looked for a region of the spinal cord that correlates better with the deficits that the ladder test produces? The extent of damage to the region at the base of the dorsal column containing the corticospinal tract would be an appropriate target area to quantify and compare with functional measures.

      In Fig. S6 of our 2021 publication "Bonizzato and Martinez, Science Translational Medicine", we investigated the predictive value of tissue sparing in specific sub-regions of the spinal cord for ladder performance. Among others, we examined the correlation between the accuracy of left leg ladder performance in the acute state and the preservation of the corticospinal tract (CST). Our results indicated that dorsal CST sparing serves as a mild predictor for ladder deficits, confirming the results obtained in this study.

      (3) Lines 219-221: The authors state that "phase-coherent stimulation reinstated the function of this muscle, leading to increased burst duration (90±18% of the deficit, p=0.004, t-test, Fig. 4B) and total activation (56±13% of the deficit, p=0.014, t-test, Fig. 3B)". This way of expressing the data is unclear. For example, the previous sentence states that after SCI, burst duration decreased by 72%. Does this mean that the burst duration after stimulation was 90% higher than the -72% level seen with SCI alone, i.e., 90% + -72% = +18%? Or does it mean that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI, i.e., -72% * (100%-90%) = -7%? The data in Figure 4 suggests the latter. It would be clearer to express both these SCI alone and SCI plus stimulation results in the text as a percent of the pre-SCI results, as done in Figure 4.

      Your assessment is correct; we intended to report that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI. This point has been clarified (see page 9):

      “…leading to increased burst duration (recovered 90±18% of the lost burst duration, p=0.004, t-test, Fig. 4B) and total activation (recovered 56±13% of the lost total activation, p=0.014, t-test, Fig. 3B)”
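      To make the metric concrete, here is a minimal arithmetic sketch in Python. The function names are hypothetical, and the pre-SCI level is normalized to 100%. With the values quoted above, a 72% drop leaves burst duration at 28% of pre-SCI. Recovering 90% of the lost 72 points brings it to 28 + 0.9 × 72 = 92.8% of pre-SCI, a residual deficit of about 7%, which matches the reviewer's second interpretation.

```python
def level_after_sci(pre: float, drop_fraction: float) -> float:
    """Level remaining after injury, given the fractional drop from pre-SCI (e.g. 0.72)."""
    return pre * (1.0 - drop_fraction)


def level_with_stimulation(pre: float, sci: float, recovered_fraction: float) -> float:
    """Level under stimulation when it restores `recovered_fraction` of the lost amount."""
    return sci + recovered_fraction * (pre - sci)


def fraction_of_deficit_recovered(pre: float, sci: float, stim: float) -> float:
    """Inverse: share of the SCI-induced loss restored by stimulation (1.0 = full recovery)."""
    lost = pre - sci
    if lost == 0:
        raise ValueError("no deficit to recover")
    return (stim - sci) / lost


if __name__ == "__main__":
    pre = 100.0                                     # pre-SCI burst duration, normalized to 100%
    sci = level_after_sci(pre, 0.72)                # roughly 28% of pre-SCI
    stim = level_with_stimulation(pre, sci, 0.90)   # roughly 92.8% of pre-SCI
    print(f"SCI alone: {sci:.1f}% of pre-SCI")
    print(f"SCI + stimulation: {stim:.1f}% of pre-SCI (residual deficit {stim - pre:.1f}%)")
```

Expressing both conditions as a percentage of the pre-SCI value, as in Figure 4, avoids the ambiguity the reviewer raised.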

      (4) Lines 227-229: The authors claim that the phase-dependent stimulation effects in SCI rats are immediate, but they don't say how long it takes for these effects to be expressed. Are these effects evident in the response to the first stimulus train, or does it take seconds or minutes for the effects to be expressed? After the initial expression of these effects, are there any gradual changes in the responses over time, e.g., habituation or potentiation?

      The effects are expressed immediately, at the very first occurrence of stimulation. We never tested a rat completely naïve to stimuli, as each treadmill session involves prior cortical mapping to identify a suitable active site for locomotor experiments. Yet, as demonstrated in Supplementary Video 1 accompanying our 2021 publication on contralateral effects of cortical stimulation, "Bonizzato and Martinez, Science Translational Medicine," the impact of phase-dependent cortical stimulation on movement modulation is instantaneous and ceases promptly when stimulation is discontinued. We did not quantify potential gradual changes in responsiveness over time, but we cannot exclude that, for long stimulation sessions (e.g., 30 min or more), stimulus amplitude may need to be slightly increased over time to compensate for habituation.

      (5) Awake motor maps (lines 250-277): The analysis of the motor maps appears to be based on measurements of the percentage of channels in which a response can be detected. This analytic approach seems incomplete in that it only assesses the spatial aspect of the cortical drive to the musculature. One channel could have a just-above-threshold response, while another could have a large response; in either case, the two channels would be treated as the same positive result. An additional analysis that takes response intensity into account would add further insight into the data, and might even correlate with the measures of functional recovery. Also, a single stimulation intensity was used; the results may have been different at different stimulus intensities.

      We confirm that maps of cortical stimulation responsiveness may vary at different stimulus amplitudes. To establish an objective metric of excitability, we identified 100µA as a reliable stimulation amplitude across rats and used this value to build the ipsilateral motor representation results in Figure 6. This choice allows direct comparison with Figure 6 of our 2021 article, related to contralateral motor representation. The comparison reveals a lack of correlation with functional recovery metrics in the ipsilateral case, in contrast to the successful correlation achieved in the contralateral case.

      Regarding the incorporation of stimulation amplitudes into the analysis, as detailed in the Methods section (lines 770-771), we systematically tested various stimulation amplitudes at each electrode site to determine the minimal amplitude required to elicit a muscle twitch (the threshold value).

      Upon reviewing these data, we considered the possibility of presenting an additional assessment of ipsilateral cortical motor representation based on stimulation thresholds. However, the representation depicted in the figure did not differ significantly from the data presented in Figure 6A. Furthermore, this representation introduced an additional weakness, as it was unclear how to represent the absence of a response in the threshold scale. We chose to arbitrarily designate it as zero on the inverse logarithmic scale, where, for reference, 100 µA is positioned at 0.2 and 50 µA at 0.5.

      In conclusion, we believe that the conclusions drawn from this analysis align substantially with those in the text. The addition of the threshold analysis, in our assessment, would not contribute significantly to improving the manuscript.

      Author response image 1.

      Threshold analysis

      Author response image 2.

      Occurrence probability analysis, for comparison.

      (6) Lines 858-860: The authors state that "All tests were one-sided because all hypotheses were strictly defined in the direction of motor improvement." By using the one-sided test, the authors are using a lower standard for assessing statistical significance than the overwhelming majority of studies in this field use. More importantly, ipsilateral stimulation of particular kinds or particular sites might conceivably impair function, and that is ignored if the analysis is confined to detecting improvement. Thus, a two-sided analysis or comparable method should be used. This appropriate change would not greatly modify the authors' current conclusions about improvements.

      Our original hypothesis, drawn from previous studies involving cortical stimulation in rats and cats, as well as other neurostimulation research for movement restoration, posited a favorable impact of neurostimulation on movement. Consistent with this hypothesis, we designed our experiments with a focus on enhancing movement, emphasizing a strict direction of improvement.

      It's important to note that a one-sided test is the appropriate match for a one-sided hypothesis, and it is not a lower standard in statistics. Each experiment we conducted was constructed around a strictly one-sided hypothesis: the inclusion of an extensor-inducing stimulus would enhance extension, and the inclusion of a flexion-inducing stimulus would enhance flexion. This rationale guided our choice of the appropriate statistical test.

      We acknowledge your concern that ipsilateral stimulation could have negative effects on locomotion, which might not be captured by experiments designed around one-sided hypotheses. That is, if we hypothesized that an extensor stimulus would enhance extension in a functional task (a one-sided hypothesis) and instead found the opposite result (inhibition), statistical rigor would prevent us from presenting that result as significant. This concern is valid, and we explicitly stated this design choice in the Methods section, under Quantification and statistical analyses:

      “All tests were one-sided, as our hypotheses were strictly defined to predict motor improvement. Specifically, we hypothesized that delivering an extension-inducing stimulus would enhance leg extension, and delivering a flexion-inducing stimulus would enhance leg flexion. Consequently, any potentially statistically significant result in the opposite direction (e.g., inhibition) would not be considered. However, no such occurrences were observed.”

      As a final note, even if such opposite observations were made, they could serve as the basis for triggering an ad-hoc follow-up study.
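      The difference between one- and two-sided testing can be sketched with a small, exact paired sign-flip permutation test in Python. This test is an illustrative stand-in, not the t-tests used in the manuscript, and the difference values below are hypothetical. For a positive observed effect, the two-sided p-value is twice the one-sided one. An effect in the opposite direction yields a large one-sided p-value, so it cannot be declared significant, which is the situation described in the Methods excerpt above.

```python
from itertools import product


def sign_flip_pvalues(diffs):
    """Exact paired sign-flip permutation test on the mean difference.

    Under the null hypothesis, each paired difference is equally likely to be
    positive or negative, so all 2**n sign assignments are enumerated.
    Returns (p_one_sided_greater, p_two_sided).
    """
    n = len(diffs)
    observed = sum(diffs) / n
    eps = 1e-12  # tolerance for floating-point ties
    greater = extreme = total = 0
    for signs in product((1, -1), repeat=n):
        stat = sum(s * d for s, d in zip(signs, diffs)) / n
        total += 1
        if stat >= observed - eps:            # at least as large in the predicted direction
            greater += 1
        if abs(stat) >= abs(observed) - eps:  # at least as extreme in either direction
            extreme += 1
    return greater / total, extreme / total


if __name__ == "__main__":
    # Hypothetical per-animal differences (stimulation minus no stimulation).
    improvement = [0.4, 1.1, 0.7, 0.9, 0.3, 0.6]
    worsening = [-d for d in improvement]
    print("improvement:", sign_flip_pvalues(improvement))
    print("worsening:  ", sign_flip_pvalues(worsening))
```

With six all-positive differences, the one-sided p-value is 1/64 and the two-sided p-value is 2/64. When the same differences are negated, the one-sided p-value becomes 1.0.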

      Reviewer #1 also provided several detailed suggestions in the section “Recommendations for the authors”. We judged that each of them improved the correctness or readability of the text, and all were incorporated into the final version.

      Reviewer #2 (Public Review):

      Summary:

      The authors' long-term goals are to understand the utility of precisely phased cortex stimulation regimes on recovery of function after spinal cord injury (SCI). In prior work, the authors explored the effects of contralesion cortex stimulation. Here, they explore ipsilesion cortex stimulation in which the corticospinal fibers that cross at the pyramidal decussation are spared. The authors explore the effects of such stimulation in intact rats and rats with a hemisection lesion at the thoracic level ipsilateral to the stimulated cortex. The appropriately phased microstimulation enhances contralateral flexion and ipsilateral extension, presumably through lumbar spinal cord crossed-extension interneuron systems. This microstimulation improves weight bearing in the ipsilesion hindlimb soon after injury, before any normal recovery of function would be seen. The contralateral homologous cortex can be lesioned in intact rats without impacting the microstimulation effect on flexion and extension during gait. In two rats ipsilateral flexion responses are noted, but these are not clearly demonstrated to be independent of the contralateral homologous cortex remaining intact.

      Strengths:

      This paper adds to prior data on cortical microstimulation by the laboratory in interesting ways. First, the strong effects of the spared crossed fibers from the ipsi-lesional cortex in parts of the ipsi-lesion leg's step cycle and weight support function are solidly demonstrated. This raises the interesting possibility that stimulating the contra-lesion cortex as reported previously may execute some of its effects through callosal coordination with the ipsi-lesion cortex tested here. This is not fully discussed by the authors but may represent a significant aspect of these data. The authors demonstrate solidly that ablation of the contra-lesional cortex does not impede the effects reported here. I believe this has not been shown for the contra-lesional cortex microstimulation effects reported earlier, but I may be wrong. Effects and neuroprosthetic control of these effects are explored well in the ipsi-lesion cortex tests here.

      In the revised version of the manuscript, we incorporated various text improvements to address the points you have highlighted in your review. Additionally, we have integrated the suggested discussion topic on callosal coordination related to contralateral cortical stimulation. The discussion section now incorporates:

      “Since bi-cortical interactions in sculpting descending commands are known (Brus-Ramer et al., 2009), and in light of the changes we report in ipsilesional motor cortex excitability, the role of the ipsilateral cortex in mediating or supporting functional descending commands from the contralateral cortex, particularly the immediate increase in flexion of the affected hindlimb and long-term recovery of functional control (Bonizzato & Martinez, 2021), could be further explored.”

      “The localization of the specific channels closest to the interhemispheric fissure (Fig. 7D) may suggest the involvement of transcallosal interactions in mediating the transmission of the cortical command generated in the ipsilateral motor cortex (Brus-Ramer, Carmel, & Martin, 2009). While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case.”

      Weaknesses:

      Some data is based on very few rats. For example (N=2) for ipsilateral flexion effects of microstimulation. N=3 for homologous cortex ablation, and only ipsi extension is tested it seems. There is no explicit demonstration that the ipsilateral flexion effects in only 2 rats reported can survive the contra-lateral cortex ablation.

      We agree with this assessment. The ipsilateral flexion representation is here reported as a rare but consistent phenomenon, which we believe to have robustly described with Figure 7 experiments. We underlined in the text that the ablation experiment did not conclude on the unilateral-cortical nature of ipsilateral flexion effects, by replacing the sentence with the following:

      “While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case."

      Some improvements in clarity and precision of descriptions are needed, as well as fuller definitions of terms and algorithms.

      Likely Impacts: This data adds in significant ways to prior work by the authors, and an understanding of how phased stimulation in cortical neuroprosthetics may aid in recovery of function after SCI, especially if a few ambiguities in writing and interpretation are fully resolved.

      The manuscript text has been revised in its final version, and we sought to eliminate all ambiguity in writing and data interpretation.

      In the section “Recommendations for the authors”, Reviewer #2 also suggested that we better define several terms throughout the manuscript. A clarification was added for each.

      The reviewer pointed out that we might have overlooked a correlation between locomotor recovery and the increase in motor map size in Figure 6. We revisited this evaluation and found that the reviewer is correct. We had concluded that there was no correlation by looking “horizontally” at whether motor map size across rats predicted locomotor scores (as it did for contralateral cortex mapping, Bonizzato and Martinez, 2021). However, we have now found a strong correlation between the changes that occur over time in each rat and locomotor recovery, a result that was only hinted at, without appropriate quantification, in the previous version of the manuscript. We have reformulated the results of Figure 6 on page 12 to include this result, and we thank the reviewer for noticing this opportunity.

      Finally, we have expanded the discussion to include the following points:

      The possibility that hemi-cortex coordination of contralesional microstimulation inputs may explain the Sci Transl Med results for contralesional cortex ICMS, which warrants further investigation.

      The recognition that the ablation experiments do not provide conclusive evidence regarding ipsilateral flexion control and whether an alternative efferent pathway might be involved in this specific case.

      Reviewer #3 (Public Review):

      Summary:

      This article aims to investigate the impact of neuroprosthesis (intracortical microstimulation) implanted unilaterally on the lesion side in the context of locomotor recovery following unilateral thoracic spinal cord injury.

      Strength:

      The study reveals that stimulating the left motor cortex, on the same side as the lesion, not only activates the expected right (contralateral) muscle activity but also influences unexpected muscle activity on the left (ipsilateral) side. These muscle activities resulted in a substantial enhancement in lift during the swing phase of the contralateral limb and improved trunk-limb support for the ipsilateral limb. They used different experimental and stimulation conditions to show the ipsilateral limb control evoked by the stimulation. This outcome holds significance, shedding light on the engagement of the "contralateral projecting" corticospinal tract in activating not only the contralateral but also the ipsilateral spinal network.

      The experimental design and findings align with the investigation of the stimulation effect of contralateral projecting corticospinal tracts. They carefully examined the recovery of ipsilateral limb control with motor maps. They also tested the effective sites of cortical stimulation. The study successfully demonstrates the impact of electrical stimulation on the contralateral projecting neurons on ipsilateral limb control during locomotion, as well as identifying important stimulation spots for such an effect. These results contribute to our understanding of how these neurons influence bilateral spinal circuitry. The study's findings contribute valuable insights to the broader neuroscience and rehabilitation communities.

      Thank you for your assessment of this manuscript. The final version of the manuscript incorporates your suggestions for improving term clarity, and we enhanced the discussion of the mechanisms of spinal network engagement, as outlined below.

      Weakness:

      The term "ipsilateral" lacks a clear definition in the title, abstract, introduction, and discussion, potentially causing confusion for the reader.

      [and later] However, in my opinion, readers can easily link the ipsilateral cortical network to the ipsilateral-projecting corticospinal tract, which is less likely to play a role in ipsilateral limb control in this study since this tract is disrupted by the thoracic spinal injury.

      In order to mitigate the risk of having readers linking the effects of ipsilateral cortical stimulation with ipsilateral-projecting corticospinal tract, we specified:

      In the abstract, we specify that our goal was: “to investigate the functional role of the ipsilateral motor cortex in rat movement through spared contralesional pathways.”

      In the introduction: “In most cases, this lesion also disrupts all spinal tracts descending on the same side as the cortex under investigation at the thoracic level, meaning that the transmission of cortical commands to the ipsilesional hindlimb must depend on crossed descending tracts (Fig. S1).”

      The unexpected ipsilateral (left) muscle activity is most likely due to the left corticospinal neurons recruiting not only the right spinal network but also the left spinal network. This is probably due to the joint efforts of the neuroprosthesis and activation of spinal motor networks which work bilaterally at the spinal level.

      We agree with your assessment and the discussion section now emphasizes the effects of supraspinal drive onto spinal circuits.

      In the section “Recommendations for the authors”, Reviewer #3 suggested providing an early reminder to the reader that the focus is on the control of the ipsilateral limb through the corticospinal tract of the same side, which projects contralaterally. We did so in the abstract and introduction, as presented above.

      The reviewer also suggested that the discussion could be shorter. While we recognize it covers diverse subjects that may appeal to different readers, we believe omitting some sections could limit its overall scope. The manuscript underwent three revisions and a thorough dialogue with reviewers from diverse backgrounds, and we are hesitant to undo some of these improvements.

      Moreover, the section falls short of fully exploring the involvement of contralateral projecting corticospinal neurons in spinal networks for diverse motor behaviors. It could potentially delve into aspects like the potential impact of corticospinal inputs on gating the cross-extensor reflex loop and elucidating the mechanisms underlying the recruitment of the ipsilateral spinal network for generating ipsilateral limb movements. Is it a direct control on motor neurons or via existing spinal circuits?

      The discussion section now includes the potential spinal circuits through which corticospinal neurons may affect motor control and reflexes.

      Reviewer #3 also provided several detailed suggestions in the sub-section “Minor points”. We judged that all of them improved the correctness or readability of the text, and all were incorporated into the final version. Some of the questions raised were answered directly in the text (defining “% of chronic map” and rephrasing the original Line 479). We answer the two remaining questions below:

      Fig. 3C I wonder what is the average latency between stimulation onset and onset of right ankle flexor activity. Is the latency fixed, or variable (which probably indicates that the Cortical activation signal is integrated with spinal CPG activity.)

      ICMS trains, unfortunately, do not allow for precise dissection of transmission timing. Single pulses at 100 µA are insufficient to generate motoneuron responses and require multiple pulses to build up cortical transmission. Alstermark et al. (Journal of Neurophysiology, 2004) used two to four stimuli with higher amplitudes to investigate forelimb transmission timing. In our 2021 Science Translational Medicine paper, we employed single pulses at 1 mA to establish transmission delays from the contralateral cortex to the ankle flexor. However, the circuits recruited at 1 mA are not directly comparable to those activated by shorter trains.

      In this study, we used cortical trains of approximately 14 pulses, typical of ICMS protocols. Each pulse could potentially be the first to generate a response volley in the ankle flexor, with delays measured at 30 to 60 ms from ICMS train onset. While we believe that cortical commands are necessarily integrated with spinal CPG activity—as indicated in Figures 1B and 3D, where timing is crucial and descending commands can be gated out if delivered off-phase—the variability in latency that we recorded could be attributed to any of the following factors: cortical activation build-up, integration within reticular relay networks, or CPG integration.

      Fig. 4A. Why is the activity of under contralateral ankle flexor intact condition is later than the stimulation condition?

      We timed the stimulation to coincide with the contralateral leg lift and did not adjust its onset relative to spontaneous walking in SCI rats. Although stimulation could induce leg lift, as shown in Fig. 4A, SCI rats exhibited a slightly earlier and stronger activation of the right (contralateral) ankle flexor muscle even during spontaneous walking. This phenomenon is attributed to the deficits observed on the left side. The stronger right leg bears the body weight, as illustrated in Fig. 3, and thus, during body advancement, the right leg is engaged sooner and more rapidly (with a shorter swing phase) to provide support (right foot forward).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In Ryu et al., the authors use a cortical mouse astrocyte culture system to address the functional contribution of astrocytes to circadian rhythms in the brain. The authors' starting point is transcriptional output from serum-shocked culture, comparative informatics with existing tools and existing datasets. After fairly routine pathway analyses, they focus on the calcium homeostasis machinery and one gene, Herp, in particular. They argue that Herp is rhythmic at both mRNA and protein levels in astrocytes. They then use a calcium reporter targeted to the ER, mitochondria, or cytosol and show that Herp modulates calcium signaling as a function of circadian time. They argue that this occurs through the regulation of inositol receptors. They claim that the signaling pathway is clock-controlled by a limited examination of Bmal1 knockout astrocytes. Finally, they switch to calcium-mediated phosphorylation of the gap junction protein Connexin 43 but do not directly connect HERP-mediated circadian signaling to these observations. While these experiments address very important questions related to the critical role of astrocytes in regulating circadian signaling, the mechanistic arguments for HERP function, its role in circadian signaling through inositol receptors, the connection to gap junctions, and ultimately, the functional relevance of these findings is only partially substantiated by experimental evidence. 

      Strengths: 

      - The paper provides useful datasets of astrocyte gene expression in circadian time. 

      - Identifies HERP as a rhythmic output of the circadian clock. 

      - Demonstrates the circadian-specific sensitivity of ATP -> calcium signaling. 

      - Identifies possible rhythms in both Connexin 43 phosphorylation and rhythmic movement of calcium between cells. 

      Weaknesses: 

      - It is not immediately clear why the authors chose to focus on Ca2+ homeostasis or Herp from their initial screens as neither were the "most rhythmic" pathways in their primary analyses. 

      We appreciate the reviewer’s comment. We chose to focus on Ca2+ homeostasis because intracellular Ca2+ signaling plays a crucial role in numerous astrocyte functions and is notably associated with the sleep/wake state of animals, which is our primary interest (Bojarskaite et al., 2020; Ingiosi et al., 2020; Blum et al., 2021; Szabó et al., 2017). Among the genes involved in calcium ion homeostasis, Herp exhibited the most robust rhythmicity (Supplementary Table 1). The rationale for our focus on Ca2+ homeostasis and Herp is explained in the Results section (lines 143-150). We hope this provides a clear justification for our focus.

      - It would have been interesting (and potentially important) to know whether various methods of cellular synchronization would also render HERP rhythmic (e.g., temperature, forskolin, etc). If Herp is indeed relatively astrocyte-specific and rhythmic, it should be easy to assess its rhythmicity in vivo. 

      We thank the reviewer for this insightful comment. In response, we examined HERP expression in cultured astrocytes synchronized with either dexamethasone or forskolin. Herp exhibited rhythmic expression at both the mRNA and protein levels under these conditions. These results have been added to Figure S3 and are described in the manuscript (lines 173-175).

      Additionally, we measured HERP levels in the prefrontal cortex of mice at CT58 and CT70 and found no difference between these two time points, as shown in Author response image 1. Because Herp is expressed in various brain cell types, including microglia, endothelial cells, neurons, oligodendrocytes, and astrocytes, with the highest expression in microglia (Cahoy et al., 2008), we reason that a potential rhythm of HERP expression in astrocytes might be masked by its continuous expression in other cell types. Nonetheless, to assess HERP rhythmicity specifically in astrocytes in vivo, we attempted immunostaining with several anti-HERP antibodies, but none was successful. Consequently, we were unable to determine whether HERP exhibits rhythmic expression in astrocytes in vivo.

      Author response image 1.

      HERP levels were constant at CT58 and CT70. (A, B) Mice were entrained to a 12h:12h LD cycle and then maintained in constant darkness. Prefrontal cortices were harvested at the indicated times and processed for Western blot analysis. (A) Representative image showing three independent samples. (B) Quantification of HERP levels normalized to VINCULIN. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; t-test)

      - The authors show that Herp suppression reduces ATP-mediated suppression of calcium whereas it initially increases Ca2+ in the cytosol and mitochondria and then suppresses it. The dynamics of the mitochondrial and cytosolic responses are not discussed in any detail and it is unclear what their direct relationship is to Herp-mediated ER signaling. What is the explanation for Herp (which is thought to be ER-specific) to calcium signaling in other organelles? 

      Our examination of cytosolic and mitochondrial Ca2+ responses was aimed at corroborating HERP’s effect on the ER Ca2+ response. Upon ATP stimulation, Ca2+ is released from the ER via IP3 receptors (IP3Rs) and subsequently transmitted to other compartments, including mitochondria (Carreras-Sureda et al., 2018; Giorgi et al., 2018). Ca2+ is transferred directly to the cytosol by IP3Rs located on the ER membrane, and to the mitochondria through a complex formed by IP3R and the voltage-dependent anion channel (VDAC) on the mitochondria (Giorgi et al., 2018). Consistent with previous reports, we observed an increase in cytosolic and mitochondrial Ca2+ levels accompanied by a decrease in ER Ca2+ levels following ATP treatment (see Fig. 3B, E, H, control siRNA). The ATP-stimulated ER Ca2+ release was enhanced by Herp knockdown. We reasoned that if Ca2+ release was enhanced, cytosolic and mitochondrial Ca2+ uptake would also be enhanced, and the results were consistent with this hypothesis (see Fig. 3B, E, H, Herp siRNA). These observations are described in the Results section (lines 202-208) and in the Discussion (lines 333-348). We hope this explanation clarifies the relationship between the Herp-mediated ER Ca2+ response and the Ca2+ responses in other compartments. Thank you for your consideration.

      - What is the functional significance of promoting ATP-mediated suppression of calcium in ER? 

      In astrocytes, intracellular Ca2+ plays a crucial role in regulating several processes. Among the various downstream effects of intracellular Ca2+, we examined gap junction channel (GJC) conductance, which affects astrocytic communication. As discussed in the manuscript (lines 357-381), circadian variation in HERP results in rhythmic Cx43 (S368) phosphorylation, which is linked to GJC conductance. We propose that during the subjective night, heightened ATP-induced ER Ca2+ release reduces GJC conductance, uncoupling astrocytes from the syncytium and making them better equipped for localized responses. Conversely, during the subjective day, increased GJC conductance may allow astrocytes to control a larger area for synchronous neuronal activity, a key feature of sleep.

      - The authors then nicely show that the effect of ATP is dependent on intrinsic circadian timing but do not explain why these effects are antiphase in cytosol or mitochondria.

      Moreover, the ∆F/F for calcium in mitochondria and cytosol both rise, cross the abscissa, and then diminish - strongly suggesting a biphasic signaling event. Therefore, one wonders whether measuring the area under the curve is the most functionally relevant measurement of the change. 

      We appreciate the reviewer’s insightful comments. As explained in our previous response, Ca2+ released from the ER is transferred to the cytosol and mitochondria. This transfer explains why the fluorescent intensities of cytosolic and mitochondrial Ca2+ indicators show anti-phasic responses to those of the ER.

      We agree that cytosolic and mitochondrial Ca2+ responses may be biphasic. The decrease below the abscissa in mitochondria and cytosol likely reflects Ca2+ extrusion from these compartments. However, our primary focus was on the initial uptake of Ca2+ following ER Ca2+ release. Thus, when calculating the area under the curve (AUC), we measured the area between the ∆F/F trace and y = 0 (the x-axis) for both mitochondria and cytosol. We reasoned that measuring the area under the curve (above the abscissa) fits our objective.

      While addressing your concerns, we noticed errors in the Y-axis labels of Fig. 3C, 4D, and 5C. For the ER Ca2+ dynamics, we measured the area above the curve. These mistakes have now been corrected.
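
      To make the two measures concrete, here is a minimal Python sketch. It is not the authors' analysis code. For cytosolic and mitochondrial traces, it computes the area between the ∆F/F trace and y = 0, counted only where the trace lies above zero. For ER traces, it computes the area above the curve, counted only where ∆F/F lies below zero. The following are assumptions made for illustration: trapezoidal integration over evenly sampled points, clipping negative values to zero, the function names, and the example traces.

```python
from typing import Sequence


def area_above_zero(dff: Sequence[float], dt: float) -> float:
    """Trapezoidal area between a dF/F trace and y = 0, counting only dF/F > 0.

    Negative samples are clipped to zero so that a later undershoot does not
    cancel the initial Ca2+ rise. Clipping before integrating slightly
    overestimates the area in an interval containing a zero crossing.
    """
    clipped = [max(v, 0.0) for v in dff]
    return sum((a + b) / 2.0 * dt for a, b in zip(clipped, clipped[1:]))


def area_above_curve(dff: Sequence[float], dt: float) -> float:
    """Area between y = 0 and the trace where dF/F < 0 (e.g. ER Ca2+ depletion)."""
    return area_above_zero([-v for v in dff], dt)


# Hypothetical traces sampled once per second (not data from the study).
cyto = [0.0, 0.2, 0.4, 0.2, -0.1, -0.1, 0.0]  # rise, then undershoot
er = [0.0, -0.3, -0.5, -0.2, 0.0]             # ER signal falls on release
print(round(area_above_zero(cyto, dt=1.0), 3))   # 0.8
print(round(area_above_curve(er, dt=1.0), 3))    # 1.0
```

      In this sketch, the undershoot in the cytosolic example does not reduce its area. This reflects the stated focus on the initial uptake.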

      - Why are mitochondrial and cytosolic calcium not also demonstrated for Bmal1 KO astrocytes? 

      In two sets of experiments (Fig. 3 and Fig. 4), we demonstrated that the increase in cytosolic and mitochondrial Ca2+ aligns with ER Ca2+ release. Since there were no circadian time differences in ER Ca2+ release in the Bmal1 KO cultures, we concluded that it was unnecessary to measure Ca2+ levels in the mitochondria and cytosol. Additionally, our primary focus is on the ER Ca2+ response rather than the Ca2+ dynamics in subcellular organelles. We hope this clarifies our rationale and maintains the focus of our study.

      - The authors claim that Herp acts by regulating the degradation of ITPRs but this hypothesis - rather central to the mechanisms proposed in this study - is not experimentally substantiated. 

      We appreciate the reviewer’s insightful comments regarding the role of HERP in the degradation of IP3Rs. In the original manuscript, we demonstrated that treating cells with Herp siRNA leads to an increase in the levels of ITPR1 and ITPR2, suggesting that HERP might be involved in regulating IP3R stability. This observation is consistent with previous studies showing that Herp siRNA treatment increases ITPR levels in HeLa and cardiac cells (Paredes et al., 2016; Torrealba et al., 2017). Torrealba et al. also showed that HERP regulates the polyubiquitination of IP3Rs. Based on our results and these previous reports, we hypothesized that HERP similarly regulates ITPR degradation in cultured astrocytes.

      However, as the reviewer rightly pointed out, further evidence is needed to confirm that HERP specifically regulates ITPR degradation. To address this, we conducted new experiments examining the effect of XesC, an IP3R inhibitor, on ER Ca2+ release. XesC treatment reduced ER Ca2+ release and abolished the enhancement of ER Ca2+ release by Herp KD. These results demonstrate that HERP influences the ER Ca2+ response through IP3Rs. These new findings have been added to Fig. 3N – 3P and are explained in the Results section (lines 217-221).

      We believe these additional experiments and clarifications strengthen our hypothesis that HERP regulates IP3R degradation, thereby modulating ER Ca2+ responses.

      - There is no clear demonstration of the functional relevance of the circadian rhythms of ATP-mediated calcium signaling.

      As mentioned in the previous response, we examined Cx43 phosphorylation, which is linked to GJC conductance, in the context of ATP-mediated Ca2+ signaling. Our results demonstrated circadian variations in Cx43 Ser368 phosphorylation, leading to variations in gap junction channel (GJC) conductance (Fig. 6C–F and Fig. 7D–I). We have discussed the significance of this circadian rhythm in ATP-driven ER Ca2+ signaling for astrocytic function during sleep/wake states in the manuscript (lines 357–382), as follows.

      “ATP-stimulated Cx43 (S368) phosphorylation is higher at 30hr (subjective night phase) than at 42hr (subjective day phase) (Fig. 6C and 6D), a finding further supported by in vivo experiments showing higher pCx43(S368) levels in the prefrontal cortex during the subjective night than during the day (Fig. 6E and 6F). What are the implications of this day/night variation in Cx43 (S368) phosphorylation? We reasoned that the circadian variation in Cx43 phosphorylation could significantly impact astrocyte function within the syncytium. Indeed, our cultured astrocytes exhibited circadian phase-dependent variation in gap junctional communication (Fig. 7D – 7F). Astrocytes influence synaptic activity through the release of gliotransmitters such as glutamate, GABA, D-serine, and ATP, triggered by increases in intracellular Ca2+ in response to the activity of adjacent neurons and astrocytes (Verkhratsky & Nedergaard, 2018). Importantly, this increase in Ca2+ spreads to adjacent astrocytes through GJCs (Fujii et al., 2017), influencing a large area of the neuronal network. Considering that Cx43 Ser368 phosphorylation occurs to uncouple specific pathways in the astrocytic syncytium to focus local responses (Enkvist & McCarthy, 1992), our findings suggest that astrocytes are better equipped for localized responses when presented with a stimulus during the active phase in mice. Conversely, during the rest period, characterized by more synchronous neuronal activity across broad brain areas (Vyazovskiy et al., 2009), higher GJC conductance might allow astrocytes to exert control over a larger area. In support of this idea, a recent study showed that synchronized astrocytic Ca2+ activity advances the slow wave activity (SWA) of the brain, a key feature of non-REM sleep (Szabó et al., 2017). Blocking GJCs was found to reduce SWA, further supporting this interpretation. However, conflicting findings have also been reported. For instance, Ingiosi et al. (2020) found that astrocytic synchrony was higher during wakefulness than during sleep in the mouse frontal cortex. Whether these differing results in astrocyte synchrony during resting and active periods are attributable to differences in experimental context (e.g., brain regions, sleep-inducing conditions) remains unclear. Indeed, astrocyte Ca2+ dynamics during wakefulness/sleep vary across brain regions (Tsunematsu et al., 2021). While the extent of astrocyte synchrony might differ depending on brain region and/or stimulus, our results suggest that the baseline state of astrocyte synchrony, which is affected by GJC conductance, varies with the day/night cycle.”

      Reviewer #2 (Public Review): 

      Summary: 

      The article entitled "Circadian regulation of endoplasmic reticulum calcium response in mouse cultured astrocytes" submitted by Ryu and colleagues describes the circadian control of astrocytic intracellular calcium levels in vitro. 

      Strengths: 

      The authors used a variety of technical approaches that are appropriate.

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript.

      Weaknesses: 

      Statistical analysis is poor and could lead to a misinterpretation of the data.

      Thank you for the comment. We have carefully reviewed our statistical analyses and applied appropriate methods where necessary. Please see below for the specific revisions and improvements made.

      For Fig. 2D-E, we initially used a t-test. However, after adding more replicates and conducting a normality test, we found that the data did not follow a normal distribution. Therefore, we switched to the Mann-Whitney U test. In Fig. 5D-E, we originally used a repeated measures two-way ANOVA, but we have now changed it to a standard two-way ANOVA. For Fig. 7C and I, we also observed non-normal distribution in the normality test and consequently replaced the t-test with the Mann-Whitney U test. For other analyses not specifically mentioned, normality tests confirmed normal distribution, allowing us to use t-tests or ANOVA as appropriate for statistical analysis.
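
      The switch from the t-test to the Mann-Whitney U test described above replaces a comparison of means with a rank-based comparison. The pure-Python sketch below is for illustration only and is not the authors' analysis code. It computes the U statistic: every pair drawn from the two groups scores 1 when the first value is larger and 0.5 on a tie. The helper name and the example values are hypothetical. The sketch computes no p-value; in practice, that would come from a statistics package such as SciPy's `scipy.stats.mannwhitneyu`.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """Mann-Whitney U statistic of sample x relative to sample y.

    Counts pairs (xi, yj) with xi > yj; ties contribute 0.5.
    No p-value is computed here.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


x = [3.0, 4.0, 5.0]  # hypothetical group values
y = [1.0, 2.0, 3.0]
u_x, u_y = mann_whitney_u(x, y), mann_whitney_u(y, x)
print(u_x, u_y)  # 8.5 0.5
assert u_x + u_y == len(x) * len(y)  # U_x + U_y = n_x * n_y
```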

      Several conceptual issues have been identified. 

      We have addressed the reviewer’s concerns. Please see our detailed point-by-point responses below.

      Overinterpretation of the data should be avoided. This is a mechanistic paper done completely in vitro, all references to the in vivo situation are speculative and should be avoided. 

      We appreciate the reviewer’s insightful comment. Following the reviewer’s suggestion, we have removed the interpretations of GO pathways in the context of in vivo situation.

      Reviewer #3 (Public Review): 

      Astrocyte biology is an active area of research and this study is timely and adds to a growing body of literature in the field. The RNA-seq, Herp expression, and Ca2+ release data across wild-type, Bmal1 knockout, and Herp knockdown cellular models are robust and lend considerable support to the study's conclusions, highlighting their importance. Despite these strengths, the manuscript presents a gap in elucidating the dynamics of HERP and the involvement of ITPR1/2 in modulating Ca2+ release patterns and their circadian variations, which remains insufficiently supported and characterized. While the Connexin data underscore the importance of rhythmic Ca2+ release triggered by ATP, the relationship here appears correlational and the role of HERP and ITPR in Cx function remains to be characterized. Moreover, enhancing the manuscript's clarity and readability could significantly benefit the presentation and comprehension of the findings. 

      We appreciate the reviewer’s acknowledgement of the strengths of our manuscript. Regarding the identified gaps, we have conducted several new experiments to clearly demonstrate the HERP-ITPR-Cx phosphorylation axis. Please see our detailed point-by-point responses below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      - While HERP appears to be a clock-controlled gene and its protein levels appear to demonstrate rhythmicity as well, the data quality of the western blotting in Bmal1 knockout raises some concern about the accuracy of HERP protein quantification. 

      We understand the reviewer’s concern regarding the proximity of the HERP band to a nonspecific band in the Western blot of Bmal1 knockout samples. However, we took great care to quantify the HERP band accurately: we selected only the specific HERP band and excluded the nonspecific band. We are therefore confident in the accuracy of our HERP protein measurements.

      - If HERP is rhythmic and ITPRs are not, if their model is correct, might we expect HERP suppression to result in 'unmasking' an ITPR rhythm? 

      Our model suggests that both HERP and ITPRs are rhythmic, with HERP regulating the degradation of ITPR proteins and thereby driving their rhythms. Consistent with this, we observed day/night variations in ITPR2 levels (Fig. 4N and 4O). We therefore concluded that circadian variations in HERP are sufficient to drive ITPR2 rhythms. We have explained this in detail in the Results section (lines 236-241) and the Discussion section (lines 324-332).

      - The authors make a rather abrupt switch to examining gap junctions and connexin 43 phosphorylation. While the data demonstrating that the phosphorylation of S368 may indeed be rhythmic - the authors do not connect these data to the rest of the manuscript by showing a connection to HERP-mediated calcium signaling, limiting the coherence of the narrative. 

      We thank the reviewer for these insightful comments. To address the reviewer's concern regarding the connection between Herp and the phosphorylation of Cx43 at S368, we conducted new experiments to test whether Herp KD abolishes the rhythms of Cx43 phosphorylation at S368. Consistent with our previous results (Fig. 6C & 6D), Cx43 phosphorylation at S368 was significantly enhanced at 30hrs post sync compared with 42hrs post sync in control siRNA-treated astrocytes. In contrast, this circadian phase-dependent difference in phosphorylation was abolished in Herp siRNA-treated astrocytes. These results clearly indicate that circadian variations in Cx43 phosphorylation are driven by HERP. These new results are now included in Fig. 6G and 6H and explained in the Results section (lines 276-281).

      - Comment on data presentation: the authors repeatedly present histograms with attached lines between data points - from my understanding of the experiments, this is inappropriate unless these were repeated measures from the same cells. Otherwise, the lines connecting one data point to another between different conditions (e.g., Ctrl or Herp knockdown) are arbitrary and possibly misleading (i.e., Figure 3K, 3M, 4L, 6D). 

      Thank you for the reviewer’s comment. We have updated the figures by removing the lines connecting data points in the relevant figures (Fig. 3K and 3M, Fig. 4N, and Fig. 6D).

      Reviewer #2 (Recommendations For The Authors): 

      Most of the suggestions of this reviewer are related to the conceptual interpretation and presentation of the data and to the statistical analysis.

      In Figure 1 the authors analyzed the rhythmic transcriptome of cortical astrocytes synchronized with a serum shock in two different ways. The authors need to discuss what is the difference between the two methods used to detect rhythmic transcripts and make sense of them. 

      Following the reviewer’s suggestion, we have provided a more detailed explanation of MetaCycle and BioCycle, as well as the rationale for using both packages in our analysis, as follows: “Various methods have been used to identify periodicity in time-series data, such as Lomb-Scargle (Glynn et al., 2006), JTK_CYCLE (Hughes et al., 2010) and ARSER (Yang & Su, 2010), each with distinct advantages and limitations. MetaCycle integrates these three methods, facilitating the evaluation of periodicity in time-series data without requiring the selection of an optimal algorithm (Wu et al., 2016). Additionally, BioCycle has been developed using a deep neural network trained with extensive synthetic and biological time-series datasets (Agostinelli et al., 2016). Because MetaCycle and BioCycle identify periodic signals using different algorithms, we applied both packages to identify periodicity in our time-series transcriptome data. BioCycle and MetaCycle analyses detected 321 and 311 periodic transcripts, respectively (FDR corrected, q-value < 0.05) (Fig. 1B). Among these, 220 (53.4% of the union of both lists) were detected by both methods, but many transcripts did not overlap. MetaCycle is known for its inability to detect asymmetric waveforms (Mei et al., 2020). In our analysis, genes with increasing waveforms such as Adora1 and Mybph were identified as rhythmic only by BioCycle, while Plat and Il34 were identified as rhythmic only by MetaCycle (Fig. S1C). Despite these discrepancies, the clear circadian rhythmic expression profiles of these genes led us to conclude that using the union of the two lists compensates for the limitations of each algorithm.”
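
      The overlap figure can be checked with inclusion-exclusion. With 321 BioCycle hits, 311 MetaCycle hits, and 220 shared transcripts, the union contains 321 + 311 - 220 = 412 transcripts. Since 220/412 ≈ 53.4%, the reported percentage corresponds to the union rather than to either list alone. The small helper below is a hypothetical sketch of that arithmetic.

```python
from typing import Tuple


def overlap_summary(n_a: int, n_b: int, n_shared: int) -> Tuple[int, float]:
    """Size of the union of two hit lists and the shared share of that union (%)."""
    union = n_a + n_b - n_shared  # inclusion-exclusion
    return union, 100.0 * n_shared / union


union, pct = overlap_summary(321, 311, 220)  # BioCycle, MetaCycle, both
print(union, round(pct, 1))  # 412 53.4
```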

      Please refer to lines 105-117 in the Results section.

      The reasoning for comparing CT0 with the phase of the clock 8 h after SS needs to be explained. Circadian time (CT) conceptually refers to the clock phase in the absence of entrainment cues in vivo, so the direct transformation of "time after synchronization" in vitro to CT is misleading. 

      Thank you for the reviewer’s insightful comments. Initially, we believed that transforming time after serum shock (TASS) to CT, despite the data being in vitro, might provide a more intuitive and physiologically relevant interpretation of our results. However, we agree that this approach might be misleading. Following the reviewer’s suggestion, we have revised our terminology by changing “CT” to “Time post sync (hr)”. Nonetheless, in the circular peak phase map in Fig. 1F, we set 8hrs post sync as ZT0, based on the phase comparison in Fig. 1D, to allow a physiologically relevant interpretation. We hope these revisions clarify our approach.

      Moreover, also by definition a CT cannot be defined in terms of "dark" or "light". Figure 6M needs to be changed. 

      Following the reviewer’s suggestion, we removed the labels CT22 and CT34. Instead, we have labeled the respective periods as “30hr post sync” and “42hr post sync”.
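
      The former labels follow from a fixed offset. Under the convention stated above, 8 hr post sync is aligned to ZT0, so 30 hr post sync corresponds to phase 22 and 42 hr post sync to phase 34, or 10 once wrapped into a 24 h cycle. The minimal sketch below shows this as a phase-alignment convention only, not as a claim about in vivo time. The function name and the wrap to [0, 24) are assumptions for illustration.

```python
def post_sync_to_zt(hours_post_sync: float, offset: float = 8.0,
                    period: float = 24.0) -> float:
    """Map hours after synchronization to a ZT-like phase in [0, period)."""
    return (hours_post_sync - offset) % period


print(post_sync_to_zt(30))  # 22.0 (labelled CT22 in the original submission)
print(post_sync_to_zt(42))  # 10.0 (CT34 without wrapping, i.e. 34 - 24)
```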

      In Figure 1D, the authors present a gene ontology analysis that is certainly interesting, however, it should not be overinterpreted when trying to explain processes that take place only in vivo (e.g. wound repair). 

      Thank you for the insightful comment. Following the reviewer’s feedback, we have removed the paragraph interpreting the cell migration process in relation to wound repair and have focused instead on Ca2+ ion homeostasis.

      In Figure 2A the relative expression of clock genes and Herp is again misleading by a white/grey shading indicating subjective night and subjective day when the system under study is a cell culture. 

      We understand the reviewer’s concern that a cell culture system is not equivalent to a light/dark entrainment condition. However, we apply time-synchronizing stimuli to recapitulate in vivo entrainment. In addition, by comparing our data with CircaDB, we defined 8hrs post sync as corresponding to ZT0, thus aligning it with the beginning of the day. We have retained the shading to facilitate interpretation of our data in relation to the in vivo situation. However, in response to the reviewer’s concern, we have changed the shading from white/grey to light grey/dark grey. We hope this adjustment addresses the reviewer’s concern; if the reviewer still considers it inappropriate, please let us know and we will gladly update it.

      In the Figure 2A legend, it is indicated that rhythmicity is assessed using MetaCycle with mean values obtained from n=2. The authors need to make clear whether this n=2 mean: 2 biological replicates or 2 technical replicates. This difference is relevant because it would make the analysis statistically valid or invalid, respectively. 

      Thank you for your feedback. n=2 refers to 2 biological replicates. Therefore, the analysis is statistically valid.

      In Figures 2C and D the authors applied a T-test, a parametric statistical test for one-to-one comparison that requires normality distribution of the data to be tested first. To test normality, the authors need at least 4 biological replicates. The suggestion of this reviewer is that these experiments have to be repeated and proper statistics applied. 

      Thank you for your feedback. In response to the reviewer's suggestion, we conducted additional experiments to increase the number of biological replicates to 4. After testing the normality of the data, we applied a t-test for Figure 2C and a Mann-Whitney test for Figures 2D and 2E. These tests confirmed a statistically significant difference between groups.

      As further evidence of Bmal1-dependent control of HERP circadian expression, the authors could check for the presence of E-box elements in the Herp promoter. 

      We thank the reviewer for this insightful comment. In the Discussion section of the original manuscript, we noted the absence of a canonical E-box upstream of the Herp gene. However, following the reviewer’s suggestion and considering the potential role of non-canonical E-boxes, we conducted an additional analysis, which identified several non-canonical E-boxes within the 6 kb region upstream of the Herp gene (Table S2). Notably, one non-canonical E-box, “CACGTT,” which is known to regulate circadian expression (Yoo et al., 2005), lies close to the transcription start site (chr8:94386194-94386543). Moreover, this element is evolutionarily conserved across various mammals, including humans, rats, mice, dogs, and opossums (see Author response image 2). We therefore reasoned that these non-canonical E-boxes might drive the CLOCK/BMAL1-dependent expression of Herp. We have updated the Discussion to reflect these findings (lines 315-319).
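
      A strand-aware scan for such motifs can be sketched in a few lines. The example below is a hypothetical helper, not the pipeline used to build Table S2. It reports 0-based positions of the canonical E-box CACGTG and the non-canonical CACGTT on both strands, detecting minus-strand hits by matching the reverse complement (AACGTG for CACGTT). It runs on a made-up sequence rather than the Herp upstream region.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for each occurrence of motif in seq.

    Minus-strand hits are found by matching the reverse complement on the
    given strand; palindromic motifs such as CACGTG are reported once, as '+'.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits


example = "TTCACGTTGGAACGTGCCCACGTGAA"  # hypothetical sequence
print(find_motif(example, "CACGTT"))  # [(2, '+'), (10, '-')]
print(find_motif(example, "CACGTG"))  # [(18, '+')]
```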

      Author response image 2.

      The calcium experiments shown in Figures 3A-I, could be more convincing if the authors showed that the different Ca2+ sensors are compartment-specific by showing co-localization with a subcellular marker. In the pictures shown it is not even possible to recognize the cell dimensions. 

      Following the reviewer’s suggestion, we performed co-localization experiments with organelle-specific Ca2+ indicators and organelle markers. First, astrocytes were co-transfected with G-CEPIA1er, an ER-specific Ca2+ indicator, and ER-targeted DsRed2 (with a calreticulin signal sequence). Live imaging showed that the fluorescence of G-CEPIA1er and DsRed2-ER-5 significantly overlapped in co-transfected cells. Second, astrocytes were transfected with Mito-R-GECO1 and then labeled with MitoTracker, a cell-permeable mitochondrial dye. The fluorescence of Mito-R-GECO1 and MitoTracker also significantly overlapped. These new data are included in Figure S4 and explained in the Results section (lines 194-195).

      Data analysis in Figure 3 K and M is misleading. According to the explanations of the results, each of the experiments to assess ITRP1 or 2 is run independently. Then it is not clear why the relative levels obtained with control or Herp siRNA are plotted as pairs. Same comment as above for Figure 4L and Figure 6D. 

      Thank you for the reviewer’s insightful comments. Reviewer 1 raised similar issues. Following the reviewers’ suggestions, we have removed the lines connecting the data points in Fig. 3K, 3M, 4L, and 6D.

      In Figure 5E the authors need to explain why they consider repeated measures 2-way ANOVA to be the right statistical test to apply. According to the experimental design described, cells were transfected, synchronized, and then harvested independently at the indicated times after synchronization. 

      Thank you for the reviewer’s insightful comment. Upon reviewing the statistical methods as suggested, we have revised our approach. Instead of using repeated measures 2-way ANOVA, we have now applied a standard 2-way ANOVA, which is more appropriate given the experimental procedures were independent, as the reviewer pointed out.

      The English language needs to be revised throughout the text. 

      We have thoroughly revised the English language throughout the text.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Figure 3. Clarify the physiological importance of 100 µM ATP. Would the Herp rhythm warrant Ca2+ release rhythms under basal conditions? In 3J-K, the relatively weak effect of Herp knockdown on ITPR1/2 levels, albeit statistically significant, may not be physiologically significant. This calls into question the claimed Herp-ITPR axis that underlies the Ca2+ release phenotype. Further, the correlation certainly exists but further characterization of Herp KD cells would be required to address the mechanism. 

      As previously reported, a broad range of ATP concentrations can induce Ca2+ activity in astrocytes (Neary et al., 1988). Initially, we conducted an ATP dose-response analysis of ER Ca2+ release in our primary astrocyte cultures. ER Ca2+ release began at 50 µM ATP and plateaued at 500 µM (see Author response image 3). We selected 100 µM ATP for our experiments because it induces an intermediate ER Ca2+ response. Importantly, although measuring ATP concentrations at the synapse in vivo is challenging (Tan et al., 2017), estimates suggest that synaptic ATP concentrations range from 5 to 500 µM (Pankratov et al., 2006). Thus, 100 µM ATP is a physiologically relevant concentration that can affect nearby cells, including astrocytes, in the nervous system.

      Author response image 3.

      Cultured astrocytes were transfected with the ER Ca2+ indicator G-CEPIA1er, and at 48hrs post transfection they were treated with various concentrations of ATP and analyzed by Ca2+ imaging. (A) ΔF/F0 values over time following ATP application. (B) Area above curve values. Values in graphs are mean ± SEM (*p < 0.05, **p < 0.005, ***p < 0.0005, and ****p < 0.00005; one-way ANOVA).

      Regarding the comment on Ca2+ release rhythms under basal conditions, we interpret this as referring to Ca2+ release in the absence of a stimulus. We typically observe Ca2+ release only upon stimulation, such as ATP treatment. However, we acknowledge that the modest effects of Herp knockdown on ITPR1/2 levels could call into question the role of the HERP-ITPR axis in ER Ca2+ release.

      To address this, we tested whether the Herp KD-induced increase in ER Ca2+ release is mediated through ITPRs by treating cells with Xestospongin C (XesC), an IP3R inhibitor. XesC treatment reduced ATP-induced ER Ca2+ release and eliminated the difference in ER Ca2+ release between control and Herp KD astrocytes (Fig. 3N – 3P). These results clearly indicate that the HERP-ITPR axis plays a critical role in controlling ER Ca2+ release. These new experiments have been included in Fig. 3 and explained in the Results section (lines 217-221).

      Furthermore, following the reviewer’s suggestion, we examined whether HERP rhythms underlie the rhythms of the ER Ca2+ response by analyzing ER Ca2+ responses in Herp KD astrocytes at two different times after synchronization. In control astrocytes, ATP-induced ER Ca2+ responses varied with time, whereas these time-dependent variations were abolished in Herp KD astrocytes. These new experiments have been included in Fig. 4K – 4M and explained in the Results section (lines 232-235).

      Collectively, these results indicate that HERP rhythms lead to time-dependent differences in ER Ca2+ response through ITPRs.

      (2) Figure 4K-L. As the data suggested the involvement of ITPR1 and ITPR2 (circadian effect), a reasonable next step is to determine their involvement, but the study did not pursue the hypothesis.

      Thank you for your insightful comment. Our results indeed suggest that rhythms in ITPR2 levels may drive the time-dependent variations in ATP-induced ER Ca2+ release following synchronization. The newly conducted experiments demonstrated that treatment with the ITPR inhibitor XesC suppressed ATP-induced ER Ca2+ release under both control and Herp siRNA conditions (Fig. 3). These findings further support the conclusion that rhythms in ITPR levels, specifically ITPR2, underlie the circadian variations in ER Ca2+ release. While examining the effect of ITPR2 siRNA would directly test the involvement of ITPR2, we have decided to pursue this experiment in future studies.

      (3) Figure 5A-C. Data from WT cells should be included side by side with Bmal1-/- cells for comparison which is expected to be consistent with the HERP levels as in 5D-E. Again, the role of ITPR2 is suggested but not demonstrated. 

      Following the reviewer's suggestion, we conducted additional experiments with WT and Bmal1-/- cultured astrocytes side by side. The results were consistent with our previous findings: WT astrocytes showed rhythms of ER Ca2+ release, whereas Bmal1-/- astrocytes did not. We have updated Figure 5A to 5C and the corresponding Results section (lines 242-245) accordingly.

      Regarding the second comment, as mentioned in our previous response, we plan to examine the role of ITPR2 in future studies.

      (4) Figure 6. The Connexin data seem to be an add-on and are correlative with the Ca2+ release. The role of Herp and Itpr in Connexin function is not addressed. Figure 6E-F was not called out in the results section. Suggest providing additional data to support the role of the HERP-ITPR axis in regulating Ca2+ release and Connexin activity.

      We agree that additional data are needed to support the role of HERP in regulating Cx43 phosphorylation. Therefore, we conducted further experiments to determine whether rhythms of Cx43 phosphorylation are regulated by HERP. In control astrocytes, ATP treatment induced time-dependent variations in Cx43 phosphorylation, whereas these rhythms were abolished in Herp KD astrocytes. These results indicate that rhythms in HERP levels contribute to the time-dependent variations in Cx43 phosphorylation. These new experiments have been included in Fig. 6G and 6H and are described in the Results section (lines 276-281).

      Regarding the second comment, we have corrected our oversight by properly referencing Figures 6E-F in the Results section (lines 357-359).

      (5) Discussion. This section should focus on noteworthy points to discuss, not repeating the results. 

      Based on the reviewer's valuable suggestions, we have revised the Discussion section to minimize repetition of the results. Thank you for your guidance.

      (6) The manuscript exhibits numerous grammatical and textual inaccuracies that necessitate careful revision by the authors. My observations here are confined to the title and the abstract alone. I recommend altering the title from "mouse cultured astrocytes" to "cultured mouse astrocytes" for clarity and grammatical correctness. The abstract, meanwhile, needs enhancements both in terms of its content and language. It should incorporate the results of the partitioning among the ER, cytoplasm, and mitochondria, and provide clear definitions for some of the critical terms used. It's worth noting that the abstract's second sentence contains a grammatical error. 

      We thank the reviewer for the valuable feedback. We have carefully revised the title, abstract, and main text to address the grammatical and textual issues. The title now reads “cultured mouse astrocytes”. Additionally, the abstract now includes results related to cytoplasmic Ca2+ dynamics and has been revised in several places. We appreciate these insights and have worked to improve the content and language accordingly.

      Reference

      Agostinelli, F., Ceglia, N., Shahbaba, B., Sassone-Corsi, P., & Baldi, P. (2016). What time is it? Deep learning approaches for circadian rhythms. Bioinformatics, 32(12), i8-i17. https://doi.org/10.1093/bioinformatics/btw243

      Cahoy, J. D., Emery, B., Kaushal, A., Foo, L. C., Zamanian, J. L., Christopherson, K. S., Xing, Y., Lubischer, J. L., Krieg, P. A., Krupenko, S. A., Thompson, W. J., & Barres, B. A. (2008). A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci, 28(1), 264-278. https://doi.org/10.1523/JNEUROSCI.4178-07.2008

      Carreras-Sureda, A., Pihán, P., & Hetz, C. (2018). Calcium signaling at the endoplasmic reticulum: fine-tuning stress responses. Cell Calcium, 70, 24-31. https://doi.org/10.1016/j.ceca.2017.08.004

      Enkvist, M. O., & McCarthy, K. D. (1992). Activation of protein kinase C blocks astroglial gap junction communication and inhibits the spread of calcium waves. J Neurochem, 59(2), 519-526. https://doi.org/10.1111/j.1471-4159.1992.tb09401.x

      Fujii, Y., Maekawa, S., & Morita, M. (2017). Astrocyte calcium waves propagate proximally by gap junction and distally by extracellular diffusion of ATP released from volume-regulated anion channels. Scientific Reports, 7(1), 13115. https://doi.org/10.1038/s41598-017-13243-0

      Giorgi, C., Marchi, S., & Pinton, P. (2018). The machineries, regulation and cellular functions of mitochondrial calcium. Nature Reviews Molecular Cell Biology, 19(11), 713-730. https://doi.org/10.1038/s41580-018-0052-8

      Glynn, E. F., Chen, J., & Mushegian, A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310-316. https://doi.org/10.1093/bioinformatics/bti789

      Hughes, M. E., Hogenesch, J. B., & Kornacker, K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J Biol Rhythms, 25(5), 372-380. https://doi.org/10.1177/0748730410379711

      Ingiosi, A. M., Hayworth, C. R., Harvey, D. O., Singletary, K. G., Rempe, M. J., Wisor, J. P., & Frank, M. G. (2020). A Role for Astroglial Calcium in Mammalian Sleep and Sleep Regulation. Curr Biol, 30(22), 4373-4383.e7. https://doi.org/10.1016/j.cub.2020.08.052

      Mei, W., Jiang, Z., Chen, Y., Chen, L., Sancar, A., & Jiang, Y. (2020). Genome-wide circadian rhythm detection methods: systematic evaluations and practical guidelines. Briefings in Bioinformatics, 22(3). https://doi.org/10.1093/bib/bbaa135

      Neary, J. T., van Breemen, C., Forster, E., Norenberg, L. O., & Norenberg, M. D. (1988). ATP stimulates calcium influx in primary astrocyte cultures. Biochem Biophys Res Commun, 157(3), 1410-1416. https://doi.org/10.1016/s0006-291x(88)81032-5

      Pankratov, Y., Lalo, U., Verkhratsky, A., & North, R. A. (2006). Vesicular release of ATP at central synapses. Pflugers Arch, 452(5), 589-597. https://doi.org/10.1007/s00424-006-0061-x

      Paredes, F., Parra, V., Torrealba, N., Navarro-Marquez, M., Gatica, D., Bravo-Sagua, R., Troncoso, R., Pennanen, C., Quiroga, C., Chiong, M., Caesar, C., Taylor, W. R., Molgó, J., San Martin, A., Jaimovich, E., & Lavandero, S. (2016). HERPUD1 protects against oxidative stress-induced apoptosis through downregulation of the inositol 1,4,5-trisphosphate receptor. Free Radic Biol Med, 90, 206-218. https://doi.org/10.1016/j.freeradbiomed.2015.11.024

      Szabó, Z., Héja, L., Szalay, G., Kékesi, O., Füredi, A., Szebényi, K., Dobolyi, Á., Orbán, T. I., Kolacsek, O., Tompa, T., Miskolczy, Z., Biczók, L., Rózsa, B., Sarkadi, B., & Kardos, J. (2017). Extensive astrocyte synchronization advances neuronal coupling in slow wave activity in vivo. Scientific Reports, 7(1), 6018. https://doi.org/10.1038/s41598-017-06073-7

      Tan, Z., Liu, Y., Xi, W., Lou, H. F., Zhu, L., Guo, Z., Mei, L., & Duan, S. (2017). Glia-derived ATP inversely regulates excitability of pyramidal and CCK-positive neurons. Nat Commun, 8, 13772. https://doi.org/10.1038/ncomms13772

      Torrealba, N., Navarro-Marquez, M., Garrido, V., Pedrozo, Z., Romero, D., Eura, Y., Villalobos, E., Roa, J. C., Chiong, M., Kokame, K., & Lavandero, S. (2017). Herpud1 negatively regulates pathological cardiac hypertrophy by inducing IP3 receptor degradation. Sci Rep, 7(1), 13402. https://doi.org/10.1038/s41598-017-13797-z

      Tsunematsu, T., Sakata, S., Sanagi, T., Tanaka, K. F., & Matsui, K. (2021). Region-specific and state-dependent astrocyte Ca2+ dynamics during the sleep-wake cycle in mice. The Journal of Neuroscience, JN-RM-2912-20. https://doi.org/10.1523/jneurosci.2912-20.2021

      Verkhratsky, A., & Nedergaard, M. (2018). Physiology of Astroglia. Physiol Rev, 98(1), 239-389. https://doi.org/10.1152/physrev.00042.2016

      Vyazovskiy, V. V., Olcese, U., Lazimy, Y. M., Faraguna, U., Esser, S. K., Williams, J. C., Cirelli, C., & Tononi, G. (2009). Cortical firing and sleep homeostasis. Neuron, 63(6), 865-878. https://doi.org/10.1016/j.neuron.2009.08.024

      Wu, G., Anafi, R. C., Hughes, M. E., Kornacker, K., & Hogenesch, J. B. (2016). MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics, 32(21), 3351-3353. https://doi.org/10.1093/bioinformatics/btw405

      Yang, R., & Su, Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168-174. https://doi.org/10.1093/bioinformatics/btq189

      Yoo, S. H., Ko, C. H., Lowrey, P. L., Buhr, E. D., Song, E. J., Chang, S., Yoo, O. J., Yamazaki, S., Lee, C., & Takahashi, J. S. (2005). A noncanonical E-box enhancer drives mouse Period2 circadian oscillations in vivo. Proc Natl Acad Sci U S A, 102(7), 2608-2613. https://doi.org/10.1073/pnas.0409763102

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      …several previous studies have identified co-expression of vomeronasal receptors by vomeronasal sensory neurons, and the expression of non-vomeronasal receptors, and this was not adequately addressed in the manuscript as presented.

      We have added context and citations to the Introduction and Results sections relating to recent studies on the co-expression of vomeronasal receptors and the expression of non-vomeronasal receptors in VSNs.

      The data resulting from the use of the Resolve Biosciences spatial transcriptomics platform are somewhat difficult to interpret, and the methods are somewhat opaque.

      The Molecular Cartography platform relies on multiplex imaging of fluorescent probes that bind specifically to individual gene transcripts to determine their spatial location. Unfortunately, the detailed protocols remain proprietary to Resolve Biosciences and were not disclosed; we have clarified this in the revised manuscript. Our role in the acquisition and processing of data for this experiment is described in the current Methods section. Additional analyses of the Molecular Cartography data have been added to the supplemental materials (see response to Reviewer #2, below) to help clarify the interpretation of the results.

      Reviewer #2:

      …the authors present a biased report of previously published work, largely including only those results that do not overlap with their own findings, but ignoring results that would question the novelty of the data presented here.

      We had no intention of misleading readers. In fact, we discussed discrepancies between our results and those of other studies. However, we inadvertently left out a critical publication when preparing the manuscript. We have added context and citations relating to recent studies that use single-cell RNA sequencing in the vomeronasal organ, studies relating to the co-expression of vomeronasal receptors, and studies discussing V1R/V2R lineage determination. In the Discussion, we also compare our model with a previous model of the genetic determination of VNO neuronal fate.

      Did the authors perform any cell selectivity, or any directed dissection, to obtain mainly neuronal cells? Previous studies reported a greater proportion of non-neuronal cells. For example, while Katreddi and co-workers (ref 89) found that the most populated clusters are identified as basal cells, macrophages, pericytes, and vascular smooth muscle, Hills Jr. et al. in this work did not report such types of cells. Did the authors check for the expression of marker genes listed in Ref 89 for such cell types?

      For VNO dissections, we removed bones and blood vessels from the VNO tissue and kept only the sensory epithelium. This procedure removed vascular smooth muscle cells, pericytes, and other non-neuronal cell types, which explains the differences in cell proportions between our study and previous studies. We used a DAPI/Draq5 assay to sort live, nucleated cells for sequencing; no specific markers were used for cell selection. All cells in the experiment were successfully annotated using the cell-type markers shown in Fig. 1B, except for cells in the sVSN cluster, which were novel and required further analysis to characterize.

      The authors should report the marker genes used for cell annotation.

      Marker genes used for cell annotation are shown in Figure 1B. A full list of all marker genes used in the annotation process has been added to the Methods section.

      The authors reported no differences between juvenile and adult samples, and between male and female samples. It is not clear how they evaluate statistically significant differences, which statistical test was used, or what parameters were evaluated.

      The claims about male/female and P14/P56 mice pertain directly to the distribution of clusters and cells in UMAP space shown in Figure 1C and 1D. We have now performed differential gene expression analysis for the male/female and P14/P56 comparisons using the FindMarkers function from the Seurat R package. Although we found significantly differentially expressed genes between males and females, and between P14 and P56 animals, the genes in these lists do not appear to be influential for neuronal lineage and cell-type specification, nor are they related to cell adhesion molecules, which are the main focuses of this study. Nevertheless, we have added these results to the supplemental materials.

      ‘Based on our transcriptomic analysis, we conclude that neurogenic activity is restricted to the marginal zone.’ This conclusion is quite a strong statement, given that this study was not directed to carefully study neurogenesis distribution, and when neurogenesis in the basal zone has been proposed by other works, as stated by the authors.

      We used fourteen slides of whole VNO sections in our Molecular Cartography analysis to quantify the numbers of GBCs, INPs, and iVSNs predicted in the marginal zone, the intermediate zone, and the main/medial zone. We performed a Wilcoxon signed-rank test to determine whether GBCs, INPs, and iVSNs are significantly more abundant in the marginal zone than in the main/medial zone. The results are included in the new Figure S3. This analysis supports our claim that neurogenesis is restricted to the MZ, a claim that is also supported by the 2021 study by Katreddi & Forni.
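      To make the paired comparison above concrete, the following is a minimal Python sketch of an exact one-sided Wilcoxon signed-rank test. It assumes one paired observation per slide (the marginal-zone count x and the main/medial-zone count y for a given cell type) and a one-sided alternative that x exceeds y; the function names are hypothetical, and the analysis in the manuscript presumably used a standard statistics package rather than this code.

```python
from itertools import product


def rank_abs(values):
    """Return 1-based ranks of |values|; tied magnitudes share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute values starting at i.
        while j + 1 < len(order) and abs(values[order[j + 1]]) == abs(values[order[i]]):
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_greater(x, y):
    """Exact one-sided paired test of H1: x tends to exceed y.

    Returns (W_plus, p_value). Pairs with zero difference are dropped
    (Wilcoxon's original convention). All 2**n sign assignments are
    enumerated, which is cheap for small n (n = 14 gives 16384 cases).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ranks = rank_abs(diffs)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    n = len(diffs)
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for s, r in zip(signs, ranks) if s)
        if w >= w_plus - 1e-9:  # tolerance for half-integer tied ranks
            hits += 1
    return w_plus, hits / 2 ** n
```

      With fourteen slides in which every slide has more cells in the marginal zone and all differences are distinct, W_plus takes its maximum value and the exact one-sided p-value is 1/16384, the smallest attainable with n = 14.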

      The authors report at least two new types of sensory neurons in the mouse VNO, a finding of huge importance that could have a substantial impact on the field of sensory physiology. However, the evidence for such new cell types is based solely on this transcriptomic dataset and, as such, is quite weak, since many crucial morphological and physiological aspects would be missing to clearly identify them as novel cell types. As stated before, many control and confirmatory experiments, and a careful evaluation of the results presented in this work must be performed to confirm such a novel and interesting discovery. The reported "novel classes of sensory neurons" in this work could represent previously undescribed types of sensory neurons, but also previously reported cells (see below) or simply possible single-cell sequencing artefacts.

      The reviewer is correct that detailed morphological and physiological studies are needed to further understand these cells; we share this view. Our paper is primarily intended as a resource paper, providing access to a large-scale single-cell RNA-sequencing dataset and to discoveries based on the transcriptomic data that can support and inspire ongoing and future experiments in the field. Nonetheless, we are confident that neither of the novel cell clusters is the result of sequencing artefacts. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, detection and removal of multiplets with the Python module Scrublet, and a strict 5% cut-off for mitochondrial gene expression. Furthermore, the cell clusters in question show no signs of being sequencing artefacts, as they are physically connected, in a reasonable orientation, to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent gene expression (new Figure S4H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared with mature V1R and V2R VSNs, indicating functional differences. We have also performed pseudotime analysis of sVSNs, and differential gene expression and GO analyses of mOSNs; the results are shown in the new Figure S6.
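      The mitochondrial cut-off described above is a per-cell filter on the fraction of counts coming from mitochondrial genes. The sketch below is a minimal Python illustration, assuming per-cell gene counts stored as dictionaries, the mouse "mt-" gene-symbol prefix, and a strict "< 5%" threshold (whether the cut-off was applied as < or ≤ is not stated). The function names are hypothetical, and ambient-RNA correction (SoupX) and multiplet removal (Scrublet) are separate steps that are not shown.

```python
def percent_mito(gene_counts, prefix="mt-"):
    """Percentage of a cell's total counts that come from mitochondrial genes.

    gene_counts: dict mapping gene symbol -> count for one cell (total > 0).
    Mouse mitochondrial gene symbols start with "mt-" (e.g. mt-Co1, mt-Nd1).
    """
    total = sum(gene_counts.values())
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith(prefix))
    return 100.0 * mito / total


def filter_cells(cells, max_percent_mito=5.0):
    """Keep cells whose mitochondrial percentage is strictly below the cut-off.

    cells: dict mapping cell barcode -> {gene symbol: count}.
    Barcodes with no counts at all are discarded.
    """
    kept = {}
    for barcode, counts in cells.items():
        if sum(counts.values()) == 0:
            continue  # empty barcode: mitochondrial percentage is undefined
        if percent_mito(counts) < max_percent_mito:
            kept[barcode] = counts
    return kept
```

      A high mitochondrial fraction is commonly read as a sign of damaged or dying cells whose cytoplasmic transcripts have leaked out, which is why such a threshold is used to exclude low-quality cells before clustering.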

      The authors report the co-expression of V2R and Gnai2 transcripts based on sequencing data. That could dramatically change classical classifications of basal and apical VSNs. However, did the authors find support for this co-expression in spatial molecular imaging experiments?

      Genes with extremely high expression levels overwhelm signals from other genes, and therefore had to be removed from the experiment. This is a limitation of the Molecular Cartography platform. Unfortunately, Gnai2 was determined to be one of these genes and was not evaluated for this purpose.

      Canonical OSNs: The authors report a cluster of cells expressing neuronal markers and ORs and call them canonical OSN. However, VSNs expressing ORs have already been reported in a detailed study showing their morphology and location inside the sensory epithelium (References 82, 83). Such cells are not canonical OSNs since they do not show ciliary processes, they express TRPC2 channels and do not express Golf. Are the "canonical OSNs" reported in this study and the OR-expressing VSNs (ref 82, 83) different? Which parameters, other than Gnal and Cnga2 expression, support the authors' bold claim that these are "canonical OSNs"? What is the morphology of these neurons? In addition, the mapping of these "canonical OSNs" shown in Figure 2D paints a picture of the negligible expression/role of these cells (see their prediction confidence).

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from the VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell type in the data. After performing differential gene expression analysis comparing the putative mOSN cluster independently with V1R and V2R VSNs, GO analysis returned ‘cilium’ as the top significantly enriched GO cellular component. This new piece of data is presented in the updated Figure S6. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs.

      Secretory VSN: The authors report another novel type of sensory neurons in the VNO and call them "secretory VSNs". Here, the authors performed an analysis of differentially expressed genes for neuronal cells (dataset 2) and found several differentially expressed genes in the sVSN cluster. However, it would be interesting to perform a gene expression analysis using the whole dataset including neuronal and non-neuronal cells. Could the authors find any marker gene that unequivocally identifies this new cell type?

      We did not find unequivocal marker genes for sVSNs. We performed differential expression analysis of the sVSN cluster against the whole VNO dataset, against the neuronal subset, and against specific cell types, but could not find a single gene that was exclusive to sVSNs. We therefore used a combinatorial marker-gene approach to predict sVSNs in the Molecular Cartography data, which required dedicating a larger subset of our 100-gene panel to genes for detecting sVSNs.

      When the authors evaluated the distribution of sVSN using the Molecular Cartography technique, they found expression of sVSN in both sensory and non-sensory epithelia. How do the authors explain such unexpected expression of sensory neurons in the non-sensory epithelium?

      In our scRNA-Seq experiment, blood vessels were removed, limiting our power to distinguish between certain cell types. Because of the limited number of genes that we can probe using Molecular Cartography, some of the genes associated with sVSNs may also be expressed in the non-sensory epithelium. This could lead to the identification of cells in the non-sensory epithelium that may or may not be identical to the sVSNs. Further studies will be needed to determine the specificity of these cells.

      The low total genes count and low total reads count, combined with an "expression of marker genes for several cell types" could indicate low-quality beads (contamination) that were not excluded with the initial parameter setting. It looks like cells in this cluster express a bit of everything V1R, V2R, OR, secretory proteins.

      We are confident that the putative sVSN cell cluster is not the result of low-quality cells. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, detection and removal of multiplets with the Python module Scrublet, and a strict 5% cut-off for mitochondrial gene expression. Furthermore, the cell clusters in question show no signs of being sequencing artefacts, as they are connected, in a reasonable orientation, to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent gene expression (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared with mature V1R and V2R VSNs, indicating functional differences. Moreover, while some genes were expressed at lower levels than in canonical VSNs, others were expressed at higher levels, which argues against an overall loss of gene counts as the cause of the discrepancy.

      The authors wrote ‘...the transcriptomic landscape that specifies the lineages is not known...’. This statement is not completely true, or at least misleading. There are still many undiscovered aspects of the transcriptomics landscape and lineage determination in VSNs. However, authors cannot ignore previously reported data showing the landscape of neuronal lineages in VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). Expression of most of the transcription factors reported by this study (Ascl1, Sox2, Neurog1, Neurod1...) were already reported, and for some of them, their role was investigated, during early developmental stages of VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). In summary, the authors should fully include the findings from previous works (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259), clearly state what has been already reported, what is contradictory and what is new when compared with the results from this work.

      This is a difference in opinion about the terminology. Transcriptomic landscape in our paper refers to the genome-wide expression by individual cells, not just individual genes. The reviewer is correct that many of the genetic specifiers have been identified, which we cited and discussed. We consider these studies as providing a “genetic” underpinning, rather than the “transcriptomic landscape” in lineage progression. To avoid confusion, we have revised the statement to “… the transcriptional program that specifies the lineages is not known.” 

      …the co-expression of specific V2Rs with specific transcription factors does not imply a direct implication in receptor selection. Directed experiments to evaluate the VR expression dependent on a specific transcription factor must be performed.

      The reviewer is correct, and we did not claim that the co-expression of specific transcription factors indicates a direct relationship with receptor selection. We agree that further directed experiments are required to investigate this question.

      This study reports that transcription factors, such as Pou2f1, Atf5, Egr1, or c-Fos could be associated with receptor choice in VSNs. However, no further evidence is shown to support this interaction. Based on these purely correlative data, it is rather bold to propose cascade model(s) of lineage consolidation.

      The reviewer is correct. Because any transcriptomic study is inherently correlative, additional studies will be needed to determine unequivocally the mechanistic link between these transcription factors and receptor choice. Our model provides a basis for such studies.

      The authors use spatial molecular imaging to evaluate the co-expression of many chemosensory receptors in single VNO cells. […] However, it is difficult to evaluate and interpret the results due to the lack of cell borders in spatial molecular imaging. The inclusion of cell border delimitation in the reported images (membrane-stained or computer-based) could be tremendously beneficial for the interpretation of the results.

      The most common practice for cell segmentation of spatial transcriptomics data is to define cell borders by expanding outward from nuclear staining. We have tested multiple algorithms based on recent studies, but each has its own caveats.

      It is surprising that the authors reported a new cell type expressing OR, however, they did not report the expression of ORs in Molecular Cartography technique. Did the authors evaluate the expression of OR using the cartography technique?

      We were limited to a 100-gene probe panel and only included one OR. The expression was not high enough for us to substantiate any claims.

      Reviewer #3:

      (1) The authors claim that they have identified two new classes of sensory neurons, one being a class of canonical olfactory sensory neurons (OSNs) within the VNO. This classification as canonical OSNs is based on expression data of neurons lacking the V1R or V2R markers but instead expressing ORs and signal transduction molecules, such as Gnal and Cnga2. Since OR-expressing neurons in the VNO have been previously described in many studies, it remains unclear to me why these OR-expressing cells are considered here a "new class of OSNs." Moreover, morphological features, including the presence of cilia, and functional data demonstrating the recognition of chemosignals by these neurons, are still lacking to classify these cells as OSNs akin to those present in the MOE. While these cells do express canonical markers of OSNs, they also appear to express other VSN-typical markers, such as Gnao1 and Gnai2 (Figure 2B), which are less commonly expressed by OSNs in the MOE. Therefore, it would be more precise to characterize this population as atypical VSNs that express ORs, rather than canonical OSNs.

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from the VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell type in the data. We performed differential gene expression analysis comparing the putative mOSN cluster with V1R and V2R VSNs. GO analysis returned significantly enriched GO terms, including many related to “cilium”, further supporting the identification of these cells as OSNs. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs. With regard to Gnai2 and Gnao1 expression, we examined our data from OSNs dissociated from the olfactory epithelium and detected substantial expression of both genes. This new analysis provides additional support for our claim. We now present the differentially expressed genes and GO term analysis of the mOSN class in the updated Figure S6.

      (2) The second new class of sensory neurons identified corresponds to a group of VSNs expressing prototypical VSN markers (including V1Rs, V2Rs, and ORs), but exhibiting lower ribosomal gene expression. Clustering analysis reveals that this cell group is relatively isolated from V1R- and V2R-expressing clusters, particularly those comprising immature VSNs. The question then arises: where do these cells originate? Considering their fewer overall genes and lower total counts compared to mature VSNs, I wonder if these cells might represent regular VSNs in a later developmental stage, i.e., senescent VSNs. While the secretory cell hypothesis is compelling and supported by solid data, it could also align with a late developmental stage scenario. Further data supporting or excluding these hypotheses would aid in understanding the nature of this new cell cluster, with a comparison between juvenile and adult subjects appearing particularly relevant in this context.

      We wholeheartedly agree with this assessment. Our initial thought was that these were senescent VSNs, but the trajectory analysis did not support this scenario, leading us to propose that these are putative secretory cells. Our analysis also shows that, overall, 46% of the putative sVSNs came from the P14 sample and 54% from the P56 sample. These cells comprise roughly 6.4% of all P14 cells and 8.5% of all P56 cells. In comparison, mature V1R VSNs make up 28.4% of all cells at P14, a percentage that rises to 46.7% at P56. The substantial presence of sVSNs at P14, and their disproportionately small increase with age compared with mature VSNs, indicate that these are unlikely to be late-developmental-stage or senescent cells, although we cannot exclude these possibilities.

      We have included the sVSNs in a trajectory inference analysis and found that the pseudotime values of the sVSNs are within the range of those cells within the V1R and V2R lineages, indicating a similar maturity (Fig. S6).

      (3) The authors' decision not to segregate the samples according to sex is understandable, especially considering previous bulk transcriptomic and functional studies supporting this approach. However, many of the highly expressed VR genes identified have been implicated in detecting sex-specific pheromones and triggering dimorphic behavior. It would be intriguing to investigate whether this lack of sex differences in VR expression persists at the single-cell level. Regardless of the outcome, understanding the presence or absence of major dimorphic changes would hold broad interest in the chemosensory field, offering insights into the regulation of dimorphic pheromone-induced behavior. Additionally, it could provide further support for proposed mechanisms of VR receptor choice in VSNs. 

      The reviewer raised a good point. We did not observe differences between male and female mice, or between P14 and P56 mice, in the distribution of clusters and cells in UMAP space. However, our differential expression analysis revealed significantly differentially expressed genes in both comparisons. Results from these analyses are presented in the new Figures S1 and S2.

      (4) The expression analysis of VRs and ORs seems to have been restricted to the cell clusters associated with the neuronal lineage. Are VRs/ORs expressed in other cell types, i.e. sustentacular, HBC, or other cells?

      Sparse, low counts of VR and OR genes were observed in non-neuronal cell types. When this expression is considered as a percentage of cell-level gene counts, however, it is negligible compared with that in neurons. The observed expression may reflect stochastic baseline expression, or it may result from remnant ambient RNA that passed filtering.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary: 

      The fungal cell wall is a very important structure for the physiology of a fungus but also for the interaction of pathogenic fungi with the host. Although a lot of knowledge on the fungal cell wall has been gained, there is a lack of understanding of the role of β-1,6-glucan in the cell wall. In the current manuscript, the authors studied this carbohydrate in particular in the important human-pathogenic fungus Candida albicans. The authors provide a comprehensive characterization of cell wall constituents under different environmental and physiological conditions, in particular of β-1,6-glucan. Also, β-1,6-glucan biosynthesis was found to be likely a compensatory reaction when mannan elongation was defective. The absence of β-1,6-glucan resulted in a severe growth defect and complete cell wall reorganization. The manuscript contains a detailed analysis of the genetic and biochemical basis of β-1,6-glucan biosynthesis, which is in many respects similar to that in yeast. Finally, the authors provide some initial studies on the immunomodulatory effects of β-1,6-glucan.

      Strengths: 

      The findings are very well documented, and the data are clear and obtained by sophisticated biochemical methods. It is impressive that the authors successfully optimized methods for the analysis and quantification of β-1,6-glucan under different environmental conditions and in different mutant strains.

      Weaknesses: 

      However, although already very interesting, at this stage there are some loose ends that need to be combined to strengthen the manuscript. For example, the immunological studies are rather preliminary and need at least some substantiation. Also, at this stage, the manuscript in some places remains a bit too descriptive and needs the elucidation of potential causalities.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors provide the first (to my knowledge) detailed characterization of cell wall β-1,6-glucan in the pathogen Candida albicans. The approaches range from biochemistry to genetics to immunology. The study provides fundamental information and will be a resource of exceptional value to the field going forward. Highlights include the construction of a mutant that lacks all β-1,6-glucan and the characterization of its cell wall composition and structure. Figure 5a is a feast for the eyes, showing that β-1,6-glucan is vital for the outer fibrillar layer of the cell wall. Also much appreciated was the summary figure, Figure 7, which presents the main findings in digestible form.

      Strengths: 

      The work is highly significant for the fungal pathogen field especially, and more broadly for anyone studying fungi, antifungal drugs, or antifungal immune responses.

      The manuscript is very readable, which is important because most readers will be cell wall nonspecialists.

      The authors construct a key quadruple mutant, which is not trivial even with CRISPR methods, and validate it with a complemented strain. This aspect of the study sets the bar high. The authors develop new and transferable methods for β-1,6-glucan analysis.

      Weaknesses: 

      The one "famous" cell type that would have been interesting to include is the opaque cell. This could be included in a future paper.

      Reviewer #3 (Public Review): 

      Summary: 

      The cell wall of human fungal pathogens, such as Candida albicans, is crucial for structural support and for modulating the host immune response. Although extensively studied in yeasts and molds, work on its structural composition has largely focused on the structural β-1,3-glucan and the surface-exposed mannans, while the fibrillar component β-1,6-glucan, a significant component of the cell wall, has been largely overlooked. This comprehensive biochemical and immunological study by a highly experienced cell wall group provides a strong case for the importance of β-1,6-glucan, which contributes critically to cell wall integrity, filamentous growth, and cell wall stability when mannan elongation is defective. Additionally, β-1,6-glucan responds to environmental stimuli and stresses, playing a key role in wall remodeling and immune response modulation, making it a potentially critical factor for host-pathogen interactions.

      Strengths: 

      Overall, this study is well designed and executed. It provides the first comprehensive assessment of β-1,6-glucan as a dynamic, albeit underappreciated, molecule. The genetics and biochemistry of β-1,6-glucan have been explored in molds like Aspergillus fumigatus, but this work shines an important light on its role in Candida albicans. This is important work that is of value to Medical Mycology, since β-1,6-glucan plays more than just a structural role in the wall. It may serve as a PAMP and a potential modulator of host-pathogen interactions. In keeping with this important role, the rigor of the manuscript would benefit from a more physiological evaluation, ex vivo and preferably in vivo, of how β-1,6-glucan stimulates the immune system when present within the cell wall, not just as a purified component. This is a critical outcome measure for this study and gets squarely at its importance for host-pathogen interactions, especially in response to environmental stimuli and drug exposure.

      Response to reviewers (Public reviews):

      We thank all three reviewers for their assessment of our work on Candida albicans β-1,6-glucan, which highlights the importance of this cell wall component in fungal biology. Our responses to the public reviews are as follows:

      (1) Indeed, the data presented in the immunological studies are preliminary. The reviewers acknowledged that our analysis comprehensively addresses the organization and dynamics of the β-1,6-glucan polymer in relation to other cell wall components and environmental conditions (temperature, stress, nutrient availability, etc.), and that it provides insights into the biosynthetic pathways involved. However, we anticipated immediate curiosity about the immunological contribution of β-1,6-glucan, and we therefore felt we needed to initiate these studies and include them. We performed immunological studies to assess whether β-1,6-glucan acts as a pathogen-associated molecular pattern (PAMP) and, if so, what its immunostimulatory potential is. Our data clearly suggest that β-1,6-glucan is a PAMP, which raises several questions: (a) which host immune receptors are involved in the recognition of this polysaccharide, and which downstream signaling pathways do they engage; (b) how is β-1,6-glucan differentially recognized by the host when C. albicans switches from a commensal to an opportunistic pathogen; and (c) how does the host environment affect the exposure of this polysaccharide on the fungal surface. We believe addressing these questions is beyond the scope of the present manuscript, and we aim to present new data in a future manuscript. Nonetheless, in the revised manuscript we suggest approaches that could be used to identify the receptor(s) involved in the recognition of β-1,6-glucan. Moreover, we have revised the discussion so that it is grounded in the data rather than descriptive.

      (2) It will be interesting to assess the organization of β-1,6-glucan and other cell wall components in the opaque cells. It is documented that the opaque cells are induced at acidic pH and in the presence of N-acetylglucosamine and CO2. Our data shows that pH has an impact on β-1,6-glucan, which suggests that there will be differential organization of this polysaccharide in the cell wall of opaque cells. As suggested by the reviewer, we will include analysis of opaque cells (and other C. albicans cell types) in future studies. 

      With the exception of these major new avenues for this research, our revision addresses each of the comments provided by the reviewers.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Although the study is very interesting, there are some loose ends that need to be combined to strengthen the manuscript. For example, the immunological studies are rather preliminary and need at least some substantiation. Also, at this stage, the manuscript in some places remains a bit too descriptive and needs the elucidation of potential causalities.

      Specifically: 

      (1) As you showed, defects in chitin content led to a decrease in the cross-linking of β-glucans in the inner wall that corresponded to the effect of nikkomycin-treated C. albicans phenotype; conversely, an increase in chitin content led to more cross-linking of β-glucans as observed in the FKS1 mutant or in the presence of caspofungin. What is the mechanistic reason for these observations? 

      On the one hand, yeast cell wall chitin occurs in three forms: free, covalently linked to β-1,3-glucan, or covalently linked to β-1,6-glucan. Cross-linked β-glucan-chitin forms an alkali-resistant core fibrillar structure. A decrease in chitin content therefore affects β-glucan-chitin cross-linking, making β-glucan alkali-soluble. On the other hand, a decrease in β-glucan content, as in the FKS1 mutant or upon caspofungin treatment, results in increased cell wall chitin and β-glucan-chitin content. A decrease in β-1,3-glucan biosynthesis is associated with upregulation of CRH1, which is involved in β-glucan-chitin cross-linking. This explains the increased β-glucan-chitin content in the FKS1 mutant or upon caspofungin treatment. We have included this discussion in the revised manuscript (p14, lines 2-10).

      (2) The β-1,6-glucan biosynthesis is stimulated via a compensatory pathway when there is a defect in O- and N-linked cell wall mannan biosynthesis. Why? causality? Hypothesis?  

      Two phenomena were observed relating β-1,6-glucan and mannan biosynthesis: 1) a defect in N-mannan elongation led to an increase in β-1,6-glucan content; 2) a defect in O-mannan elongation reduced the size of β-1,6-glucan chains but increased their branching. These observations suggest a global rescue program for the cell wall damage that can occur when one of the cell wall components is defective. We have discussed this in the revised manuscript (p14, last paragraph; p15, first paragraph). Moreover, β-1,3-glucan and chitin are synthesized by their respective membrane-bound synthases, and a defect in the synthesis of one is compensated by the other. Similarly, although this remains to be validated for β-1,6-glucan, the biosynthesis of both mannan and β-1,6-glucan appears to be initiated intracellularly. It is therefore possible that defective mannan biosynthesis is compensated by β-1,6-glucan biosynthesis, but this needs to be validated experimentally.

      (3) You showed that the removal of β-1,6-glucan by periodate oxidation (AI-OxP) led to a significant decrease in the IL-8, IL-6, IL-1β, TNF-α, C5a, and IL-10 released, suggesting that their stimulation was in part β-1,6-glucan dependent. What is the consequence of the stimulation, e.g. better phagocytosis, etc.? This needs some more experiments, otherwise the data are purely descriptive, as is the conclusion. Also, what do you want to show with the activation of the complement system? Is β-1,6-glucan detected by complement receptors? I think this is really a loose end. I think it is necessary to provide more data on this observation, which I think lacks a control with serum lacking complement; this should then be moved to the main manuscript.

      In this study, our aim was to assess whether β-1,6-glucan acts as a pathogen-associated molecular pattern (PAMP) of C. albicans and, if so, what its immunostimulatory capacity is. Our data confirm that β-1,6-glucan does act as a PAMP, and its removal significantly reduces the immunostimulatory capacity of the fibrillar core structure of the C. albicans cell wall. In addition, data provided in the revised manuscript (see updated Figure S14; discussion p13, lines 16-21) indicate that human serum factors significantly enhance the immunostimulatory capacity of β-1,6-glucan and that β-1,6-glucan interacts with the complement component C3b. However, addressing the role of β-1,6-glucan in phagocytosis using the β-1,6-glucan deletion mutant will not be possible, as the cell wall of this mutant is modified and β-1,6-glucan is not the only cell wall component interacting with C3b. An alternative is to coat β-1,6-glucan onto beads and use them to study phagocytosis and identify immune receptors. However, this is beyond the scope of our present study.

      (4) Also, you suggested that β-1,6-glucan and β-1,3-glucan stimulate innate immune cells in distinct ways. Please provide more data on this interesting suggestion. You can block the dectin-1 receptor for example or use dectin-1 deficient macrophages from mice. The part on the immune stimulation needs to be optimized. 

      Stimulation of immune cells by pustulan (insoluble linear β-1,6-glucan) via a dectin-1-independent pathway has been described previously (PMIDs: 18005717, 16371356), as discussed in the manuscript. Our preliminary data indicate that blocking dectin-1 on immune cells (using anti-dectin-1 antibodies) has no effect on the immunostimulatory potential of β-1,6-glucan. This contrasts with AI and AI-OxP, which showed significantly reduced cytokine secretion by immune cells upon dectin-1 blocking. Work deciphering β-1,6-glucan recognition and its immunomodulatory pathways is underway and will be the subject of a future study.

      (5) β-1,6-glucan and mannan productions are coupled. What is the hypothesis? Is it due to the necessity of mannan residues in ß-1,6-glucan biosynthesis enzymes from the ER? Can that be experimentally proven? 

      β-1,6-glucan and mannan synthesis appear to be coupled in two ways. First, as mentioned above (Response 2), defects in mannan elongation led to an alteration of β-1,6-glucan production. Second, defects in early steps of N-glycosylation led to a strong reduction in β-1,6-glucan size and in its cell wall content. However, we do not believe that N-glycan synthesis is required to produce an acceptor essential for β-1,6-glucan synthesis. The defect in N-mannan elongation led to a global cell wall remodeling, as described above. Kre5, Rot2 and Cwh41 are part of the calnexin cycle involved in the quality control of N-glycoprotein folding in the ER. This suggests that some proteins directly involved in β-1,6-glucan synthesis require folding quality control to be active. We modified our discussion accordingly, highlighting these points (p14, last paragraph; p15, second paragraph).

      (6) 'As PHR1 and PHR2 genes are strongly regulated by external pH, the compensatory differences described may be explained by pH-dependent regulation of β-1,6-glucan synthesis.' Please check. Also, could pH regulation underlie, e.g., the differences you found for β-1,6-glucan under different environmental conditions? Growth on different carbon sources leads to different external pH values, as shown for many fungi.

      We agree that environmental pH depends on the carbon source and varies along the growth curve. To test the effect of pH, we buffered the medium with 100 mM MOPS or MES. Figs. 2 and S1 clearly show that pH affects cell wall composition and polymer exposure, as previously described (PMID: 28542528). Here, we show that pH has an impact on β-1,6-glucan size as well as on its branching. However, in buffered medium, the addition of organic acids (such as acetate, propionate, butyrate or lactate) also affected cell wall composition, showing that pH is not the only factor involved. Regarding the _phr1_Δ/Δ and _phr2_Δ/Δ mutants, we believe that the difference in cell wall composition observed between them is mainly due to pH-dependent regulation, which we indicate in the discussion (p14, end of first paragraph).

      Minor: 

      (1) In Figure 7B: dynamism should be replaced by dynamic and in term is rather in terms.  

      Modified as suggested.

      (2) Replace molecular size with molecular mass when you give daltons. 

      Molecular size has been replaced by molecular weight, when presented as daltons.

      (3) Page 7: for explanation, please add that nikkomycin is a chitin biosynthesis inhibitor.   

      As suggested, we now explain that nikkomycin is a chitin biosynthesis inhibitor.

      Reviewer #2 (Recommendations For The Authors):

      (1) I wondered if the increased chitin content of hyphae might reflect growth on the precursor GlcNAc. Have you tested hyphae that are induced in other ways?

      (2) Related to point 1, did you look at the relative abundance of yeast vs hyphae in the preparation? I wonder if yeast contamination might have reduced the extent of the composition changes observed.

      We used GlcNAc as the hyphal inducer because: 1) in the presence of GlcNAc, hyphae are produced without any yeast contamination, and under this condition we observed an increase in hyphal chitin content, as previously described (PMID: 16423067); 2) we did not use serum, another condition that induces hyphal formation, because we could not control serum factors that may affect cell wall composition. We now indicate in the Methods section that hyphae induced by GlcNAc were not contaminated by yeast (p17, line 3).

      (3) I recommend rephrasing the first sentence of the Figure 2 legend: "Cells were grown in liquid SD medium at 37 °C at exponential phase under different growth conditions." The conditions varied extensively - stationary is not exponential; biofilm is probably not exponential. Also, the "D" in "SD" stands for dextrose, and the carbon source varied a good deal. Perhaps you could say: "Cells were grown in liquid synthetic medium at 37 °C under different growth conditions, as specified in Methods."

      Sentences have been rephrased.  

      (4) Figure 7b has a typo: "dependant" for "dependent".

      The typo has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      To explore the biochemical composition of the cell wall, the authors fractionated the wall components into three categories based on polymer properties and reticulations: sodium-dodecyl-sulphate-β-mercaptoethanol (SDS-β-ME) extract, alkali-insoluble (AI), and alkali-soluble (AS) fractions, and they developed several independent methods to distinguish between β-1,3-glucans and β-1,6-glucans. The composition and surface exposure of fungal cell wall polymers are known to depend on environmental growth conditions. It was shown that the cell wall of C. albicans hyphae had increased chitin content (10% vs. 3%) and decreased β-1,6-glucan (18% vs. 23%) and mannan (13% vs. 20%) compared to the yeast form, and the reduced β-1,6-glucan content was associated with a smaller β-1,6-glucan size (43 vs. 58 kDa), suggesting that both the content and structure of β-1,6-glucan are regulated during growth and cellular morphogenesis. Similar behavior was observed when exposing cells to acid and neutral medium pH. The most significant cell wall alteration occurred in a lactate-containing medium, which led to a sharp reduction in structural core polysaccharides: chitin (-43%), β-1,3-glucan (-48%), and β-1,6-glucan (-72%). This reduction aligns with the previously observed decreases in inner cell wall layer thickness. As expected, the authors found that modulating chitin content genetically (chs3Δ/Δ knockout mutant) led to an increase of both β-1,3-glucan and β-1,6-glucan. An increase in chitin content following genetic alteration of FKS genes impacting glucan synthase, or after exposure to the echinocandin caspofungin, led to enhanced cross-linking of β-glucans. A slight increase in the β-1,3-glucan branching was also observed in the mnt1/mnt2Δ/Δ double mutant, suggesting that β-1,6-glucan and mannan synthesis may be coupled.

      - This effect is not that pronounced, and the relationship appears somewhat overstated and may reflect an indirect interaction. The authors should address accordingly. 

      We agree that this sentence was overstated. To make it clearer and less assertive, we divided it into two sentences with more measured statements (p8, line 34).

      The genetics of β-1,6-glucan biosynthesis appear complex, and a figure describing putative roles for specific genes would be beneficial. For example, KRE6 is a glucosyl hydrolase required for β-1,6-glucan biosynthesis.

      - It would be valuable to better understand the overall biosynthetic process. Please elaborate more in a figure. 

      Although proteins/enzymatic activities directly involved in the β-1,6-glucan biosynthesis have not yet been identified, as suggested by this reviewer, we included a schematic representation of this process based on our hypothesis (Figure S15, and p15 lines 17-22 in revised manuscript), indicating the possible involvement of Kre6p.  

      The deletion of KRE6 homologs, essential for β-1,6-glucan biosynthesis, resulted in the absence of β-1,6-glucan production and in significant structural alterations of the cell wall. This result nicely confirms the important role of β-1,6-glucan in regulating cell wall homeostasis. The absence of β-1,6-glucan was associated with increased (mutant vs. WT) chitin content (9.5% vs. 2.5%) and highly branched β-1,3-glucan (48% vs. 20%). TEM ultrastructure studies nicely showed the change in overall cell wall architecture. From a drug discovery perspective, since the blockade of β-1,6-glucan synthesis did not block growth, it may have more value as a potential virulence target. This would be valuable but needs to be assessed in animal model challenge competition experiments.

      - The authors may want to elaborate more. 

      We agree and changed “antifungal target” to “potential virulence target”.

      It is well known that β-1,3-glucan, mannan, and chitin serve as PAMPs, which induce immune responses. The role of β-1,6-glucan as a PAMP is not well understood, and the authors provide evidence that different cell wall extracted fractions with enriched constituents induce immune responses invoking cytokines, chemokines, and acute phase proteins, as well as the complement system. While these data clearly show that β-1,6-glucan is immunologically active and potentially important for host-pathogen interactions, the analysis is preliminary and falls short of making this case.

      - This is a critical point in getting at the potential host signaling of β-1,6-glucan contained in the cell wall or shed by the cell (is this known?)

      - This analysis would be bolstered significantly by examining stimulation relative to other cell wall components and, most importantly, whole-cell modulation of β-1,6-glucan exposure for immune presentation, not just unnatural concentrated extracts. This can be readily accomplished with the various mutants in hand, as well as after exposure to various antifungal agents (echinocandins and nikkomycins) (see Hohl et al. 2008 JID). Additional validation would benefit from animal model studies to examine in vivo immune modulation.

      We agree with the reviewer. However, the main focus of our present work was to study the organization and dynamics of C. albicans cell wall β-1,6-glucan, and to explore its possible role as a pathogen-associated molecular pattern (PAMP). Our study indicates that β-1,6-glucan does act as a PAMP with immunostimulatory potential. As pointed out by this reviewer, and as with β-1,3-glucans, the exposure of β-1,6-glucan is probably a key factor in the immune response. However, this investigation is beyond the scope of this study. It is underway and will be presented in our future work.

      - The Discussion would also benefit from a comparison with β-1,6-glucan in Aspergillus fumigatus, which was largely elucidated by the same primary authors.

      To our knowledge, β-1,6-glucan has never been identified, either by chemical analysis (PMID: 10869365; PMID: 36836270) or solid-state NMR (PMID: 34732740), in the cell wall of A. fumigatus, although a homolog of KRE6 is present in A. fumigatus but with unknown function.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their detailed comments. Several comments concerned potential improvements in the 3D reconstructions obtained in later steps of the image processing pipelines for single-particle cryoEM and cryo-electron tomography. We have not investigated how our improvements in CTFFIND5 affect these downstream results and therefore cannot make specific, quantitative statements in this regard. However, CTFFIND5 provides additional information about the sample (thickness, tilt) that users will find useful for selecting the data they would like to include in later processing, and for deciding how to process them. Furthermore, when the sample tilt of a thin specimen is known, local defocus estimates (e.g., per-particle defocus estimates) will be more accurate than estimates that ignore tilt information. In the following, we provide point-by-point responses to the reviewers’ comments.

      Reviewer #1 (Public Review):

      This work presents CTFFIND5, a new version of the software for determination of the Contrast Transfer Function (CTF) that models the distortions introduced by the microscope in cryoEM images. CTFFIND5 can take acquisition geometry and sample thickness into consideration to improve CTF estimation.

      To estimate tilt (tilt angle and tilt axis), the input image is split into tiles, and correlation coefficients are computed between their power spectra and a local CTF model that includes the defocus variation according to a tilted plane. As a final step, by applying a rescaling factor to the power spectra of the tiles, an average tilt-corrected power spectrum is obtained and used for diagnostic purposes and to estimate the goodness of fit. This global procedure and the rescaling factor resemble those used in Bsoft, Warp, etc., with determination of the tilt parameters being a feature specific to CTFFIND5 (and formerly CTFTILT). The performance of the algorithm is evaluated with tilted 2D crystals and tilt-series, demonstrating accurate tilt estimation in some cases and some limitations in others. Further analysis of CTF determination with tilt-series, particularly showing whether estimation is accurate and stable at high tilts, might help demonstrate the robustness of CTFFIND5 in cryoET.
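      The tilted-plane model summarized above can be illustrated by predicting the defocus at each tile centre. A local CTF model for a tilted specimen needs this value before its power spectrum can be compared with each tile. The sketch below assumes a planar specimen passing through the image centre. The function name, the angle conventions and the sign of the defocus change are illustrative assumptions, not the conventions defined in CTFFIND5.

```python
import numpy as np


def tile_defocus(defocus_center, axis_deg, tilt_deg, tile_centers):
    """Expected defocus at each tile centre for a tilted, planar specimen.

    defocus_center : defocus at the image centre (Å)
    axis_deg       : in-plane angle of the tilt axis, measured from the x axis (degrees)
    tilt_deg       : tilt of the specimen plane about that axis (degrees)
    tile_centers   : (N, 2) array of tile centres (x, y) in Å, relative to the image centre
    """
    phi = np.deg2rad(axis_deg)
    theta = np.deg2rad(tilt_deg)
    # In-plane unit vector perpendicular to the tilt axis.
    perp = np.array([-np.sin(phi), np.cos(phi)])
    # Signed distance of each tile centre from the tilt axis.
    dist = np.asarray(tile_centers, dtype=float) @ perp
    # The specimen height changes by dist * tan(theta) away from the axis.
    # Assumed convention (hypothetical): positive height adds to defocus.
    return defocus_center + dist * np.tan(theta)


# Example: four tiles, tilt axis along x, 30 degree tilt, 2 µm defocus at the centre.
centers = np.array([[0.0, 0.0], [0.0, 1000.0], [0.0, -1000.0], [1000.0, 0.0]])
print(tile_defocus(20000.0, 0.0, 30.0, centers))
```

      In this example, tiles on the tilt axis keep the central defocus, and tiles 1000 Å on either side of it differ by about ±577 Å (1000 × tan 30°).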

      CTFFIND5 represents the first CTF determination tool that considers the thickness-related modulation envelope of the CTF first described by McMullan et al. (2015) and experimentally confirmed by Tichelaar et al. (2020). To this end, CTFFIND5 uses a new CTF model that takes the sample thickness into account. CTFFIND5 thus provides more accurate CTF estimation and, furthermore, gives an estimate of the sample thickness, which may be a valuable resource for judging the potential for high resolution. To evaluate the accuracy of thickness estimation in CTFFIND5, the authors use the Lambert-Beer law on energy-filtered data as well as tomographic data, demonstrating that the estimates are reasonable for images with an exposure of around 30 e⁻/Å². While consideration of sample thickness in CTF determination sounds ideally suited for cryoET, practical application under the standard acquisition protocols in cryoET (exposure of 3-5 e⁻/Å² per image) is still limited. In this regard, the authors are honest in the conclusions and clearly identify the areas where thickness-aware CTF determination will be valuable at present: e.g. in situ single particle analysis and in vitro single particle cryoEM of purified samples at low voltages.
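      The energy-filter comparison mentioned above relies on the Lambert-Beer relation between unfiltered and zero-loss intensities. The sketch below assumes that both intensities are measured over the same area and that the user supplies an inelastic mean free path appropriate for the sample and accelerating voltage. The 3000 Å used below is a placeholder, not a recommended value, and the function name is hypothetical.

```python
import math


def thickness_lambert_beer(i_unfiltered, i_zero_loss, mean_free_path):
    """Specimen thickness from energy-filtered intensities (same units as mean_free_path).

    Lambert-Beer: I_zero_loss = I_unfiltered * exp(-t / mean_free_path),
    hence t = mean_free_path * ln(I_unfiltered / I_zero_loss).
    """
    if i_zero_loss <= 0 or i_unfiltered < i_zero_loss:
        raise ValueError("need 0 < zero-loss intensity <= unfiltered intensity")
    return mean_free_path * math.log(i_unfiltered / i_zero_loss)


# Hypothetical numbers: 75% of electrons remain in the zero-loss peak,
# with a placeholder inelastic mean free path of 3000 Å.
print(round(thickness_lambert_beer(1.0, 0.75, 3000.0), 1))
```

      With these placeholder inputs, the estimate is about 863 Å. Because the result scales linearly with the mean free path, any error in that parameter carries directly into the thickness.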

      In conclusion, the manuscript introduces novel methods inside CTFFIND5 that improve CTF estimation, namely acquisition geometry and sample thickness. The evaluation demonstrates the performance of the new tool, with fairly accurate estimates of tilt axis, tilt angle and sample thickness and improved CTF estimation. The manuscript critically defines the current range of application of the new methods in cryoEM.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes the latest version of the most popular program for CTF estimation for cryo-EM images: CTFFIND5. New features in CTFFIND5 are the estimation of tilt geometry, including for samples, like FIB-milled lamellae, that are pre-tilted along a different axis than the tilt axis of the tomographic experiment, plus the estimation of sample thickness from the expanded CTF model described by McMullan et al (2015). The results convincingly show the added value of the program for thicker and tilted images, such as are common in modern cryo-ET experiments. The program will therefore have a considerable impact on the field.

      I have only minor suggestions for improvement below:

      Abstract: "[CTF estimation] has been one of the key aspects of the resolution revolution"-> This is a bit over the top. Not much changed in the actual algorithms for CTF estimation during the resolution revolution.

      We have removed this statement in the abstract.

      L34: "These parameters" -> Cs is typically given, only defocus (and if relevant phase shift) are estimated.

      We have modified the introduction to reflect this. Page 3, L30-35

      L110-116: The text is ambiguous: are rotations defined clockwise or counter-clockwise? It would be good to explicitly state what subsequent rotations, in which directions and around which axes this transformation matrix (and the input/output angles in CTFFIND5) correspond to.

      Thank you for pointing this out. We have revised the Methods section (Page 4, L57-61) to explicitly define the convention for the tilt axis and tilt angle. We have also modified Fig. 1b to illustrate our convention for the tilt axis.

      L129-130: As a suggestion: it would be relatively easy, and possibly beneficial to the user, to implement a high-resolution limit that varies with the accumulated dose on the sample. One example of this exists in the tomography pipeline of RELION-5.

      We appreciate the suggestion. However, since CTFFIND5 currently has no concept of a tilt-series and treats every micrograph independently, this would not be trivial to implement. As detailed below, CTFFIND5 in its current form is not targeted toward tomography processing, but its features might be useful for its use in pipelines for tomography processing, such as RELION-5. We made this more explicit in the conclusion section. Page 16 L390-399

      Substituting Eq (7) into Eq (6) yields ξ = π, which cannot be true. If t is the sample thickness, then how can this be a function of the frequency g of the first node of the CTF function? The former is a feature of the sample, the latter is a parameter of the optical system. This needs correction.

      We have rewritten the text describing Equations 6 and 7 to avoid this confusion (Page 7, L146-153). The reviewer is right that inserting Eq. 7 into Eq. 6 yields ksi=pi; in fact, Eq. 7 is derived from Eq. 6 by substituting ksi=pi, since this is the condition for the first node. In this context, nodes of the CTF function are the places where the term sinc(ksi) becomes zero and the CTF therefore appears "flat". The frequency at which this occurs depends on the sample thickness. As explained below, the previous version of our manuscript did not point out the difference between the first zero and the first node in the power spectrum. We have amended Fig. 3a to make this difference clearer.
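      The relationship discussed in this exchange can be checked numerically. The sketch below assumes the thickness-averaged power-spectrum model obtained by averaging sin² of the CTF phase uniformly over a slab of thickness t, in which the Thon-ring oscillation is damped by sinc(ksi) with ksi = π·λ·g²·t. The manuscript's Eq. 6 may use a different constant factor, so the exact numbers are an assumption. Setting ksi = π at the first node gives t = 1/(λ·g²): the thickness is a property of the sample, and it determines the spatial frequency g at which the node appears in the spectrum. Function names are hypothetical.

```python
import math


def electron_wavelength_angstrom(voltage_volts):
    """Relativistic electron wavelength (Å) for an accelerating voltage in volts."""
    h = 6.62607015e-34      # Planck constant, J s
    m0 = 9.1093837015e-31   # electron rest mass, kg
    e = 1.602176634e-19     # elementary charge, C
    c = 299792458.0         # speed of light, m/s
    energy = e * voltage_volts
    momentum = math.sqrt(2.0 * m0 * energy * (1.0 + energy / (2.0 * m0 * c**2)))
    return h / momentum * 1e10  # metres -> Å


def thickness_envelope(g, thickness, wavelength):
    """Assumed damping term sinc(ksi) = sin(ksi)/ksi, ksi = pi * lambda * g^2 * t."""
    ksi = math.pi * wavelength * g**2 * thickness
    return 1.0 if ksi == 0.0 else math.sin(ksi) / ksi


def thickness_from_first_node(g_node, wavelength):
    """Invert ksi = pi at the first node: t = 1 / (lambda * g_node^2), in Å."""
    return 1.0 / (wavelength * g_node**2)


def first_node_frequency(thickness, wavelength):
    """Spatial frequency (1/Å) of the first node for a thickness in Å."""
    return math.sqrt(1.0 / (wavelength * thickness))
```

      At 300 kV (λ ≈ 0.0197 Å), a first node observed at 1/5 Å⁻¹ would correspond to roughly 1270 Å under this model, and a thicker sample moves the node to lower resolution.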

      Reviewer #3 (Public Review):

      In this manuscript, the authors detail improvements in the core CTFFIND algorithm (CTFFIND5, as implemented in cisTEM) that better estimate CTF parameters from tilted micrographs and from those that exhibit signal attenuation due to ice thickness. These improvements typically yield more accurate CTF values that better represent the data. Although some of the improvements result in slower calculations per micrograph, this can easily be overcome through parallelization.

      There are some concerns outlined below that would benefit from further evaluation by the authors.

      For the examples shown in Figure 3b, given the small differences in estimated defocus1 and 2, what type of improvements would be expected in the reconstructed tomograms? Do such improvements in estimates manifest in better tilt-series reconstruction?

      As explained in our preface, we do not believe that these differences would manifest as any improvement during tilt-series reconstruction or create meaningful differences, even when tomograms are reconstructed with CTF correction. They might become meaningful during subtomogram averaging, but subtomograms are usually corrected using per-particle CTF estimation, similar to single-particle processing. We have included a new paragraph in the Discussion describing potential benefits of CTFFIND5 for cryo-tomography (Page 16, L390-399).

      Similarly, the data shown in Figure 3C show minimal improvement in the CTF resolution estimate (e.g., 4.3 versus 4.2 Å) but a difference of several hundred Å in defocus values. How do such differences impact downstream processing? Is such a difference overcome by per-particle (local) CTF refinement (as the authors mention in the discussion, see below)?

      The difference in the defocus estimate (~600 Å) is substantially smaller than the thickness of the sample (2000 Å). Hence both estimates may be valid, depending on which particles inside the sample are considered. Particles with larger defocus errors could certainly be corrected by per-particle CTF refinement as long as the search range is chosen to be large enough. The main benefit of using CTFFIND5 is information for the user regarding the sample thickness to set the defocus search range appropriately.

      At what specimen thickness does the ice-thickness modulation need to be included for an "accurate" estimate? 500 Å? 1000 Å? 2000 Å? Based on the data shown in Figure 3B, specimens as thick as 969 Å benefit moderately (4.6 versus 3.4 Å fit estimate), but perhaps not significantly, from the ice thickness estimation. Considering the increased computational time for ice thickness estimation, guidance on when to incorporate it into single-particle workflows would be beneficial.

      As explained in our preface, the main benefit for single-particle workflows will be sample tilt estimation. This will provide more accurate per-particle defocus estimates, compared to estimates that do not take the tilt into account. For single-particle samples, the ice thickness in holes is probably more efficiently monitored using the Beer-Lambert law.
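      For comparison with the CTF-based thickness estimate, the Beer-Lambert approach mentioned here reduces to a one-line calculation: t = Λ · ln(I0/I), where I0 is a reference intensity (for example, through an empty hole or without energy filtering), I is the intensity through the sample, and Λ is an effective mean free path. Λ depends on the instrument (voltage, energy-filter slit or objective aperture) and has to be calibrated; the value used in the note below is a placeholder, not a recommended constant, and the function name is hypothetical.

```python
import math


def beer_lambert_thickness(i_reference, i_sample, mean_free_path):
    """Sample thickness from t = L * ln(I0 / I), in the units of mean_free_path.

    i_reference: intensity without the sample (or without energy filtering).
    i_sample:    intensity transmitted through the sample.
    mean_free_path: instrument-specific effective mean free path (calibrated).
    """
    if i_reference <= 0 or i_sample <= 0:
        raise ValueError("intensities must be positive")
    return mean_free_path * math.log(i_reference / i_sample)
```

      With a hypothetical Λ of 3000 Å, a transmitted fraction I/I0 of 0.8 gives about 669 Å. Because only two intensity measurements are needed per hole, this is cheap to compute during data collection, which is consistent with the point made in the response above.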

      It would seem that this statement could be evaluated herein: "the analysis of images of purified samples recorded at lower acceleration voltages, e.g., 100 keV (McMullan et al., 2023), may also benefit since thickness-dependent CTF modulations will appear at lower resolution with longer electron wavelengths". There are numerous 300 kV, 200 kV, and 100 kV EMPIAR datasets that could be compared, and recommendations would be welcome.

      Publicly available datasets recorded at 100 kV and 200 kV were collected in very thin ice, making it difficult to demonstrate the stated benefits. We have removed this statement.

      Although logical, this statement is not supported by the data presented in this manuscript: "The improvements of CTFFIND5 will provide better starting values for this refinement, yielding better overall CTF estimation and recovery of high-resolution information during 3D reconstruction."

      We have revised this statement and now explain that the sample tilt information will provide more accurate per-particle defocus estimates, compared to estimates that do not take the tilt into account, Page 17, L400-409. We did not investigate how this will affect downstream processing results.

      Moreover, the lack of single-particle data evaluation does present a concern. Naively, these improvements would benefit all cryoEM data, regardless of modality.

      We agree with the reviewer that all cryoEM modalities should benefit from more accurate defocus value estimates and have amended our concluding statement. However, how improved defocus values will benefit downstream processing results will depend on the processing pipeline, which includes various points of user input and data-dependent choices. We have therefore limited our analysis to the outputs of CTFFIND5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) CTFFIND5 in cryo-ET

      (1.1) CTFFIND4 is prone to unreliable CTF estimates at high tilts in cryoET, a situation that can be identified by high variability or 'unstable' estimates as a function of the tilt angle. Prof. Mastronarde recently illustrated this situation in his article JSB 216:108057, 2024 (Fig. 7). Therefore, the authors could add results to show whether the improvements to tilt estimation introduced in CTFFIND5 overcome this problem. So, in addition to the estimation of tilt angle and tilt axis in Figure 2, the estimated defocus could also be shown.

      We have worked with Prof. Mastronarde to help him use CTFFIND as a tool in his cryoET processing pipeline. Mastronarde chose CTFFIND because it contains algorithms and architecture that he could optimize for his purposes. CTFFIND5 currently lacks the concept of a tilt series and therefore cannot take advantage of the additional information that comes with tilt series. Our own applications for CTFFIND5 currently do not include tomography, and our results presented in Fig. 2 were obtained for validation of the tilt estimation feature. We did not attempt to duplicate Mastronarde’s optimization for reliable tilt-series processing.

      Figure 2b of this manuscript already suggests that CTFFIND5 may exhibit some variability of defocus estimates at high tilts (in view of the variability of the tilt axis angle). A strategy used in IMOD and TOMOCTF is to combine the tiles of a group of consecutive images (typically 3-5, especially at high tilts) to add more signal to the average spectrum, thus providing more reliable estimates (illustrated in Mastronarde's article JSB 216:108057, 2024, Fig. 8). Do the authors think that CTFFIND5 might include a strategy like this for cryoET tilt series?

      We currently do not have plans to develop CTFFIND5 as a tool for tomography as there are already other excellent tools available, some of them based on CTFFIND’s basic algorithms (see previous comment).

      (1.2) In cryoET, the CTF is often determined on the aligned tilt series, with the tilt axis typically running along the Y axis. Does CTFFIND5 have an option to skip estimation of the tilt geometry (tilt angle and/or axis) and instead take the tilt geometry directly from the alignment and/or from the microscope? This would significantly speed up CTF determination (to 1-2 seconds per image, according to Table 2) while still taking advantage of all power spectra in tilted images (as described in their tilt estimation algorithm) for improved CTF estimation. This strategy would be similar to what is done in Bsoft and IMOD.

      This is an excellent idea and we may implement this in an updated version. The current version is primarily meant for lamellae and single-particle samples where we usually have a single tilt in an unknown direction. For these cases, the suggested feature will have less benefit. 

      Thus, I suggest that the authors should also include results comparing CTF estimation in aligned tilt-series with CTFFIND4 and with CTFFIND5 (with no tilt estimation but indeed taking the tilt information from the alignment or the microscope into account). The results would show that CTFFIND5 is more robust than CTFFIND4, especially at high tilts.

      Thank you for this suggestion. We are now showing a comparison of defocus estimates from CTFFIND4 and CTFFIND5 in Fig. 2. Indeed, in one case CTFFIND5 seems to report more robust defocus values at high tilt.

      (1.3) The newer improvements in CTFFIND5 seem to be especially tailored to cryoET, and the cryoET community will be highly attracted by them. However, current standard acquisition protocols (exposure of 3-5 e/Å² per image, tilts up to 60 degrees, etc.) limit their full exploitation, particularly of the thickness-aware CTF determination. I believe that adding a paragraph focused exclusively on cryoET, describing the potential benefits of CTFFIND5 and its limitations, could enrich the Conclusion section. In this paragraph, the authors could highlight the great benefits of the tilt-aware CTF estimation. They could also discuss current standard acquisition protocols (e.g., exposure of 3-5 e/Å² per image, nominal defocus of 3-5 microns, cellular thickness from 150 nm up to 200-300 nm that, at a tilt of 60 degrees, becomes 300 nm up to 400-600 nm) and their implications for the potential benefit from the improvements available in CTFFIND5.

      This reviewer is clearly excited about the potential application of CTFFIND5 in cryoET. We are sorry that we are currently not developing CTFFIND5 in this direction.

      (1.4) Apologies for insisting on cryoET in the previous points. I am just trying to suggest ideas to make CTFFIND5 even more helpful in cryoET. You can consider them now, or for a future version of the software, or just ignore them.

      Thanks for your suggestions. Since there is clearly demand for tools to process tomographic tilt series, we will keep these suggestions in mind for the future development of CTFFIND.

      (2) Tilt estimation

      (2.1) Page 4. Tiles for the initial steps in tilt estimation are 128x128 in size. At what point are larger tiles (e.g., 512x512) used? Please define this.

      Thank you for pointing out this lack of clarity. For tilt estimation, we use a tile size of 128 x 128, which is hard-coded in our program, as mentioned in line 68 on page 4. For generating the final power spectrum, we usually use a size of 512 x 512. This tile size can be defined by the user when running the program. We have now clarified this on Page 4, L74-76.

      (2.2) Page 6 and/or page 11: evaluation of tilt estimation with tilt-series.

      Please indicate the acquisition details of the tilt-series used for the evaluation, especially the exposure per image. This information is neither available in this manuscript nor in Elferich et al., 2022.

      Please add these acquisition details, similarly to page 9 of this manuscript (evaluation of sample thickness estimation using tomography): pixel size, exposure per image and total exposure, number of images, tilt range and interval.

      The same tilt series were used to verify tilt estimation and sample thickness estimation. We have revised the Methods section to make this clear on Page 5, L98-105 and Page 10, L202.

      (2.3) Page 10. Section Results. Subsection Tilt estimation.

      The authors use "defocus correction" to refer to their method for scaling the power spectra. "Defocus correction" might perhaps be a misleading term. In contrast, in page 4 the authors use the term "tilt correction". Please, revise and make it consistent throughout the manuscript.

      We agree and now use “tilt correction” throughout the manuscript.

      (2.4) Legend of Figure 2.

      Please add what the red dashed curve represents. Also, please note there might be an error in the estimated stage tilt axis angle: the legend states "171.8" where in the main text it is "178.2" (apparently, the latter is the correct one).

      Thank you for pointing this out. We have modified the legend and changed the number in the legend to 178.2°.

      (3) Thickness estimation

      (3.1) Line 141, page 7. The sentence reads: "The modulation of the CTF due to sample thickness t is described by the function E (current Equation 6), "  I believe that the modulation envelope of the CTF due to sample thickness is not really E (current Equation 6), but the function sinc(E). Please, revise.

      We have revised the manuscript as advised, Page 7, L148.

      (3.2) Line 148, page 7. The sentence reads "an estimate of the frequency g of the first node of the CTF_t function "

      The concept of 'node' was introduced by Tichelaar et al. (2020). The authors should not assume that this concept is familiar to the readership. So, it is suggested that the authors should introduce this concept in this section. For instance, just after Equation 6 they could add a sentence like this: "This sinc modulation envelope increasingly attenuates the amplitude of the Thon rings with increasing spatial frequencies in an oscillatory fashion, with locations where the amplitude is zero known as nodes (Tichelaar et al., 2020)."

      Thank you for this suggestion. We have revised the manuscript accordingly (Page 7, L151-156) and also marked the position of the first node in Fig. 3a.

      (3.3) Line 154, page 8: A citation is lacking: "(corrected for astigmatism, as described in )". Perhaps the authors refer to the EPA (EquiPhase Averaging) method introduced by Zhang, JSB 193:1-12, 2016, 10.1016/j.jsb.2015.11.003.

      Thanks for spotting this omission. We have added the appropriate reference.

      (3.4) Figure 3.

      (3.4.1) Perhaps, the EPA (EquiPhase Averaging) method is used to reduce the 2D CTF to 1D curves, as represented in Figure 3b and 3c. Please, mention this in the legend of the figure or in the main text referring to Figure 3. The same might apply to Figure 1c.

      Thanks for spotting this omission. We have clarified that this is indeed an EPA in the figure legends.

      (3.4.2) Please indicate what the colored curves represent in 3b and 3c: the fitted CTF model (dashed red) and the EPA or astigmatism-corrected radial average of the power spectrum (solid black)?

      Thanks for spotting this omission. We have added descriptions of the colored lines in these plots (red = modeled CTF, blue = goodness of fit).

      (3.4.3) Please note that the power spectrum (solid black curves in Figure 3b and 3c) does not look the same in the top and bottom panels: without thickness estimation (top panels), the power spectrum is in the range [0,1] in Y, as expected. However, with thickness estimation (bottom panels), the power spectrum seems to have undergone a frequency-dependent transformation (a rescaling or something similar that makes the power spectrum oscillate around 0.5 in Y). This transformation of the power spectrum resembles the thickness-induced sinc modulation of the CTF and seems appropriate for better fitting the new thickness-aware CTF_t model in CTFFIND5 to the (transformed) power spectrum. However, this transformation of the power spectrum is not mentioned in the manuscript at all. Instead, according to the main text (page 8), the fitting method is based on the cross-correlation between the new CTF model and the power spectrum, so I was expecting to see the same black power-spectrum curve in the top and bottom panels. Please clarify.

      Indeed, CTFFIND5 displays the power spectrum differently after thickness estimation. We have revised the Methods to explain this (page 8, L178-181). The reviewer is also correct that the 1D line plots of the Thon ring patterns in Fig. 3b and 3c are not identical. These 1D plots are generated from the 2D plots according to the fitted CTF, which is needed to follow the astigmatic rings and avoid blurring of the oscillations in the radial average. This means that different CTF fits will also result in somewhat different 1D plots. However, these differences only affect the 1D EPA plots shown to the user. The actual fitting is performed against the same 2D spectra.

      (3.4.4) Line 319, Page 14. "A linear fit revealed .." It would be good to add a line with the linear fit in Figure 5.

      Agreed. The revised Fig. 5 now shows a line for the linear fit.

      (3.5) New CTF Model

      It is not clear from the text if the new CTF_t model is used at all times in CTFFIND5 or only when the user requests thickness estimation. Related to this, if the user requests both tilt estimation and thickness estimation, how is the CTF estimation process carried out in CTFFIND5?: Tilt and thickness are estimated at the same time? or one after the other (i.e. first the tilt is estimated, then followed by thickness estimation)?. Please, clarify.

      The new CTF_t model is only used when the user requests thickness estimation. When both tilt-estimation and thickness estimation are requested, the tilt is estimated first and the corrected power spectrum is then fitted using the CTF_t model. We have revised the Methods section to explain this better, Page 8, L158-159.

      (4) Pages 14-15. Section "CTF estimation and correction assists "

      This section just shows that correcting a highly underfocused image for the CTF with phase flipping or a Wiener filter reduces the CTF-induced fringes. I do not really understand the inclusion of this section in the manuscript. There is no contribution related to CTFFIND5.

      The ability to apply a CTF correction to the input image according to Tegunov & Cramer is a new feature of apply_ctf, a program included with cisTEM. We think that this section fits into the theme of CTFFIND5 because the correction adds valuable information about the samples, such as FIB-milled lamellae.

      If the authors prefer to keep this section, then please take the following points into account:

      (4.1) Figure 6b: This is the only time that the term "EPA" (EquiPhase Averaging, I guess) is used in the manuscript. Please, spell it out somewhere in the manuscript, define what it means and add a proper citation, if convenient. This point is related to point 3.3 above.

      We have added the appropriate reference and defined EPA in the methods section as indicated in the reply to point 3.3.

      (4.2) Figure 6d. The contrast of this image is poor. Please, increase the contrast (to be similar to Figure 6c) so that the details can be better discerned. The image also shows a grainy texture, likely artefacts from the Wiener filter due to excessive amplification. Maybe the 'strength parameter' S of the deconvolution Wiener filter (Tegunov & Cramer, 2019) should be tuned down or the 'fall-off parameter' F tuned up to try to attenuate these artefacts.

      Agreed. The revised figure shows panel d with increased contrast with the custom fall-off parameter set to 1.3 and the custom strength parameter set to 0.7.
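      To make the roles of the two parameters concrete, the sketch below builds a Wiener-like deconvolution filter, CTF / (CTF² + 1/SNR), in the spirit of Tegunov & Cramer (2019). The SNR model used here, SNR(g) = 10^(3S) · exp(-100 · F · g) with g in 1/Å, is an assumption modeled loosely on publicly available deconvolution scripts; the exact parameterization used by apply_ctf is not given in this document. The simple 1D CTF (Cs = 2.7 mm, 7% amplitude contrast, underfocus positive) and the function names are likewise illustrative. Lowering the strength S or raising the fall-off F lowers the assumed SNR, which reduces amplification near CTF zeros and at high frequency, consistent with the reduced graininess described above.

```python
import numpy as np


def ctf_1d(g, defocus, wavelength, cs=2.7e7, amp_contrast=0.07):
    """Simple 1D CTF; g in 1/Å, defocus, wavelength and cs in Å (underfocus > 0)."""
    chi = np.pi * wavelength * defocus * g**2 - 0.5 * np.pi * cs * wavelength**3 * g**4
    return -np.sin(chi + np.arcsin(amp_contrast))


def wiener_deconvolution_filter(g, ctf, strength=1.0, falloff=1.0):
    """Wiener-like filter CTF / (CTF^2 + 1/SNR) under an assumed SNR model.

    strength (S) scales the SNR by 10**(3*S); falloff (F) sets how quickly
    the SNR decays with spatial frequency g (1/Å).
    """
    snr = 10.0 ** (3.0 * strength) * np.exp(-100.0 * falloff * g)
    return ctf / (ctf**2 + 1.0 / snr)
```

      For example, comparing `wiener_deconvolution_filter(g, c, 1.0, 1.0)` with `wiener_deconvolution_filter(g, c, 0.7, 1.3)` for a 5 µm underfocus CTF shows that the second setting has a smaller magnitude at every frequency while keeping the same sign as the CTF, so phases are still corrected but noise is amplified less.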

      (5) CTFFIND5 runtimes

      Table 2 shows that estimation of tilt increases the runtime up to 39 s in an image of 4070x2892 and to 208 s in one of 2880x2046. There is a significant difference between these two cases (39 s vs. 208 s) and the first image is much larger than the second. Why does CTFFIND5 on the smaller image take so long compared to the larger image?

      During tilt estimation, the images are binned to a pixel size of 5 Å. This causes micrograph 1 to be substantially smaller (in pixels) than micrographs 2 and 3, resulting in the faster runtime.
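      The arithmetic behind this explanation can be sketched as follows. Resampling to a fixed 5 Å pixel scales each image dimension by (original pixel size / 5 Å), so the number of pixels that tilt estimation works on depends on the physical field of view, not on the raw pixel count alone. A micrograph with many pixels but a small pixel size can therefore end up smaller after binning. The pixel sizes in the note below are hypothetical and are not those of the micrographs in Table 2; the function name is also hypothetical.

```python
def binned_shape(nx, ny, pixel_size, target_pixel_size=5.0):
    """Image dimensions after resampling to target_pixel_size.

    pixel_size and target_pixel_size must use the same unit (e.g. Å).
    """
    scale = pixel_size / target_pixel_size
    return round(nx * scale), round(ny * scale)
```

      With hypothetical pixel sizes of 0.8 Å for a 4070 x 2892 image and 2.0 Å for a 2880 x 2046 image, the binned sizes are 651 x 463 and 1152 x 818, so the nominally smaller image has about three times as many pixels to analyse after binning.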

      (6) Conclusions

      (6.1) In the Conclusion section, the authors could elaborate a bit on the insights into sample quality provided by CTFFIND5. This is stated in the title of the manuscript, but it is hardly mentioned in the text.

      We have revised the conclusion to make this clearer (Page 16, L389-396). CTFFIND5 helps in estimating sample quality since (1) the sample thickness is an important determinant in the amount of high-resolution signal in a micrograph and (2) the estimated fit-resolution reflects more accurately the amount of signal present in a micrograph after tilt and sample thickness have been taken into account.

      (6.2) The authors nicely identify and describe the applications where thickness-aware CTF determination will be valuable: in situ single particle analysis and in vitro single particle cryoEM of purified samples at low voltages. Perhaps, CTFFIND5 will also be of great interest for single particle cryoEM of thick specimens (e.g. capsid of large viruses with diameter in the range 120-200 nm such as PBCV-1 or HSV-1).

      Agreed. We have added this case to our Conclusions. (Fig. 3d)

      (7) Typographical errors:

      line 161, page 8. "1.5 time" should be "1.5 times"

      lines 185-191. All exposures are given in 'electrons/Angstrom', not in 'electrons/square Angstrom'

      line 206, page 10. With "slides" the authors seem to mean "slices"

      line 338, page 14: "describeD by Tegunov"

      line 349, page 15. "power spectra"

      lines 366 and 368, page 15: Note that Square Angstrom is written as "A2". Put "2" with superscript.

      Thank you for pointing out these errors. They have been corrected.

      (8) References:

      Reference: Lucas et al., eLife 10 e68946. Year is lacking. Add year: 2021.

      Reference: Yan et al. 2015 cited in line 169, page 8, does not appear in Bibliography. The authors may mean: Yan et al. 2015 JSB 192:287-296, 2015  

      It would be good to cite Bsoft, as it has a procedure similar to tilt-corrected CTF estimation: Heymann, Protein Science, 2021.

      Thank you for carefully checking the cited references. We have revised the manuscript as suggested.

      Reviewer #2 (Recommendations For The Authors):

      I have only minor suggestions for improvement below:

      L218: "these option"

      Corrected

      L243: "chevron-shape" -> V-shape would be more accessible language for non-native speakers.

      Changed

      L281: "Based on these results we conclude that CTFFIND5 will provide more accurate CTF parameters" -> Given that the maximum resolutions of the fits by the old model and the new model are nearly the same, how big would the actual advantage of the new model be for subsequent sub-tomogram averaging?

      Please see our response to Reviewer #3 (Public Review) above.

      L376: The correct reference for RELION per-particle CTF estimation is Zivanov et al, (2018) [https://elifesciences.org/articles/42166]. Also, the cryoSPARC paper referenced does not describe per-particle CTF estimation and should thus be removed from this context.

      Thanks for pointing out these mistakes, which we have now corrected. We have chosen to keep the citation for CryoSPARC to reference the general software, but have added Zivanov et al. 2020 as suggested by the CryoSPARC website.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      Figure 1A legend - authors mention boxes but only 1 box is shown.

      Thank you for pointing this out. For visual clarity we decided to only show one box. We have corrected the legend.

      Figure 1B - it would be nice if the boxes that contributed to the power spectra were mapped on Figure 1A

      The shown power spectra are not actual data. Instead, we show power spectra with exaggerated defocus differences for visual clarity. We have revised the figure legends to make this clear. 

      The Y-axis legends in Figure 2 are not aligned vertically

      Corrected

      Figure 3A - CTFFIND4 is missing an "I"

      Corrected

      Figure 3 - Y-axis legends are not aligned vertically

      Corrected

      Page 16, line 376, Relion should be RELION

      We have revised the manuscript as advised.

      Typo in equation 5, sinc versus sin?

      “sinc” is correct here, since this is a thickness-dependent modulation of the CTF.

      "Lambert-Beer's" and "Lambert-Beer" are used inconsistently; should "Beer-Lambert" be used instead?

      We have revised the manuscript as advised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study by Zhou, Wang, and colleagues, the authors utilize biventricular electromechanical simulations to illustrate how different degrees of ionic remodeling can contribute to different ECG morphologies that are observed in either acute or chronic post-myocardial infarction (MI) patients. Interestingly, the simulations show that abnormal ECG phenotypes - associated with a higher risk of sudden cardiac death - are predicted to have almost no correspondence with left ventricular ejection fraction, which is conventionally used as a risk factor for arrhythmia.

      Strengths:

      The numerical simulations are state-of-the-art, integrating detailed electrophysiology and mechanical contraction predictions, which are often modeled separately. The simulation provides mechanistic interpretation, down to the level of single-cell ionic current remodeling, for different types of ECG morphologies observed in post-MI patients. Collectively, these results demonstrate compelling and significant evidence for the need to incorporate additional risk factors for assessing post-MI patients.

      Weaknesses:

      The study is rigorous and well-performed. However, some aspects of the methodology could be clearer, and the authors could also address some aspects of the robustness of the results. Specifically, does variability in ionic currents inherent in different patients, or the location/size of the infarct and surrounding remodeled tissue impact the presentation of these ECG morphologies?

      We thank the reviewer for their considered evaluation. In response to the reviewer’s comments regarding variability in ionic currents, we have added simulations using an n=17 population of models with variability in ionic conductances in the baseline ToR-ORd model, to show the effect of such variation on the post-MI ECG presentation in acute and chronic conditions. This is now described in the Methods [lines 140, 158-161, 242-244, 245-246, 261-263] and shown in Methods Figure 1A and 1B. The ECG results using this population of models are shown in Figure 2C and described in [lines 333-335], and the pressure-volume results using the population of models are shown in Figures 5A and 5B and described in [lines 417-418, 442-444, 448-450]. The population of models showed the same patterns in both the ECG and LVEF as the baseline model; this is discussed in [lines 563-564, 688-690].

      Regarding the effect of scar location and size on the ECG, we refer the reader and reviewer to a related paper where this is explored in depth using a formal sensitivity analysis and deep learning inference (https://pubmed.ncbi.nlm.nih.gov/38373128/). This is better able to do justice to this question rather than overloading this paper with additional investigations. We include a reference to this paper in the discussion section [lines 694-695].

      Reviewer #2 (Public Review):

      Summary:

      The authors constructed multi-scale modeling and simulation methods to investigate the electrical and mechanical properties of acute and chronic myocardial infarction (MI). They simulated three acute MI conditions and two chronic MI conditions. They showed that these conditions gave rise to distinct ECG characteristics that have been seen in clinical settings. They showed that the post-MI remodeling reduced ejection fraction by up to 10% due to weaker calcium current or SR calcium uptake, but that the reduction of ejection fraction is not sensitive to remodeling of the repolarization heterogeneities.

      Strengths:

      The major strength of this study is the construction of computer modeling that simulates both electrical behavior and mechanical behavior for post-MI remodeling. The links of different heterogeneities due to MI remodeling to different ECG characteristics provide some useful information for understanding complex clinical problems.

      Weaknesses:

      The rationale (e.g., physiological or medical bases) for choosing the 3 acute MI and 2 chronic MI settings is not clear. Although the authors presented a huge number of simulation data, in particular in the supplemental materials, it is not clearly stated what novel findings or mechanistic insights this study gained beyond the current understanding of the problem.

      We thank the reviewer for their careful evaluation of our work. The justification for selecting the 3 acute MI and 2 chronic MI states is based on clinical and experimental reports, as summarised in the Methods section [lines 245-247, 252-256, 264-266]. We have also highlighted the key novelty and significance of the study in the Discussion [lines 579-582].

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) This was clarified very late in the Discussion, but for most of the paper, I was unclear if heart geometry was the same for all simulations. Presumably, this includes the size and location of the infarct, BZ, and RZ. It would be helpful to clarify this in the Methods.

      This has been clarified in the first paragraph of the Methods section [lines 142-145].

      (2) On lines 224-226, the Methods refers to implementing several population members from the ToR-ORd model (in addition to the baseline) into the biventricular EM simulations. Is this in reference to the simulations shown in Figures 6 and 7, or different simulations? Please clarify.

      We now randomly select 17 of the 245 cell models in the population to be embedded in ventricular simulations, to produce a ventricular population of models. This allows us to explore the effect that physiological variability in the baseline ionic conductances has on the phenotypic representation of ionic remodellings in the ECG and LVEF. An explanation of this can be found in the Methods section [lines 241-244].

      For Figures 6 and 7, we selected two arrhythmic cell models from the n=245 population of cell models to be embedded into two ventricular simulations to demonstrate the arrhythmic potential of the cellular model at ventricular scale. This has been clarified in Methods [lines 269-271].

      Additionally, for the cases where a population member is used, are all regions of the ventricles "scaled" in the same manner, or were only the properties of the particular region drawn from the population modified relative to baseline (e.g., mid-myocardial cells in Figure 6)?

      The cells were embedded according to transmural heterogeneity in the remote zone for Figures 6 and 7. This has been clarified in the Methods [lines 271-273].

      (3) Interestingly, the study finds the ionic remodeling in different peri-infarct regions to be most critical for the ECG phenotype, which at least strongly suggests that inherent intra-patient variability in ion channel expression could also be critical.

      This is related to the comment on the use of population members. If the authors utilized one of the ventricular myocyte population members as the 'reference' (instead of the baseline ToR-ORd parameters) and applied the same types of remodeling as in Figures 3 and 4, would they expect the same ECG morphologies?

      We have now performed this test and selected 17 cell models from the population to create a ventricular population of models. On top of this ventricular population, we have applied the remodellings, and showed that the simulated ECG morphologies were mostly consistent across these 20 members (Figure 2C).

      (4) Related, do the authors expect that the location and/or size of the infarct and peri-infarct regions would impact the different ECG morphologies?

      Regarding the effect of scar location and size on the ECG, we refer the reader and reviewer to a related paper where this is explored in depth using a formal sensitivity analysis and deep learning inference (https://pubmed.ncbi.nlm.nih.gov/38373128/). We feel this is better able to do justice to this question rather than overloading this paper with additional investigations. We include a reference to this paper in the discussion section [lines 694-695].

      Reviewer #2 (Recommendations For The Authors):

      (1) Although the authors listed the parameters and cited the papers for the origins of the parameter changes in SM4 and Table S4, the major changes or differences among the 5 conditions should be summarized in the Methods section. Furthermore, the rationale for choosing these conditions should be stated. Are these choices based on clinical classifications or experimental conditions?

      The major differences between the 5 conditions have now been summarised in the Methods [lines 252-256, 264-266]. These remodellings have been collated from a range of experimental measurements in both human and animal data, which are summarised in Table S4. This has been clarified in Methods [lines 245-247].

      (2) Figure 3C and Figure 4C do not add any additional information beyond the conductance changes listed in Table 4, and I'd suggest removing them from the figures. On the other hand, it took me some time to work out the corresponding changes from Table 4. As commented above, the remodeling changes should be summarized in the main text to aid the reader.

      Figure 3C and 4C provide a visual explanation of the ionic remodellings in these conditions to echo the added descriptions in the text [lines 252-256, 264-266]. For this reason, we have elected to keep those figures in the manuscript.

      (3) The authors presented a large amount of data in the Supplemental Materials, some of which may be unnecessary and some of which is difficult to follow. For example: 1) There is a lot of data in Table S6, but only a brief mention in the main text and the Table S6 legend. A summary of the data is needed for readers to understand the properties of the different conditions, instead of leaving them to work these out from the table. The same should be done for other tables and figures. There are also some formatting issues with the tables, which garble some of the numbers and text. 2) The data shown in Figures S25-29 provide almost no new information beyond the well-known effects of ionic currents on EAD genesis, i.e., EADs are promoted by inward currents and suppressed by outward currents. The data for alternans (Figures S18-22) are a little more complex than the cases for EADs, but I think they can be simplified.

      Thanks for the suggestions. We have now extracted the key information from Tables S6-S9 and summarized it in the captions. We have also fixed the layout of the tables in this revision. The supplementary sections on alternans and EADs are simplified, with the key parameters related to these proarrhythmic phenomena summarized in tables instead of showing all boxplots of parameter distributions (Tables S10 and S11).

      (4) The authors showed two mechanisms of alternans: EAD-driven and Ca-driven alternans in chronic MI. There are several distinct mechanisms of alternans, including EAD-induced alternans (see the recent review by Qu and Weiss, Circ Res 132, 127 (2023)). Theoretically, calcium alternans can also induce EAD alternans under proper conditions; can you rule out that the EAD alternans are due to Ca alternans? The results in Fig. 7D may say the opposite. There are some chicken-or-egg issues here.

      In Figure 7D, we showed that the epicardial cell type (blue trace) had stable EADs at fast pacing with no calcium alternans, while both the endocardial (red trace) and mid-myocardial (green trace) cell types failed to fully repolarise in every other beat. To explore whether the EAD alternans are driven by calcium alternans, we tested the effects of switching off the alternans-related remodelling, and the APs turned out to be normal. On the other hand, when we turned off the EAD-related remodelling, neither EADs nor alternans occurred. Therefore, the results show that the two types of ionic current remodelling are both necessary for the generation of EAD alternans (lines 656-659 in the Discussion and SM9).

      (5) As for the formation of ectopic beats, they can be caused by EADs but also by a repolarization gradient; these are not the same mechanism and they differ between AP models (Liu et al, CircAE 12, e007571 (2019), Zhang et al, Biophy J 120, 352 (2021)). It is not clear here whether the primary cause is the repolarization gradient or EADs. In tissue, EADs tend to be suppressed by the repolarization gradient, so there is a Goldilocks zone between EAD amplitude and repolarization gradient for an ectopic beat to form.

      When isolated cells that showed EAD were embedded in ventricular tissue, we saw ectopic wave propagation. This was because the EADs in the RZ generated conduction block, which enabled a large repolarisation gradient to form between the BZ and RZ, thereby leading to ectopy. This has been clarified in the Results [lines 507-510].

      Additionally, we have clarified the presence of the EADs in the ventricular simulations by labelling where this occurs in the green, purple, and yellow traces in Figure 7C. This was easily missed before due to the stretched proportions of the traces in the x-axis, which is necessary to show clearly the repolarisation gradients that drive ectopy.

      (6) The authors showed many population simulations. I guess that they are all in single cells. If the population simulations were done in the whole heart, it should be stated how many models were simulated. If only one of the population models was selected for the whole heart for each case, it should clarify the rationale for choosing one of the many models. If populations of cells were modeled in the whole heart, clarify how the models were distributed in the heart.

      We now randomly select 17 of the 245 cell models in the population to be embedded in ventricular simulations, to produce a ventricular population of models. This allows us to explore the effect that physiological variability in the baseline ionic conductances has on the phenotypic representation of ionic remodellings in the ECG and LVEF. An explanation of this can be found in the Methods section [lines 241-244]. Whenever the cell models are embedded in the relevant zones, they are uniformly distributed according to the transmural heterogeneity [lines 271-273].  

      (7) QRS intervals in the simulations are much wider than the real recordings from patients (Figure 2 and Table S8). At least, a QRS of 120 ms for normal control is too wide and probably not normal.

      We have manually measured QRS duration and updated the delineation method to calculate the other biomarkers. The new values now lie within normal ranges and have been updated in SM Table S7 and S8 and in Figure 2, and the new delineation method has been included in SM2.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Madigan et al. assembled an interesting study investigating the role of the MuSK-BMP signaling pathway in maintaining adult mouse muscle stem cell (MuSC) quiescence and muscle function before and after trauma. Using a full body and MuSC-specific genetic knockout system, they demonstrate that MuSK is expressed on MuSCs and that eliminating the BMP binding domain from the MuSK gene (i.e., MuSK-IgG KO) in mice at homeostasis leads to reduced PAX7+ cells, increased myonuclear number, and increased myofiber size, which may be due to a deficit in maintaining quiescence. Additionally, after BaCl2 injury, MuSK-IgG KO mice display accelerated repair at 7 days post-injury (dpi) in males only. Finally, RNA profiling using nCounter technology showed that MuSK-IgG KO MuSCs express genes that may be associated with the activated state.

      Strengths:

      Overall, the biology regulating MuSC quiescence is still relatively unexplored, and thus, this work provides a new mechanism controlling this process. The experiments discussed in the paper are technically sound with great complementary mouse models (full body versus tissue-specific mouse KO) used to validate their hypothesis. Additionally, the paper is well written with all the necessary information in the legends, methods, and figures being reported.

      Weaknesses:

      While the data largely support the authors' conclusions, I do have a few points to consider when reading this paper.

      (1) For Figure 1, while I appreciate the authors confirming MuSK RNA and protein in MuSCs, I do think they should (a) quantify the RNA using qPCR and (b) determine the percentage of MuSCs expressing MuSK protein in their single fiber system in multiple biological replicates. This information will help us understand if MuSK is expressed in 1/10 or 10/10 PAX7-expressing MuSCs. Also, it will help place their phenotypes into the right context, especially when considering how much of the PAX7 pool is expressing MuSK from the beginning.

      The quantification is a reasonable point; however, we don’t believe that this information is necessary for supporting the interpretation of the findings.

      We agree that determining the proportion of SCs that express MuSK is useful information and we will address this question in the Revision.

      (2) Throughout the paper the argument is made that MuSK-IgG KO (full body and MuSC-specific KOs) MuSCs are more activated and/or break quiescence more readily, but there is no attempt to test this directly. Therefore, the authors should consider measuring the activation dynamics (i.e., break from quiescence) of MuSCs directly (EdU assays or live-cell imaging) in culture and/or in muscle in vivo (EdU assays) using their various genetic mouse models.

      We agree that this point is of interest and we plan to address it in future studies.

      (3) For Figure 2, given that mice are considered adults by 3 months, it is really surprising how just two months later they are starting to see a phenotype (i.e., reduced PAX7-cells, increased number of myonuclei, and increased myofiber size)-which correlates with getting older. Given that aged MuSCs have activation defects (i.e., stuck somewhere in the quiescence cycle), a pending question is whether their phenotype gets stronger in aged mice, like 18-24 months. If yes, the argument that this pathway should be used in a therapeutic sense would be strengthened.

      We agree that the potential role of the MuSK-BMP pathway in aged SCs is of import and could shed new light on SC dynamics in this context. However, we note that the activation observed between 3-5 months results in improved muscle quality (increased myofiber size and grip strength), which is opposite of what is observed with aging. We agree that activating the MuSK-BMP pathway in aged animals has the potential to activate SCs, promote muscle growth and counter sarcopenia.  Pharmacological and genetic approaches to test that question are underway, but given the time frame they are beyond the scope of the current manuscript.

      (4) For Figure 4, the same question as in point (3): the increase in fiber sizes by 7dpi in MuSK-IgG KO males is minimal (going from ~23 to 27 by eye) and there is no difference at a later time point when compared to WT mice. However, if older mice are used (18-24 months old), which are known to have repair deficits, will the regenerative phenotype in MuSK-IgG KO mice be more substantial and longer lasting?

      Again, an interesting point that will be addressed in future studies. 

      (5) For Figure 6, this gene set is not glaringly obvious as being markers of MuSC activation (i.e., no MyoD), so it's hard for the readers to know if this gene set is truly an activation signature. Also, the Shcherbina et al. data presented as a column with * being up or down (i.e. differentially expressed) is not helpful, since you don't know whether those mRNAs in that dataset are going up with the activation process. Addressing this point as well as my point (1) will further strengthen the author's conclusions about the MuSK-IgG KO MuSCs not being able to maintain quiescence as effectively.

      We agree that this Figure should include more information and be formatted in a way that more readily conveys the point. We will provide these changes in the Revision.

      Reviewer #2 (Public review):

      Summary:

      The work by Madigan et al. provides evidence that the signaling of BMPs via the Ig3 domain of MuSK plays a role during muscle postnatal development and regeneration, ultimately resulting in enhanced contractile force generation in the absence of the MuSK Ig3 domain. They demonstrate that MuSK is expressed in satellite cells initially post-isolation of muscle single fibers both in WT and whole-body deletion of the BMP binding domain of MuSK (ΔIg3-MuSK). In developing mice, ΔIg3-MuSK results in increased muscle fiber size, a reduction in Pax7+ cells, and increased muscle contractile force in 5-month-old, but not 3-month-old, mice. These data are complemented by a model in which the kinetics of regeneration appear to be accelerated at early time points. Of note, the authors demonstrate muscle tibialis anterior (TA) weights and fiber Feret diameters are increased during development in a Pax7CreERT2;MuSK-Ig3loxp/loxp model in which satellite cells specifically lack the MuSK BMP binding domain. Finally, using Nanostring transcriptional profiling, the authors identified a short list of genes that differ between the WT and ΔIg3-MuSK SCs. These data provide the field with new evidence of signaling pathways that regulate satellite cell activation/quiescence in the context of skeletal muscle development and regeneration.

      On the whole, the findings in this paper are well supported, however additional validation of key satellite cell markers and data analysis need to be conducted given the current claims.

      (1) The Pax7CreERT2;MuSK-Ig3loxp/loxp model is the appropriate model to conduct studies to assess satellite cell involvement in MuSK/BMP regulation. Validation of changes to muscle force production is currently absent using this model, as is quantification of Pax7+ tdT+ cells in 5-month muscle. Given that MuSK is also expressed on mature myofibers at NMJs, these data would further inform the conclusions proposed in the paper.

      As reported in the manuscript, we observed increased myofiber size, length and TA weight in the conditional mutants at five months of age. We did not assess grip strength in those experiments. 

      We demonstrated highly efficient MuSK Ig3-domain recombination by PCR analysis of FACS-sorted SCs from these conditional mutants (Supplemental Fig. S3). However, while we checked for Pax7+ tdT+ cells in 5-month SCs, we did not quantify this finding.

      (2) All Pax7 quantification in the paper would benefit from high magnification images including staining for laminin demonstrating the cells are under the basal lamina.

      The point is reasonable, we observed that these Pax7+ cells were under the basal lamina, but we did not acquire images at higher magnification.   

      (3) The nanostring dataset could be further analyzed and clarified. In Figure 6b, it is not initially apparent what genes are upregulated or downregulated in young and aged SCs and how this compares with your data. Pathway analysis geared toward genes involved in the TGFb superfamily would be informative.

      We agree that further analysis and information regarding the data in this Figure is warranted and we will include it in the Revision.

      (4) Characterizing MuSK expression on perfusion-fixed EDL fibers would be more conclusive to determine if MuSK is expressed in quiescent SCs. Additional characterization using MyoD, MyoG, and Fos staining of SCs on EDL fibers would help inform on their state of activation/quiescent.

      These are all valid points that we intend to address in future experiments.

      (5) Finally, the treatment of fibers in the presence or absence of recombinant BMP proteins would inform the claims of the paper.

      As reported in Jaime et al (2024), we have extensively characterized the differences in BMP response in both cultured WT and ΔIg3-MuSK myofibers and myoblasts at the level of signaling (pSMAD 1/5/8 nuclear localization and phosphorylation) and gene expression (qRT-PCR).

      Reviewer #3 (Public review):

      Summary:

      Understanding the molecular regulation of muscle stem cell quiescence. The authors evaluated the role of the MuSK-BMP pathway in regulating adult SC quiescence by the deletion of the BMP-binding MuSK Ig3 domain ('ΔIg3-MuSK').

      Strengths:

      A novel mouse model to interrogate muscle stem cell molecular regulators. The authors have developed a nice mouse model to interrogate the role of MuSK signaling in muscle stem cells and myofibers and have unique tools to do this.

      Weaknesses:

      Only minor technical questions remain and there is a need for additional data to support the conclusions.

      (1) The authors claim that dIg3-MuSK satellite cells break quiescence and start fusing, based on the reduction of Pax7+ and increase of nuclei/fiber (Fig 2-3), and maybe the gene expression (Fig6). However, direct evidence is needed to support these findings such as quantifying quiescent (Pax7+Ki67-) or activated (Pax7+Ki67+) satellite cells (and maybe proliferating progenitors Pax7-Ki67+) in the dIg3-MuSK muscle.

      We believe that the data presented strongly supports the conclusion that the SCs break quiescence, activate, and fuse into myofibers in uninjured muscle.  As noted above, the mechanistic studies suggested are of interest and we will address them in future work.

      (2) It is not clear whether the MuSK-BMP pathway is required to maintain satellite cell quiescence, given that by the end of regeneration (29 dpi) Pax7+ numbers are comparable to WT (Fig 4d). I would expect fewer Pax7+ cells, as in uninjured muscle. Can the authors evaluate this in more detail?

      The reviewer makes an important point. Our current interpretation of the findings is that quiescence is broken in SCs in uninjured muscle, but that ‘stemness’ is preserved, allowing for efficient muscle regeneration and restoration of the SC pool. Whether such properties reflect SC heterogeneity (as suggested in the comments of the other reviewers) and/or different states along a continuum is of particular interest and will be the focus of future studies. 

      (2) Figure 4 claims that regeneration is accelerated, but to claim this at a minimum they need to look at MYH3+ fibers, in addition to fiber size.

      We did not examine MYH3+ fibers in this study. However, we did observe an increase in Pax7+ cells at 5dpi (male and female) as well as larger myofiber size (Feret diameter) at 7dpi in the male animals. In addition, the panels in Figure 4 b,c (H&E and laminin, respectively) showing accelerated differentiation were selected to be representative of the experimental group. 

      (3) The Pax7 specific dIg3-MuSK (Fig5) is very exciting. However, it will be important to quantify the Pax7+ number. Could the authors check the reduction of Pax7+ in this model since it would confirm the importance of MuSK in quiescence?

      In Figure 5c, we assessed the number of Pax7+ cells in the conditional mutant during the course of regeneration (at 3, 5, 7, 14, 22 and 29 dpi). As discussed above, these results confirmed the findings of the constitutive mutant (reduction of Pax7+ cells in uninjured 5-month-old muscle) as well as showing the increased number at 5dpi and return to WT levels at 29 dpi.

      (3) Rescue of the BMP pathway in the model would be further supportive of the authors' findings.

      This point is valid. In a parallel study examining the role of the MuSK-BMP pathway at the NMJ, we have observed that BMP+/- (hypomorphs) recapitulate key phenotypes observed in ΔIg3-MuSK NMJs (Fish et al., bioRxiv, 2023). This point will be included in the Revision. 

      (4) Is the stem cell pool maintained long term in the deleted dIg3-MuSK SCs? Or would they be lost with extended treatment since they are reduced at the 5-month experiments? This is an important point and should be considered/discussed relevant to thinking about these data therapeutically.

      We agree that this is an important point for future studies. 

      (5) Without the Pax7-specific targeting, when you target dIg3-MuSK in the entire muscle, what happens to the neuromuscular nuclei?

      A manuscript describing the phenotype of the NMJ in ΔIg3-MuSK constitutive mice is in bioRxiv (Fish et al., 2024) and is in Revision at another journal. We anticipate discussing the findings in the Revised version of the current manuscript. 

      (6) Why were differences seen in males and not females? Is XIST downregulation occurring in both sexes? Could the authors explain these findings in more detail?

      The male and female difference in myofiber size is of interest. The Nanostring experiments, which showed the XIST reduction, were only performed in male mice.

    1. Author response:

      eLife Assessment

      This valuable study reveals extensive binding of eukaryotic translation initiation factor 3 (eIF3) to the 3' untranslated regions (UTRs) of efficiently translated mRNAs in human pluripotent stem cell-derived neuronal progenitor cells. The authors provide solid evidence to support their conclusions, although this study may be enhanced by addressing potential biases of techniques employed to study eIF3:mRNA binding and providing additional mechanistic detail. This work will be of significant interest to researchers exploring post-transcriptional regulation of gene expression, including cellular, molecular, and developmental biologists, as well as biochemists.

      We thank the reviewers for their positive views of the results we present, along with the constructive feedback regarding the strengths and weaknesses of our manuscript, with which we generally agree. We acknowledge our results will require a deeper exploration of the molecular mechanisms behind eIF3 interactions with 3'-UTR termini and experiments to identify the molecular partners involved. Additionally, given that NPC differentiation toward mature neurons is a process that takes around 3 weeks, we recognize the importance of examining eIF3-mRNA interactions in NPCs that have undergone differentiation over longer periods than the 2-hr time point selected in this study. Finally, considering the molecular complexity of the 13-subunit human eIF3, we agree that a direct comparison between Quick-irCLIP and PAR-CLIP will be highly beneficial and will determine whether different UV crosslinking wavelengths report on different eIF3 molecular interactions. Additional comments are given below to the identified weaknesses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors perform irCLIP of neuronal progenitor cells to profile eIF3-RNA interactions upon short-term neuronal differentiation. The data shows that eIF3 mostly interacts with 3'-UTRs - specifically, the poly-A signal. There appears to be a general correlation between eIF3 binding to 3'-UTRs and ribosome occupancy, which might suggest that eIF3 binding promotes protein synthesis, possibly through inducing mRNA closed-loop formation.

      Strengths:

      The study provides a wealth of new data on eIF3-mRNA interactions and points to the potential new concept that eIF3-mRNA interactions are polyadenylation-dependent and correlate with ribosome occupancy.

      Weaknesses:

      (1) A main limitation is the correlative nature of the study. Whereas the evidence that eIF3 interacts with 3'-UTRs is solid, the biological role of the interactions remains entirely unknown. Similarly, the claim that eIF3 interactions with 3'-UTR termini require polyadenylation but are independent of poly(A) binding proteins lacks support, as it relies solely on the absence of observable eIF3 binding to poly-A (-) histone mRNAs and a seeming failure to detect PABP binding to eIF3 by co-immunoprecipitation and Western blotting. In contrast, LC-MS data in Supplementary File 1 show ready co-purification of eIF3 with PABP.

      We agree the molecular mechanisms underlying the crosslinking between eIF3 and the end of mRNA 3’-UTRs remain to be determined. We also agree that the lack of interaction seen between eIF3 and PABP in Westerns, even from HEK293T cells, is a puzzle. The low sequence coverage in the LC-MS data gave us pause about making a strong statement that these represent direct eIF3 interactions, given the similar background levels of some ribosomal proteins.

      (2) Another question concerns the relevance of the cellular model studied. irCLIP is performed on neuronal progenitor cells subjected to neuronal induction for 2 hours. This short-term induction leads to a very modest - perhaps 10% - and very transient 1-hour-long increase in translation, although this is not carefully quantified. The cellular phenotype also does not appear to change and calling the cells treated with differentiation media for 2 hours "differentiated NPCs" seems a bit misleading. Perhaps unsurprisingly, the minor "burst" of translation coincides with minor effects on eIF3-mRNA interactions most of which seem to be driven by mRNA levels. Based on the ~15-fold increase in ID2 mRNA coinciding with a ~5-fold increase in ribosome occupancy (RPF), ID2 TE actually goes down upon neuronal induction.
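      The reviewer's translation-efficiency point can be made explicit with a short calculation, assuming TE is defined as the ratio of ribosome-protected fragments (RPF) to mRNA abundance and using the approximate ID2 fold changes quoted above (these are the reviewer's by-eye estimates, not exact values from the data):

```latex
\mathrm{TE} = \frac{\mathrm{RPF}}{\mathrm{mRNA}}, \qquad
\frac{\mathrm{TE}_{\text{diff}}}{\mathrm{TE}_{\text{undiff}}}
  = \frac{\mathrm{RPF}_{\text{diff}} / \mathrm{RPF}_{\text{undiff}}}
         {\mathrm{mRNA}_{\text{diff}} / \mathrm{mRNA}_{\text{undiff}}}
  \approx \frac{5}{15} \approx 0.33,
\qquad \log_2(0.33) \approx -1.6
```

      On these estimates, ID2 TE would fall roughly threefold upon neuronal induction, even though its RPF count rises.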

      We agree that it will be interesting to look at eIF3-mRNA interactions at longer time points after induction of NPC differentiation. However, the pattern of eIF3 crosslinking to the end of 3’-UTRs occurs in both time points reported here, which is likely to be the more general finding in what we present.

      (3) The overlap in eIF3-mRNA interactions identified here and in the authors' previous reports is minimal. Some of the discrepancies may be related to the not well-justified approach for filtering data prior to assessing overlap. Still, the fundamentally different binding patterns - eIF3 mostly interacting with 5'-UTRs in the authors' previous report and other studies versus the strong preference for 3'-UTRs shown here - are striking. In the Discussion, it is speculated that the different methods used - PAR-CLIP versus irCLIP - lead to these fundamental differences. Unfortunately, this is not supported by any data, even though it would be very important for the translation field to learn whether different CLIP methodologies assess very different aspects of eIF3-mRNA interactions.

      We agree the more interesting aspect of what we observe is the difference in location of eIF3 crosslinking, i.e. the end of 3’-UTRs rather than 5’-UTRs or the pan-mRNA pattern we observed in T cells. The reviewer is right that it will be important in the future to compare PAR-CLIP and Quick-irCLIP side-by-side to begin to unravel the differences we observe with the two approaches.

      Reviewer #2 (Public review):

      Summary:

      The paper documents the role of eIF3 in translational control during neural progenitor cell (NPC) differentiation. eIF3 predominantly binds to the 3' UTR termini of mRNAs during NPC differentiation, adjacent to the poly(A) tails, and is associated with efficiently translated mRNAs, indicating a role for eIF3 in promoting translation.

      Strengths:

      The manuscript is strong in addressing molecular mechanisms by using a combination of next-generation sequencing and crosslinking techniques, thus providing a comprehensive dataset that supports the authors' claims. The manuscript is methodologically sound, with clear experimental designs.

      Weaknesses:

      (1) The study could benefit from further exploration into the molecular mechanisms by which eIF3 interacts with 3' UTR termini. While the correlation between eIF3 binding and high translation levels is established, the functionality of these interactions needs validation. The authors should consider including experiments that test whether eIF3 binding sites are necessary for increased translation efficiency using reporter constructs.

      We agree with the reviewer that the molecular mechanism by which eIF3 interacts with the 3’-UTR termini remains unclear, along with its biological significance, i.e. how it contributes to translation levels. We think it could be useful to try reporters in, perhaps, HEK293T cells in the future to probe the mechanism in more detail.

      (2) The authors mention that the eIF3 3' UTR termini crosslinking pattern observed in their study was not reported in previous PAR-CLIP studies performed in HEK293T cells (Lee et al., 2015) and Jurkat cells (De Silva et al., 2021). They attribute this difference to the different UV wavelengths used in Quick-irCLIP (254 nm) and PAR-CLIP (365 nm with 4-thiouridine). While the explanation is plausible, it remains a caveat that different UV crosslinking methods may capture different eIF3 modules or binding sites, depending on the chemical propensities of the amino acid-nucleotide crosslinks at each wavelength. Without addressing this caveat in more detail, the authors cannot generalize their findings, and thus, the title of the paper, which suggests a broad role for eIF3, may be misleading. Previous studies have pointed to an enrichment of eIF3 binding at the 5' UTRs, and the divergence in results between studies needs to be more explicitly acknowledged.

      We agree with the reviewer that the two methods of crosslinking will require a more detailed head-to-head comparison in the future. However, we do think the title is justified by the fact that we see crosslinking to the termini of 3’-UTRs across thousands of transcripts in each condition. Furthermore, the 3’-UTR crosslinking is enriched on mRNAs with higher ribosome protected fragment counts (RPF) in differentiated cells, Figure 3F.

      (3) While the manuscript concludes that eIF3's interaction with 3' UTR termini is independent of poly(A)-binding proteins, transient or indirect interactions should be tested using assays such as PLA (Proximity Ligation Assay), which could provide more insights.

      This is a good idea, but would require a substantial effort better suited to a future publication. We think our observations are interesting enough to the field to stimulate future experimentation that we may or may not be most capable of doing in our lab.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript by Mestre-Fos and colleagues, authors have analyzed the involvement of eIF3 binding to mRNA during differentiation of neural progenitor cells (NPC). The authors bring a lot of interesting observations leading to a novel function for eIF3 at the 3'UTR.

      During the translational burst that occurs during NPC differentiation, analysis of eIF3-associated mRNA by Quick-irCLIP reveals the unexpected binding of this initiation factor at the 3'UTR of most mRNAs. Further analysis of alternative polyadenylation by APAseq highlights the close proximity of the eIF3-crosslinking position and the poly(A) tail. Furthermore, this interaction is not detected in poly(A)-less transcripts. Using Riboseq, the authors then attempted to correlate eIF3 binding with the translation efficacy of mRNA, which would suggest a common mechanism of translational control in these cells. These observations indicate that eIF3 binding at the 3'UTR of mRNA, near the poly(A) tail, may participate in the closed-loop model of mRNA translation, bridging 5' and 3' ends and allowing ribosome recycling. However, the authors failed to detect interactions of eIF3 with either PABP, Paip1 or 40S subunit proteins, which is quite unexpected.

      Strength:

      The well-written manuscript presents an attractive concept regarding the mechanism of eIF3 function at the 3'UTR. Most mRNAs in NPCs seem to have eIF3 bound at the 3'UTR, and only a few at the 5' end, where it is commonly thought to bind. In a previous study from the Cate lab, eIF3 was reported to bind to a small region of the 3'UTR of the TCRA and TCRB mRNAs, which was responsible for their specific translational stimulation during T cell activation. Surprisingly, in this study, the eIF3 association with mRNA occurs near polyadenylation signals in NPCs, independently of cell differentiation status. This compelling evidence suggests a general mechanism of translational control by eIF3 in NPCs. This observation brings back the old concept of mRNA circularization with new arguments, independent of the PABP and eIF4G interaction. Finally, the discussion adequately describes the potential technical limitations of the present study compared to previous ones by the same group, due to the use of Quick-irCLIP as opposed to PAR-CLIP with thiouridine.

      Weaknesses:

      (1) These data were obtained from an unusual cell type, limiting the generalizability of the model.

      We agree that unraveling the mechanism employed by eIF3 at the mRNA 3’-UTR termini might be better studied in a stable cell line rather than in primary cells.

      (2) This study lacks a clear explanation for the increased translation associated with NPC differentiation, as eIF3 binding is observed in both differentiated and undifferentiated NPCs. For example, I find an inconsistency between changes in Riboseq density (Figure 3B) and changes in protein synthesis (Figure 1D). Thus, the title overstates a modest correlation between eIF3 binding and important changes in protein synthesis.

      We thank the reviewer for this question. Riboseq and RNA-seq data are not on absolute scales when comparing across cell conditions. They are normalized internally, so increases in, for example, RPF in Figure 3B are relative to the bulk RPF in a given condition. By contrast, the changes in protein synthesis measured in Figure 1D are closer to an absolute measure of protein synthesis.

      (3) This is illustrated by the candidates selected to support this demonstration. In Figure 3B, ID2 and SNAT2 mRNAs are not among the High TE transcripts (in red). Instead, the increase in mRNA abundance could explain a proportionally increased association with eIF3 as well as with ribosomes. The evidence for increased protein abundance of these top candidates is weak and uncertain overall.

      We agree that using TE as the criterion for defining increased eIF3 association would not be correct. By “highly translated” we only mean to convey the extent of protein synthesis, i.e. increases in ribosome protected fragments (RPF), rather than the translational efficiency.

      (4) Despite several attempts (chemical and UV cross-linking) to identify eIF3 partners in NPC such as PABP, PAIP1, or proteins from the 40S, the authors could not provide any evidence for such a mechanism consistent with the closed-loop model. Overall, this rather descriptive study lacks mechanistic insight (eIF3 binding partners).

      We agree that it will be important to identify the molecular mechanism used by eIF3 to engage the termini of mRNA 3’-UTRs. Nevertheless, the identification of eIF3 crosslinking to that location in mRNAs is new, and we think will stimulate new experiments in the field.

      (5) Finally, the authors suspect a potential impact of the technical improvements provided by Quick-irCLIP, which could have been addressed experimentally rather than only discussed.

      We agree a side-by-side comparison of eIF3 crosslinks captured by PAR-CLIP versus Quick-irCLIP will be an important experiment to do. However, NPCs or other primary cells may not be the best system for the comparison. We think using an established cell line might be more informative, to control for effects such as 4-thiouridine toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work sets out to elucidate mechanistic intricacies in inflammatory responses in pneumonia in the context of the aging process (Terc deficiency - telomerase functionality).

      Strengths:

      A conceptually very interesting approach that is by all means worth pursuing. Overall, the approach is appropriate for the stated aim.

      We want to thank the reviewer for taking the time to review our manuscript and for providing positive feedback regarding our research question.  

      Weaknesses:

      The work is heavily underpowered and may have statistical deficits. This precludes it in its current state from drawing unequivocal conclusions.

      Thank you for this essential and valuable comment. We fully accept that the small sample size of the Tercko/ko mice is a major limitation of our study and discuss this transparently in our manuscript. However, under animal welfare regulations, only a limited number of mice were approved because of the severe disease burden. Consequently, only three non-infected and five infected mice were available to us. This small number of mice is a clear limitation of our study. However, owing to ethical considerations related to animal welfare and sustainability, as well as compliance with German animal welfare regulations, it is not possible to obtain additional Tercko/ko mice to increase the dataset.

      The animal studies are an important aspect of our study; however, our hypothesis was also investigated at multiple levels, including in an in vitro co-culture model (Figure 5), to ensure comprehensive analysis. Thus, we clearly demonstrated that S. aureus pneumonia in Tercko/ko mice leads to a more severe phenotype, orchestrated by the dysregulation of both innate and adaptive immune response.

      Reviewer #2 (Public Review):

      Summary:

      The authors demonstrate heightened susceptibility of Terc-KO mice to S. aureus-induced pneumonia, perform gene expression analysis on the infected lungs, find an elevated inflammatory (NLRP3) signature in some Terc-KO but not control mice, and observe some reduction in T cell signatures. Based on that, they conclude that dysregulated inflammation and T-cell dysfunction play a major role in these phenomena.

      Strengths:

      The strengths of the work include a problem not previously addressed (the role of the Terc component of the telomerase complex) in certain aspects of resistance to bacterial infection and innate (and maybe adaptive) immune function.

      We would like to thank the reviewer for the positive feedback regarding our aim to investigate the impact of Terc deletion on the pulmonary immune response to S. aureus.  

      Weaknesses:

      The weaknesses outweigh the strengths, predominantly because the conclusions are plagued by flaws in experimental design, a lack of rigorous controls, and incomplete and inadequate approaches to testing immune function. These weaknesses are as follows:

      (1) Terc-KO mice are a genomic knockout model, and therefore the authors need to carefully consider the impact of this KO on a wide range of tissues. This, however, is not the case. There are no attempts to perform cell transfers or use irradiation chimera or crosses that would be informative.

      We thank the reviewer for bringing up this important point. The aim of our study, however, was to investigate the impact of Terc deletion in the lung and on the response to bacterial pneumonia, rather than to provide a comprehensive characterization of the Tercko/ko model itself. Such characterization of different tissues and cell types has already been conducted in previous studies, for instance studies that characterize the general phenotype of the model (Herrera et al., 1999; Lee et al., 1998; Rudolph et al., 1999), but also investigations that shed light on the impact of Terc deletion on specific cell types such as microglia (Khan et al., 2015) or T cells (Matthe et al., 2022). The impact of Terc deletion on T cells is also discussed in our manuscript in lines 89 to 105. Furthermore, a section about the general phenotype of the Terc deletion model is included in the introduction in lines 126 to 138. Thus, we discuss the relevant literature regarding Tercko/ko mice in our manuscript and attempted to provide a more in-depth characterization of the lung by investigating the inflammatory response to infection as well as changes in gene expression (Figures 2-4).

      (2) Throughout the manuscript the authors invoke the role of telomere shortening in aging, and according to them, their Terc-KO mice should be one potential model for aging. Yet the authors consistently describe major differences between young Terc-KO and naturally aging old mice, with no discussion of the implications. This further confuses the biological significance of this work as presented.

      Thank you for mentioning this relevant point. We want to apologize for the confusion regarding this matter. While Tercko/ko mice are a well-established model for premature aging, these effects become more apparent with increasing generations (G) and thus, G5 and 6 mice are the most affected by Terc deletion (Lee et al., 1998; Wong et al., 2008).

      Thus, while Tercko/ko mice are a common model for premature aging, this accelerated aging phenotype is predominantly apparent in later-generation (G5 and G6) or aged Tercko/ko mice (Lee et al., 1998; Wong et al., 2008). Since the aim of this study was to analyze the impact of Terc deletion on the lung and its immune response to bacterial infection, rather than the impact of telomere shortening and telomerase dysfunction, young G3 Tercko/ko mice (8 weeks) were used in this study. This is also mentioned in lines 131-134. In this study, Tercko/ko mice were used not as a model of aging, but specifically as a model of Terc deletion. The old WT mice serve as a control cohort to observe possible shared as well as diverging effects of aging and Terc deletion. In our sequencing data, we observe that uninfected young WT mice are very similar to uninfected Tercko/ko mice. Other studies have also reported this lack of major differences between uninfected WT and G3 Tercko/ko mice (Kang et al., 2018). Conversely, uninfected old WT and young Tercko/ko mice exhibited large differences, for instance in the number of differentially expressed genes (Supplemental Figure 1H). Thus, differences between naturally aged mice and young G3 Tercko/ko mice are not surprising. To clarify this aspect, we revised the paragraph discussing the Tercko/ko mice (lines 126-134). Additionally, we added a paragraph explaining the purpose of the naturally aged mice in lines 134 to 138:

      “As a control cohort, age-matched young WT mice were used. To investigate whether Terc deletion, beyond critical telomere shortening, impacts the pulmonary immune response, we used young Tercko/ko mice. Additionally, naturally aged mice (2 years old) were infected to explore the potential link to a fully developed aging phenotype.”

      (3) Related to #2, group design for comparisons lacks a clear rationale. The authors stipulate that TercKO will mimic natural aging, but in fact, the only significant differences seen between groups in susceptibility to S. aureus are, contrary to the authors' expectation, between young Terc-KO and naturally old mice (Figures 1A and B, no difference between young Terc-KO and young wt); or there are no significant differences at all between groups (Figures 1, C, D,).

      We thank the reviewer for this essential comment. As mentioned above, the Tercko/ko mice in this study were not selected to model natural aging. To model telomerase dysfunction and accelerated aging, later-generation or aged Tercko/ko mice would have been more suitable.

      The lack of statistical significance in some figures is likely due to the heterogeneity of the disease phenotype of S. aureus infection in mice, a limitation of our study that we address in the discussion section in lines 576-582. The phenotype of S. aureus infection can vary greatly within a mouse population, highlighting the limitations of mice as a model for S. aureus infection. To account for this heterogeneity, we divided the infected Tercko/ko cohort into different degrees of severity based on the clinical score and the presence of bacteria in organs other than the lung (mice with systemic infection).

      Despite the heterogeneity, especially within the Tercko/ko cohort, the differences between the knockout mice and both young and old WT mice were striking. Including the fatal infections, 80% of the Tercko/ko mice had a severe course of disease, while none of the WT mice did (Figure 1A, B and Supplemental Figure 1A, B). This points to a clear role of Terc in the response to S. aureus infection in mice. Thus, although some differences did not reach statistical significance, we observed strong trends towards a more severe phenotype of S. aureus infection in the Tercko/ko mice with respect to bacterial load, clinical score, and inflammatory response.

      Another example of inadequate group design is when the authors begin dividing their Terc-KO groups by clinical score into animals with or without "systemic infection" (the condition where a bacterium spreads uncontrollably across many organs and via the blood, which should properly be called sepsis), and then compare this sepsis group to other groups (Supplementary Figure 1G; Figure 2; lines 374-376 and 389-391). This yields significant differences in several figures, but because the figure legends do not clearly indicate where this stratification was applied, the data are somewhat confusing. Most importantly, it is methodologically highly inappropriate to compare a mouse with sepsis to one without. If Terc-KO mice with sepsis are a comparator group, then their controls have to be wild-type mice with sepsis, which are dealing with the same high bacterial load across the body and are presumably forced to deploy the same set of immune defenses.

      We sincerely appreciate the significant time and effort you have invested in reviewing our manuscript. However, with all due respect, we must point out that the definition of sepsis you have referenced is considered outdated. According to the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3), sepsis is defined as "a life-threatening organ dysfunction caused by a dysregulated host response to infection" (Singer et al., 2016, JAMA). Given this fundamental misunderstanding of our findings, we find the comment regarding the inadequacy of our groups to be both dismissive and lacking in scientific merit. We would like to emphasize that the group size used in our study is consistent with accepted standards in infection research. We strongly reject the insinuations of inadequacy that are repeated throughout the review.

      In order to provide a nuanced investigation of disease severity in Tercko/ko mice, we added the term “systemic infection” to the figures whenever the mice were divided into groups of mice with and without systemic infection. This is the case for Figure 2A and Supplemental Figure 1C-E. The division into mice with and without systemic infection is also mentioned in the figure legend of Figure 2A in lines 932 to 935 and for Supplemental Figure 1 in lines 1052-1053. We agree that Supplemental Figure 1G is somewhat confusing as the mice with systemic infection are highlighted in this graph but not included as a separate group within our sequencing analysis. We added a sentence to the figure legend clarifying this (lines 1042-1044):

      “Nevertheless, the infected Tercko/ko mice were considered one group for the expression analysis and not split into separate groups for the subsequent analysis.”

      Additionally, we revised the section regarding this grouping in different degrees of severity in our Material and Methods section to clarify that this division was only performed for specific analysis (line 191):

      “…for the indicated analysis.”

      Furthermore, as noted above, the mice classified as systemically infected were not septic. We classified these mice as systemically infected based on their clinical score and the presence of bacteria in organs other than the lung, as stated in lines 188-191 and 377-381. Bacteremia is a symptom of very severe cases of hospital-acquired pneumonia and is associated with very high mortality (De la Calle et al., 2016).

      Therefore, the systemically infected mice, or more precisely the mice with bacteremia, display an especially severe pneumonia phenotype, which is distinct from sepsis. The presence of this symptom in our Tercko/ko mice further highlights the clinical relevance of our study. This aspect was added to the manuscript in lines 568-570:

      “The detection of bacteria in extrapulmonary organs is of particular interest, as bacteremia is a symptom of severe pneumonia and is associated with high mortality (De la Calle et al., 2016).”

      (4) The authors conclude that dysregulated inflammation and T-cell dysfunction play a major role in S. aureus susceptibility. This may or may not be an important observation, because many KO mice are abnormal for a variety of reasons, and until such reasons are mechanistically dissected, the physiological importance of the observation will remain unclear.

      Two points are important here. First, there is no natural counterpart to a Terc-KO, which is a complete loss of a key non-enzymatic component of the telomerase complex starting in utero. 

      Second, the authors did not examine the key basic features of their model, including basal and induced inflammatory and immune responses. This analysis could be done using model antigens in adjuvants, defined innate immune stimuli (e.g. TLR, RLR, or NLR agonists), or microbial challenge. The only data provided along these lines are the baseline frequencies of total T cells in the spleen of the three groups of mice examined (not statistically significant, Figure 4B). We do not know whether the composition of naïve and memory T cell subsets differed, and, more importantly, we have no data to evaluate whether recruitment of the immune response (including T cells) to the lung upon microbial challenge is similar or different. So, what are the numbers and percentages of T cells and alveolar macrophages in the lung following S. aureus challenge, and are they even comparable, or are there issues in mobilizing the T cell response to the site of infection? If, for example, Terc-KO mice do not mobilize enough T cells to the lung during infection, that would explain the paucity of many T-cell-associated genes in the transcriptomic data that the authors report. That in turn may not mean T cell dysfunction, but potentially a whole different set of defects in coordinating the response in Terc-KO mice.

      We thank the reviewer for highlighting these important aspects. Regarding the first point, indeed there is no naturally occurring deletion of Terc in humans. However, studies reported reduced expression of Terc and Tert in the tissues of aged mice and rats (Tarry-Adkins et al., 2021; Zhang et al., 2018). Terc itself has been found to have several important immunomodulatory functions such as the activation of the NFκB or PI3-kinase pathway (Liu et al., 2019; Wu et al., 2022). As those aforementioned pathways are relevant for the immune response to S. aureus infections, the authors were interested in exploring the impact of Terc deletion on the pulmonary immune response. The potential immunomodulatory functions of Terc are discussed in lines 106-121. To further clarify our rationale we added a sentence to the introduction in lines 121-125.

      “Interestingly, downregulation of Terc and Tert expression in tissues of aged mice and rats has been found (Tarry-Adkins, Aiken, Dearden, Fernandez-Twinn, & Ozanne, 2021; Zhang et al., 2018). Therefore, as a potential immunomodulatory factor, reduced Terc expression could be connected to age-related pathologies.”

      Regarding the second point, as we focused on the effect of Terc deletion in the lung and its role in S. aureus infection, we investigated the inflammatory and immune response parameters relevant to this setting. For instance, inflammation parameters in the lungs of all three mouse cohorts were measured to investigate differences in the inflammatory response between non-infected and infected mice (Figure 2A). These measurements showed no baseline difference in key inflammatory parameters between young WT and Tercko/ko mice, which is consistent with previous findings (Kang et al., 2018). The inflammatory response to S. aureus infection in the Tercko/ko cohort differed significantly from that of the other cohorts (Figure 2A), pointing towards a dysregulated inflammatory response due to Terc deletion. Furthermore, we investigated general immune cell frequencies, such as those of dendritic cells, macrophages, and B cells, in the spleen of all three cohorts to gain a baseline understanding of the general immune cell populations. In our manuscript, only total T cell frequencies were included because of their relevance to our T cell data (Figure 4B). These data showed no difference in total T cell frequency in the spleen among the three cohorts. For more detailed insight into our analysis, we added the frequencies of the other immune cell populations analyzed in the spleen as Supplemental Figure 3B-F. Additionally, a figure legend for these graphs was added in lines 1075-1094.

      Therefore, while we did not analyze baseline frequencies of specific populations of T cells, we analyzed and characterized the inflammatory and immune response of our model in a way relevant to our research question. 

      The differences observed in T cell marker and TCR gene expression were also partly present between uninfected and infected Tercko/ko mice, for example the complete absence of CD247 expression in infected Tercko/ko mice, whereas CD247 is expressed in uninfected mice of this cohort (Figure 4A, C and D). Thus, this effect cannot be attributed solely to inadequate mobilization of T cells to the lung after infectious challenge. However, we agree that more detailed insight into the immune cells recruited to the lung, or into the frequencies of different T cell populations, could contribute to a better understanding of the proposed mechanism and would be an interesting experiment for future studies. We accept this as a limitation of our study and have included it in the discussion section in lines 719-723:

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      (5) Related to that, the immunological analysis is also inadequate. First, the authors pull signatures from total lung tissue, which is both imprecise and potentially skewed by differences not in gene expression but in the types of cells present and/or their abundance, a feature known to be affected by aging and perhaps by Terc deficiency during infection. Second, to draw any conclusions about immune responses, the authors would have to track antigen-specific T cells, which is possible for a wide range of microbial pathogens using peptide-MHC multimers. This would allow highly precise analysis of the phenomena the authors are trying to draw conclusions about. Moreover, it would allow them to confirm their gene expression data in populations of physiological interest.

      We thank the reviewer for highlighting this important and relevant point. In our study, we aimed to investigate the role of Terc expression in modulating inflammation and the immune response to S. aureus infection in the lung. To address this, we examined the overall impact of age, genotype, and infection on lung inflammation and gene expression. Therefore, sequencing of total lung tissue was essential for addressing the research question posed. Our findings demonstrate that Tercko/ko mice exhibit a more severe phenotype following S. aureus infection, characterized by an increased bacterial load and heightened lung inflammation (Figures 1 and 2). Furthermore, our data suggest that Terc plays a role in regulating inflammation through activation of the NLRP3 inflammasome, along with the dysregulation of several T cell marker genes (Figures 2, 4, and 5). However, as noted earlier, this study lacks a detailed analysis of distinct T cell populations, including antigen-specific T cells. Investigating these aspects in future studies would be valuable to validate and expand upon our findings. We have incorporated these suggestions into the discussion section (lines 719-723):

      “As total CD4+ T cells were analyzed in this study, it would be useful to investigate specific T cell populations such as memory and effector T cells to elucidate the potential mechanism leading to T cell dysfunctionality in further detail. Additionally, analysis of differences in immune cell recruitment to the lungs between young WT and Tercko/ko mice would be relevant.”

      Nevertheless, our study provides first evidence of a potential connection between T cell functionality and Terc expression.  

      Third, the authors co-incubate AM and T cells with S. aureus. There is no information here about the phenotype of T cells used. Were they naïve, and how many S. aureus-specific T cells did they contain? Or were they a mix of different cell types, which we know will change with aging (fewer naïve and many more memory cells of different flavors), and maybe even with a Terc-KO? Naïve T cells do not interact with AM; only effector and memory cells would be able to do so, once they have been primed by contact with dendritic cells bringing antigen into the lymphoid tissues, so it is unclear what the authors are modeling here. Mature primed effector T cells would go to the lung and would interact with AM, but it is almost certain that the authors did not generate these cells for their experiment (or at least nothing like that was described in the methods or the text).

      Thank you for bringing up this important question. For the co-culture experiment with T cells and alveolar macrophages, total CD4+ T cells from both young WT and Tercko/ko mice were used; we did not select for a specific T cell population. Our sequencing data indicated a complete downregulation of CD247, an important component of the T cell receptor, in the lungs of infected Tercko/ko mice (Figure 4A, C and D). Given that this factor is downregulated under chronic inflammatory conditions (Dexiu et al., 2022), we investigated the impact of the inflammatory response of alveolar macrophages on the expression of various T cell-derived cytokines, as well as on CD247 expression (Figure 5D, E). This aspect is also highlighted in the discussion in lines 622-636. Therefore, a co-culture model of T cells and alveolar macrophages was established and challenged with heat-killed S. aureus to elicit an inflammatory response in the macrophages. To emphasize this purpose, we have revised our statement about the model setup in lines 516-518 of the manuscript:

      “An overactive inflammatory response could be a potential explanation for the dysregulated TCR signaling.”

      We hope this clarifies the intent behind the model setup.

      (6) Overall, the authors began to address the role of Terc in bacterial susceptibility, but to what extent that specifically involves inflammation and macrophages, T cell immunity, or aging remains unclear at present.

      We thank the reviewer for the helpful and relevant comments. We accept the limitations of the presented study, such as the small number of Tercko/ko mice and the limitations of murine models for S. aureus infection itself, and discuss these in the discussion section in lines 558-560, 576-582, 688-690, and 719-725. However, we hope that our responses have provided sufficient evidence to convince the reviewer that our data support a clear role for Terc expression in regulating the immune response to bacterial infection, particularly with respect to inflammation and its potential connection to T cell functionality.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The good element first:

      I read this paper with genuine interest and applaud the authors for investigating the posited question. I consider it by all means scientifically relevant in the context of physiological/pathophysiological aging and reaction to a disease (here pneumonia). The Terc deletion model looks very appropriate for the question and the methodology is very advanced/in-depth. The data flow/selection of endpoints and assays is very logical to me. Moreover, I like the breakdown of pneumonia into varying levels of severity.

      We thank the reviewer for their time and effort taken to revise our manuscript. Additionally, we are grateful to receive your positive feedback regarding our study design and research question.

      The weaknesses:

      (1) I cannot help but notice that the study is heavily underpowered. As such, it is inadmissible. The key reason is that it is the first of its kind, and seminal findings must be strongly supported by the evidence. It is apparent to me that the data scatter presented in the figures tends to be non-normally distributed (e.g. an obvious bimodal distribution in some groups). Therefore, the presented comparisons (even if statistically significant) can be heavily misleading in terms of: i) the true magnitude of the observed effects and ii) a possible type II error in some cases of p > 0.05. Solution: repeat the study to ensure reasonable power and reliability. This will also make it stronger, as it will immediately demonstrate its reproducibility (or lack of it).

      Thank you for bringing up this extremely relevant point. We acknowledge the small sample size of Tercko/ko mice as a major limitation of our study. This limitation is also included in our discussion section in lines 558-560. Thus, we fully agree with this limitation and discuss it transparently in our manuscript. However, as mentioned above, owing to the strict German animal welfare regulations it is not possible to obtain more Tercko/ko mice. Furthermore, since fatal infections occurred in the Tercko/ko cohort, the number of mice available was reduced further.

      However, the differences between the Tercko/ko and WT mice were striking. Including the fatal infections, 80% of the Tercko/ko mice had a severe course of disease, while none of the WT mice did. This points to a clear role of Terc in the response to S. aureus infection in mice.

      (2) In the stat analysis section of M&Ms, the authors feature only 1 sentence. This cannot be. A detailed stats workup needs to be included there. This is very much related to the above weakness; e.g. it is impossible to test for normality (to choose an appropriate post-hoc test) with n=3. Back to square one: study underpowered.

      We thank the reviewer for highlighting this important aspect. We carefully revised the method section in lines 357-360 to include all relevant information: 

      “Data are presented as mean ± SD, or as median with interquartile range for violin and box plots, with up to four levels of statistical significance indicated. P-values were calculated using the Kruskal-Wallis test. Individual replicates are represented as single data points.”

      (3) Pneumonia severity. While I noted that as a strength, I also note it as weakness here. It looks to me like the authors stopped halfway with this. I totally support testing a biological effect(s) such as the one investigated here across a spectrum of a given disease severity. The authors mention that they had various severity phenotypes produced in their model but this is not visible in the data figs. I strongly suggest including that as well; i.e., to study the posited question in the severe and mild pneumonia phenotype. This is a very smart path and previous preclinical research clearly demonstrated that this severe/mild distinction is very relevant in the context of the observed responses (their presence/absence, longevity, dynamics, etc). I realize this is challenging, thus, I would probably use this approach in the Terc k/o model as sort of a calibrator to see whether the exacerbation observed in the current setup (severe?) will be also present in a mild pneumonia phenotype. S. aureus can be effectively titrated to produce pneumonia of varying severity.

      We thank the reviewer for bringing up this relevant point. 

      In our study, we observed heterogeneity within the infected Tercko/ko cohort. Therefore, as pointed out by the reviewer, we assigned these mice to different degrees of severity based on clinical scores, the fatal outcome of the disease (fatal subgroup), and the presence of bacteria in organs other than the lungs (systemic infection subgroup), as stated in lines 188-191 of the Materials and Methods (Supplemental Figure 1A and B). We also highlighted this distinction in a number of our figures. For example, when categorizing the mice into groups with and without systemic infection, we noticed that mice with systemic infection showed a higher bacterial load, significant body weight loss, and increased lung weight (Supplemental Figure 1C-E). Interestingly, the two mice with systemic infection clustered separately from the other mice, indicating that they are transcriptomically distinct from the other cohorts (Supplemental Figure 1G). Additionally, the inflammatory response was elevated only in the lungs of mice with systemic infection (Figure 2C). We therefore used this distinction in several figures to study both the differences and the similarities between these subgroups. For instance, some transcriptomic changes were present in all three infected Tercko/ko mice, such as the complete absence of CD247 expression at 24 hpi (Figure 4D). This distinction thus provided a more detailed insight into the mechanisms underlying disease severity in Tercko/ko mice, which is lacking in other studies. We agree with the reviewer that a study investigating mild and severe pneumonia phenotypes would be clinically relevant. However, as noted above, ethical considerations related to animal welfare and sustainability, as well as compliance with German animal welfare regulations, do not permit us to obtain additional Tercko/ko mice to carry out the proposed experiment.

      (4) Please read ARRIVE guidelines and note the relevant info in M&Ms as ARRIVE guidelines point out.

      Thank you for emphasizing this crucial aspect. We revised our materials and methods section according to the ARRIVE guidelines (lines 179-206).

      “Tercko/ko mice aged 8 weeks were used for infection studies (n = 8; non-infected = 3; infected = 5). Female young WT (age 8 weeks) and old WT (age 24 months) C57Bl/6 mice (both n = 10; non-infected = 5; infected = 5) were purchased from Janvier Labs (Le Genest-Saint-Isle, France). All infected mouse cohorts were compared to their respective non-infected controls, as well as to the infected groups from other cohorts. Additionally, comparisons were made between the non-infected cohorts across all groups.

      All mice were anesthetized with 2% isoflurane before intranasal infection with S. aureus USA300 (1 × 10⁸ CFU in 20 µl per mouse). After 24 hours, the mice were weighed and scored as previously described (Hornung et al., 2023). Infected Tercko/ko mice were grouped into different degrees of severity based on their clinical score, the fatal outcome of the disease (fatal) and the presence of bacteria in organs other than the lung (systemic infection) for the indicated analyses. Mice with fatal infections were excluded from subsequent analyses, and only their final scores are reported. At 24 hpi, the mice were sacrificed by injection of an overdose of xylazine/ketamine followed by bleeding from the axillary artery. BAL was collected by instillation and subsequent retrieval of PBS from the lungs. Serum and organs were collected. Bacterial load in the BAL, kidney and liver was determined by plating serially diluted samples as described above; organs were first homogenized in the appropriate volume of PBS. Gene expression was analyzed in the right superior lung lobe, which was homogenized in the appropriate amount of TriZol LS reagent (Thermo Fisher Scientific, Waltham, MA, US) prior to RNA extraction. The left lung lobe was embedded in Tissue Tek O.C.T. (science services, Munich, Germany) and stored at -80°C until further processing for histological analysis. Cytokine measurements were performed on the right inferior lung lobe, which was first homogenized in the appropriate volume of PBS. Remaining organs were stored at -80°C until further use. Mouse studies were conducted without randomization or blinding.”
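      The plating step and the inoculum both rest on standard viable-count arithmetic. The sketch below shows the usual back-calculation from a single countable plate to CFU/ml and to total CFU per organ homogenate, plus the stock concentration implied by 1 × 10⁸ CFU in 20 µl. The helper names and the example plate count, dilution, plated volume and homogenate volume are hypothetical, and whether the study reports loads per ml, per organ or per gram is not stated here.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_ml: float) -> float:
    """CFU/ml in the undiluted sample, from one countable plate.

    dilution_factor is the total fold-dilution of the plated tube
    (e.g. 1e3 for the 10^-3 tube of a tenfold series).
    """
    return colonies * dilution_factor / plated_ml


def cfu_per_organ(colonies: int, dilution_factor: float, plated_ml: float,
                  homogenate_ml: float) -> float:
    """Total CFU in an organ homogenized in homogenate_ml of PBS."""
    return cfu_per_ml(colonies, dilution_factor, plated_ml) * homogenate_ml


# Inoculum stated in the Methods: 1e8 CFU delivered in 20 µl (0.02 ml)
stock_cfu_per_ml = 1e8 / 0.02  # i.e. a stock of 5e9 CFU/ml

# Hypothetical plate: 42 colonies from 0.1 ml of the 10^-3 dilution
# of a kidney homogenized in 1 ml PBS
print(f"{cfu_per_organ(42, 1e3, 0.1, 1.0):.2e} CFU per kidney")
```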

      (5) There are also some other descriptive deficits but they are of a much smaller caliber so I do not list them.

      We thank the reviewer for their valuable and insightful suggestions for improving our manuscript. We hope that our responses and the corresponding revisions address these suggestions satisfactorily.

      Concluding: the investigative idea is great/interesting and the methodological flow is adequate but the low power makes this study of low reliability in its current form. I strongly urge the authors to walk the extra mile with this work to make it comprehensive and reliable. Best of luck!

      Reviewer #2 (Recommendations For The Authors):

      (1) Many legends are uninformative and do not contain critical information about the experiments. For example, Figure 2A with cytokine measurements (in lung homogenates?) is likely showing data from an ELISA or Luminex test, but there is no mention of that in the legend. It stands next to Figure 2B, which is a gene expression map, again, likely from the lung (prepared how, normalized how, etc?) lacking even the most basic information. Further, Figure 2D has no information on the meaning/effect size of gene ratios on the x-axis. Figures 3 and 4 are presumably the subsets of their transcriptome data set (whole lung, harvested on d ?? post-infection), but that is just a guess on my part. Even in the main text, the timing and the controls for the transcriptomic study are not stated (ln. 398 and onwards). The authors really need to revise the figure legends and provide all the details that an average reader would need to be able to interpret the data.

      We thank the reviewer for bringing up this important point. The legends of all figures, including supplemental figures, were revised to include all information necessary for accurate interpretation of the graphs. Additionally, we clarified the sequenced samples in lines 427-429:

      “We performed mRNA sequencing of the murine lung tissue of infected and non-infected mice at 24 hpi to elucidate potential differentially expressed genes that contribute to the more severe illness of Tercko/ko mice.”

      (2) Telomere shortening affects differentially different cells and its role in aging is nuanced - different in mesenchymal cells with no telomerase induction, in non-replicating cells, and in hematopoietic cells that can readily induce telomerase. The authors should be mindful of that in setting up their introduction and discussion.

      Thank you for mentioning this essential aspect. We revised our introduction and discussion to reflect the nuanced role of telomere shortening in different tissues (lines 83-92 and 690-695):

      “Telomerase activity is restricted to specific tissues and cell types, largely dependent on the expression of Tert. While Tert is highly expressed in stem cells, progenitor cells, and germline cells, its expression is minimal in most differentiated cells (Chakravarti, LaBella, & DePinho, 2021). Consequently, the impact of telomerase dysfunction on tissues varies according to their self-renewal rate (Chakravarti et al., 2021). One important aspect of telomere dysfunction is the impact of telomere shortening on the immune system as well as the hematopoietic system. Tissues or organ systems that are highly replicative, such as the skin or the hematopoietic system, are affected first by telomere shortening (Chakravarti et al., 2021).”

      “It is important to note that telomere shortening has a significant impact on the immune system. Although young Tercko/ko mice were used in this study, telomere shortening is still likely to be a contributing factor. Further experiments investigating the role of T cell senescence in this model should therefore be conducted.”

      (3) Syntax and formulations need to be improved and made more scientifically precise in several spots. Specifically, in 62-63, the authors say that the aged immune system "is also discussed to be more irritable", please change to reflect the common notion that the reaction to infection is dysregulated; in many cases inflammation itself is initially blunted, misdirected, and of different type (e.g. for viruses, the key IFN-I responses are not increased but decreased). In lines 114-117, presumably, the two sentences were supposed to be connected by a comma, although some editing for clarity is probably needed regardless. Line 252, please change "unspecific" to "non-specific". Line 264, please capitalize German.

      We thank the reviewer for bringing these important points to our attention. We revised our introduction regarding the aged immune response in lines 61-69:

      “Age-related dysregulation of the immune response is also characterized by inflammaging, defined as the presence of elevated levels of pro-inflammatory cytokines in the absence of an obvious inflammatory trigger (Franceschi et al., 2000; Mogilenko, Shchukina, & Artyomov, 2022). Additionally, immune cells, such as macrophages, exhibit an activated state that alters their response to infection (Canan et al., 2014). In contrast, the immune response of macrophages to infectious challenges has been shown to be initially impaired in aged mice (Boe, Boule, & Kovacs, 2017). Thus, aging is a relevant factor impacting the pulmonary immune response.”

      Sentences were edited to provide more clarity in lines 131-134:

      “Although G3 Tercko/ko mice with shortened telomeres were used in this study, they were infected at a young age (8 weeks). This approach allowed for the investigation of Terc deletion effects rather than telomere dysfunction.”

      “Unspecific” was changed to “non-specific” in line 282, and “German” was capitalized in lines 293 and 558.

      We thank you for the time you spent processing this manuscript and look forward to your response.

      References

      De la Calle, C., Morata, L., Cobos-Trigueros, N., Martinez, J. A., Cardozo, C., Mensa, J., & Soriano, A. (2016). Staphylococcus aureus bacteremic pneumonia. European Journal of Clinical Microbiology & Infectious Diseases, 35(3), 497-502. https://doi.org/10.1007/s10096-015-2566-8  

      Dexiu, C., Xianying, L., Yingchun, H., & Jiafu, L. (2022). Advances in CD247. Scand J Immunol, 96(1), e13170. https://doi.org/10.1111/sji.13170  

      Herrera, E., Samper, E., Martín-Caballero, J., Flores, J. M., Lee, H. W., & Blasco, M. A. (1999). Disease states associated with telomerase deficiency appear earlier in mice with short telomeres. The EMBO Journal, 18(11), 2950-2960. https://doi.org/10.1093/emboj/18.11.2950

      Hornung, F., Schulz, L., Köse-Vogel, N., Häder, A., Grießhammer, J., Wittschieber, D., Autsch, A., Ehrhardt, C., Mall, G., Löffler, B., & Deinhardt-Emmer, S. (2023). Thoracic adipose tissue contributes to severe virus infection of the lung. International Journal of Obesity, 47(11), 1088-1099. https://doi.org/10.1038/s41366-023-01362-w

      Kang, Y., Zhang, H., Zhao, Y., Wang, Y., Wang, W., He, Y., Zhang, W., Zhang, W., Zhu, X., Zhou, Y., Zhang, L., Ju, Z., & Shi, L. (2018). Telomere Dysfunction Disturbs Macrophage Mitochondrial Metabolism and the NLRP3 Inflammasome through the PGC-1α/TNFAIP3 Axis. Cell Reports, 22(13), 3493-3506. https://doi.org/10.1016/j.celrep.2018.02.071

      Khan, A. M., Babcock, A. A., Saeed, H., Myhre, C. L., Kassem, M., & Finsen, B. (2015). Telomere dysfunction reduces microglial numbers without fully inducing an aging phenotype. Neurobiology of Aging, 36(6), 2164-2175. https://doi.org/10.1016/j.neurobiolaging.2015.03.008

      Lee, H.-W., Blasco, M. A., Gottlieb, G. J., Horner, J. W., Greider, C. W., & DePinho, R. A. (1998). Essential role of mouse telomerase in highly proliferative organs. Nature, 392(6676), 569-574. https://doi.org/10.1038/33345  

      Liu, H., Yang, Y., Ge, Y., Liu, J., & Zhao, Y. (2019). TERC promotes cellular inflammatory response independent of telomerase. Nucleic Acids Research, 47(15), 8084-8095. https://doi.org/10.1093/nar/gkz584  

      Matthe, D. M., Thoma, O. M., Sperka, T., Neurath, M. F., & Waldner, M. J. (2022). Telomerase deficiency reflects age-associated changes in CD4+ T cells. Immun Ageing, 19(1), 16. https://doi.org/10.1186/s12979-022-00273-0  

      Rudolph, K. L., Chang, S., Lee, H. W., Blasco, M., Gottlieb, G. J., Greider, C., & DePinho, R. A. (1999). Longevity, stress response, and cancer in aging telomerase-deficient mice. Cell, 96(5), 701-712. https://doi.org/10.1016/s0092-8674(00)80580-2  

      Tarry-Adkins, J. L., Aiken, C. E., Dearden, L., Fernandez-Twinn, D. S., & Ozanne, S. (2021). Exploring Telomere Dynamics in Aging Male Rat Tissues: Can Tissue-Specific Differences Contribute to Age-Associated Pathologies? Gerontology, 67(2), 233-242. https://doi.org/10.1159/000511608  

      Wong, L. S. M., Oeseburg, H., de Boer, R. A., van Gilst, W. H., van Veldhuisen, D. J., & van der Harst, P. (2008). Telomere biology in cardiovascular disease: the TERC−/− mouse as a model for heart failure and ageing. Cardiovascular Research, 81(2), 244-252. https://doi.org/10.1093/cvr/cvn337  

      Wu, S., Ge, Y., Lin, K., Liu, Q., Zhou, H., Hu, Q., Zhao, Y., He, W., & Ju, Z. (2022). Telomerase RNA TERC and the PI3K-AKT pathway form a positive feedback loop to regulate cell proliferation independent of telomerase activity. Nucleic Acids Res, 50(7), 3764-3776. https://doi.org/10.1093/nar/gkac179  

      Zhang, M. W., Zhao, P., Yung, W. H., Sheng, Y., Ke, Y., & Qian, Z. M. (2018). Tissue iron is negatively correlated with TERC or TERT mRNA expression: A heterochronic parabiosis study in mice. Aging (Albany NY), 10(12), 3834-3850. https://doi.org/10.18632/aging.101676

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1:

      (1) Which allele is alr1, the one upstream of mazEF or the one in the lysine biosynthetic operon?

      Alr1 is encoded by SAUSA300_2027, the gene upstream of mazEF. We have now incorporated this information in the manuscript (Line# 127).

      (2) Figure 3B. Where does the C3N2 species come from in the WT and why is it absent in the mutants? It is about 25% of the total dipeptide pool.

      In Figure 3B, the C3N2 species results from the combination of C3N1 D-Ala (from Alr1) and C0N1 D-Ala (from Dat). This species is completely absent from both single mutants because forming C3N2 D-Ala-D-Ala requires one D-Ala from Alr1 and one from Dat.
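      The combinatorial argument can be made explicit with a simple pairing model. The sketch assumes that Ddl joins D-Ala molecules at random from one well-mixed pool in which a fraction f_alr1 is C3N1 (Alr1-derived) and the rest is C0N1 (Dat-derived), and it ignores any other labeling states. The function name and the 0.5 pool are hypothetical and not fitted to Figure 3B. Under these assumptions the mixed C3N2 species scales as 2·f·(1 − f), so it vanishes whenever either enzyme is the sole source.

```python
def dipeptide_isotopologues(f_alr1: float) -> dict[str, float]:
    """Expected D-Ala-D-Ala isotopologue fractions under random pairing.

    f_alr1: assumed fraction of the D-Ala pool that is C3N1 (from Alr1);
    the remainder is taken to be C0N1 (from Dat).
    """
    f_dat = 1.0 - f_alr1
    return {
        "C6N2": f_alr1 ** 2,         # both residues from Alr1
        "C3N2": 2 * f_alr1 * f_dat,  # one residue from each enzyme
        "C0N2": f_dat ** 2,          # both residues from Dat
    }


for label, f in [("alr1 mutant (Dat only)", 0.0),
                 ("hypothetical mixed pool", 0.5),
                 ("dat mutant (Alr1 only)", 1.0)]:
    print(label, dipeptide_isotopologues(f))
```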

      (3) Figure 3D could perhaps be omitted. I understand that the authors attained statistical significance in the fitness defect, but biologically this difference is very minor. One would have to look at the isotopomer distribution in the Dat overexpressing strain to make sure that increased flux actually occurred since there are other means of affecting activity (e.g. allosteric modulators).

      Thank you for the suggestion. We agree with the reviewer that the fitness defect observed after increased dat expression is relatively minor and have moved this figure to the supplementary section as Figure 3-figure supplement 1.

      Although we attempted to amplify the fitness defect of dat expression by cloning dat onto a multicopy vector, we could not maintain its stable expression in S. aureus. This instability may be due to the depletion of D-Ala when dat is overexpressed. We therefore switched to expressing dat from a single additional copy integrated into the SaPI locus, which was sufficient to cause the expected fitness defect, albeit a minor one.

      (4) In Figure 4A, why is the complete subunit UDP-NAM-AEKAA increasing in each strain upon acetate challenge if there was such a stark reduction in D-Ala-D-Ala, particularly in the ∆alr1 mutant? For that matter, why are the levels of UDP-NAM-AEKAA in the ∆alr1 mutant identical to that of WT with/out acetate?

      Thank you for raising this important point. We have addressed this in Line# 299-302 and 451-455 of the revised manuscript. In short, we believe that inhibition of Ddl by acetate substantially increases the intracellular pool of the tripeptide UDP-NAM-AEK, which then outcompetes the pentapeptide UDP-NAM-AEKAA as a substrate for MraY. As a result, the intracellular concentration of the pentapeptide increases, because it is no longer efficiently consumed by MraY. This explanation is also supported by the kinetic study in Ref (1), which demonstrates competition between UDP-NAM-AEKAA and UDP-NAM-AEK as substrates for MraY.
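      The proposed mechanism corresponds to the textbook rate law for two substrates competing for one active site. In the sketch below, A stands for UDP-NAM-AEKAA and B for UDP-NAM-AEK. All parameters are hypothetical and unitless, and this is not a fit to the kinetics in Ref (1). The point is only that, at a fixed pentapeptide level, a growing tripeptide pool lowers pentapeptide turnover by MraY.

```python
def rate_of_a(s_a: float, s_b: float, vmax_a: float, km_a: float, km_b: float) -> float:
    """Rapid-equilibrium Michaelis-Menten rate for substrate A when a
    second substrate B competes for the same active site:
    v_A = (Vmax_A * [A] / Km_A) / (1 + [A] / Km_A + [B] / Km_B)."""
    return (vmax_a * s_a / km_a) / (1 + s_a / km_a + s_b / km_b)


# Hypothetical, unitless parameters: A = UDP-NAM-AEKAA, B = UDP-NAM-AEK
for tripeptide in (0.0, 1.0, 4.0):
    v = rate_of_a(s_a=1.0, s_b=tripeptide, vmax_a=1.0, km_a=1.0, km_b=1.0)
    print(f"[UDP-NAM-AEK] = {tripeptide}: pentapeptide turnover = {v:.3f}")
```

      If pentapeptide synthesis continues while its consumption slows, its steady-state pool would be expected to rise, which matches the qualitative behavior described above.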

      (5) Figure 4B. Is there no significant difference between ddl and murF transcripts between WT and ∆alr1 under acetate stress? This comparison was not labeled if the tests were done.

      Thank you for suggesting this comparison. Under acetate stress, ddl and murF transcript levels differed significantly between WT and ∆alr1. We have added this comparison to Figure 4B.

      (6) Although tricky, it is possible to measure intracellular acetate. It might be of interest to know where in the Ddl inhibition curve the cells actually are.

      Thank you for the suggestion. We agree that this would have been an excellent addition to the manuscript. However, accurately measuring intracellular acetate would require the use of radiolabeled acetate (2), and we currently lack the expertise to perform this experiment. Nevertheless, since our study clearly shows that acetate-mediated growth impairment is due to Ddl inhibition, and the IC50 of acetate for Ddl is around 400 mM, we predict that the intracellular concentration must be close to or above this IC50 to produce the growth phenotypes we report.
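      To show where a given intracellular concentration would fall on the Ddl inhibition curve, here is a minimal dose-response sketch. It uses the ~400 mM IC50 stated above. The Hill coefficient of 1 and the function name are our assumptions and were not fitted to the authors' inhibition data.

```python
def fraction_ddl_inhibited(acetate_mM: float, ic50_mM: float = 400.0,
                           hill: float = 1.0) -> float:
    """Fractional inhibition from a Hill-type dose-response curve.

    ic50_mM defaults to the ~400 mM value reported for acetate;
    hill = 1.0 is an assumption, not a fitted value.
    """
    x = (acetate_mM / ic50_mM) ** hill
    return x / (1 + x)


for conc in (100, 400, 600):
    print(conc, "mM ->", round(fraction_ddl_inhibited(conc), 2))
```

      With a Hill coefficient of 1, an intracellular concentration at the IC50 halves Ddl activity, and about 600 mM inhibits it by roughly 60%. A steeper curve would sharpen this transition.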

      Reviewer #2:

      Although the authors have conclusively shown that Ddl is the target of acetic acid, it appears that the acetic acid concentration used in the experiments may not truly reflect the concentration range S. aureus would experience in its environment. Moreover, Ddl is only significantly inhibited at a very high acetate concentration (>400 mM). Thus, additional experiments showing growth phenotypes at lower organic acid concentrations may be beneficial.

      Thank you for the suggestion. In response to the reviewer, we have measured growth at various acetate concentrations and demonstrate a concentration-dependent effect (Figure 1C).

      We use 20 mM acetic acid in our study. In the gut, where S. aureus colonizes, acetate levels can reach up to 100 mM, so we believe our concentrations are physiologically relevant. When S. aureus encounters 20 mM acetate at pH 6.0, the intracellular concentration can rise to ~600 mM if the transmembrane pH gradient is 1.5 units, which is well above the ~400 mM IC50 we report for Ddl.

      Another aspect not adequately discussed is the presence of D-ala in the gut environment, which may be protective against acetate toxicity based on the model provided.

      Thank you for pointing this out. We agree that D-Ala from the gut microbiota could protect against acetate toxicity, and we have included this in the discussion. However, our study clearly indicates that S. aureus itself maintains high intracellular D-Ala levels through Alr1 activity, which is sufficient to counter acetate anion intoxication.

      Recommendation for the authors:

      Reviewer #2:

      Major Comments:

      (1) In Line 85, authors indicate S. aureus may encounter a high concentration of ~100 mM acetic acid (extracellular?). Could the authors cite more (and recent) references indicating S. aureus encounters >100 mM acetic acid in the environment?

      To the best of our knowledge, no studies have specifically examined whether S. aureus encounters millimolar concentrations of acetate in the gut. The statement in Line 85 was inferred from multiple studies: recent findings that S. aureus colonizes the gut (3, 4) and that the gut environment has high acetate levels (~100 mM) (5). In response to the reviewer's request, more recent references supporting high acetate concentrations in the gut (6, 7) have been added in Line# 86.

      (2) In Line 117, it is mentioned that S. aureus when grown in vitro at 20 mM acetic acid can accumulate ~600 mM acetic acid in the cytoplasm.

      a. Does the intracellular concentration go up proportionally if grown in 100 mM acetic acid? Given the IC50 of acetic acid-mediated inhibition of Ddl is ~400 mM, I wonder how physiologically relevant this finding presented here is.

      Thank you for the opportunity to explain this further. If S. aureus encounters a concentration of 100 mM acetate and its transmembrane pH gradient (pHin-pHout) is held at 1.5, the intracellular concentration of acetate could theoretically increase up to 3 M based on Ref (8). However, previous studies have shown that bacteria can lower the magnitude of transmembrane pH gradient by decreasing their intracellular pH to limit accumulation of anions within cells (9, 10).

      Although our study shows that the IC50 of Ddl inhibition by acetate is relatively high (~400 mM), we believe it is still relevant, because just 20 mM of environmental acetate at a pH of 6.0 can raise the intracellular concentration of acetate to over 600 mM, which is well above the IC50 we report for Ddl. Moreover, since S. aureus may encounter high concentrations of acetate during gut colonization, we believe our findings are physiologically relevant.
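      The intracellular figures above follow from the standard weak-acid distribution relationship. If only undissociated acetic acid crosses the membrane, its concentration equalizes on both sides, and the total on each side is [HA]·(1 + 10^(pH − pKa)). The sketch below assumes the textbook acetic acid pKa of 4.76, an external pH of 6.0 and a 1.5-unit gradient (internal pH 7.5), as described in the text. We do not claim that it reproduces the exact calculation in Ref (8). It gives roughly 600 mM for 20 mM external acetate (the cruder approximation 20 mM × 10^1.5 gives about 630 mM) and roughly 3 M for 100 mM.

```python
def intracellular_acetate_mM(external_mM: float, ph_out: float, ph_in: float,
                             pka: float = 4.76) -> float:
    """Total internal acetate (acid + anion) at equilibrium.

    Assumes only undissociated acetic acid permeates the membrane, so [HA]
    is equal inside and out; each side then holds [HA] * (1 + 10**(pH - pKa)).
    """
    inside = 1 + 10 ** (ph_in - pka)
    outside = 1 + 10 ** (ph_out - pka)
    return external_mM * inside / outside


# External pH 6.0 with a 1.5-unit gradient (internal pH 7.5), as in the text
for ext in (20, 100):
    print(ext, "mM outside ->", round(intracellular_acetate_mM(ext, 6.0, 7.5)), "mM inside")
```

      Under the same model, lowering the internal pH to 7.0 (the adaptation described in Refs (9, 10)) reduces the predicted value for 20 mM external acetate to roughly 190 mM.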

      b. Could the authors show concentration-dependent growth inhibition in alr::tn by titrating a range of acetic acid concentrations (for example 0, 0.5, 1, 5, 10, 20 mM)? Measuring intracellular acetate concentration may be beneficial as well.

      Thank you for this question. We now provide data to support that acetate-mediated inhibition of the alr1 mutant is concentration-dependent (see Figure 1C).

      c. It appears that there may be excess D-ala in the gut environment (PMIDs: 30559391; 35816159), which could counter the high acetate based on the model presented here. Could the authors clarify and/or include this information in the manuscript?

      This is an excellent point, and we have now included it in the discussion (Line# 470-475). It is indeed possible that D-Ala produced by the gut microbiome may further enhance S. aureus resistance to organic acid anions, in addition to the inherent contribution of Alr1 activity.

      (3) The following is not needed; however, it would be interesting if the authors could show that S. aureus cells grown in the presence of acetate are highly sensitive to cycloserine (which targets Alr and Ddl) compared to cells grown in the absence of acetate.

      Thank you for the suggestion. We are currently studying D-cycloserine (DCS) resistance in S. aureus. Although we provide the data below for clarification, it is not included in the current manuscript as it is part of a separate study.

      As the reviewer speculated, S. aureus is more susceptible to DCS when grown in the presence of acetate (see figure below). Normally, complete growth inhibition occurs at 32 µg/ml of DCS. However, with 20 mM acetic acid present, complete inhibition is achieved at just 8 µg/ml of DCS. Furthermore, the growth inhibition is completely rescued when externally supplemented with 5 mM D-Ala. We believe that DCS works synergistically with acetate to inhibit Ddl activity, and we are conducting additional studies to explore this further.

      Minor Comments:

      (1) Many commas are missing.

      Missing commas are now incorporated.

      (2) Line 77: disassociate --> dissociate

      Corrected.

      (3) Line 103: that --> which

      Corrected.

      (4) Lines 199-203: authors could have used gfp/luciferase reporter to test their hypotheses.

      Thank you for the suggestion. Initially, we created GFP translational fusions for all the mutants mentioned in Line# 199-203. However, the fluorescence intensity was too low to test the hypothesis, as these were single-copy fusions inserted at the SaPI site of the S. aureus genome. Because of this limitation, we took advantage of the essentiality of D-Ala-D-Ala in S. aureus to report on various mutants instead of a fluorescent reporter. In hindsight, a LacZ reporter assay might have been equally effective.

      (5) Line 339: It would be beneficial to introduce that Ddl has two independent ATP and D-ala binding sites.

      We have now added that information (Line# 338-339).

      (6) Is ddl an essential gene? If so, explicitly mention that.

      Yes, ddl is an essential gene, and we have now incorporated this information in Line 103.

      (7) Line 354: shows a difference in density?

      “Difference density” is a technical crystallographic term commonly used to describe density observed for ligands in X-ray crystal structures. In this case, it simply refers to the observed density corresponding to the two acetate ions bound within the Ddl active site.

      (8) Line 498: "Thus." Typo, change period to comma.

      We have corrected as suggested in Line 496.

      (9) Figure 1 legend says "was screen" instead of screened.

      This is now corrected.

      (10) Figure 1- Figure Supplement 1B: including data for alr2::tn dat::tn may ensure no redundancy (Lines 171-172). It is currently missing.

      Thank you for the suggestion. We now include both the alr2 dat double mutant and the alr1 alr2 dat triple mutant in Figure 1 - Figure Supplement 1B. In addition, we show that the alr1 alr2 dat mutant is rescued by the addition of D-Ala in Figure 1 - Figure Supplement 1C. The mutant information has also been added to Table S5.

      (11) Figure 7: pentaglycine coming off of NAM is misleading. Remove untethered pentaglycine bridges.

      We thank you for pointing this out. We have modified the figure in the manuscript as suggested by the reviewer.

      (12) Are alr1/ddl cells (with limited 4-3 PG crosslink) less sensitive to vancomycin?

      On the contrary, the alr1 mutant is slightly more sensitive to vancomycin compared to the wild-type strain (see Figure below). We believe this happens because the alr1 mutant incorporates less D-Ala-D-Ala into the peptidoglycan, reducing the number of targets for vancomycin. As a result, vancomycin may be able to saturate the available D-Ala-D-Ala targets on the cell wall at a lower concentration in the alr1 mutant than in the wild type strain, leading to increased sensitivity. We haven’t included this data in the manuscript as it is part of a separate study.

      (13) Based on the structural studies, could the authors mutate the residues of Ddl involved in acetic acid binding, thereby making it resistant to acetic acid stress?

      The residues that the acetate anion interacts with are located within the ATP-binding and D-Ala-binding sites of Ddl. Since these residues are essential for Ddl function, we are unable to mutate them.

      (14) Microscopy to show the cell morphologies of wild-type and mutants exposed to acetic acid (and with D-ala supplementation) could be potentially interesting.

      Thank you for the suggestion. We did perform microscopy, expecting changes in cell shape or size, but the results were unremarkable and not included in the manuscript.

      References:

      (1) Hammes WP & Neuhaus FC (1974) On the specificity of phospho-N-acetylmuramyl-pentapeptide translocase. The peptide subunit of uridine diphosphate-N-actylmuramyl-pentapeptide. J Biol Chem 249(10):3140-3150.

      (2) Roe AJ, McLaggan D, Davidson I, O'Byrne C, & Booth IR (1998) Perturbation of anion balance during inhibition of growth of Escherichia coli by weak acids. J Bacteriol 180(4):767-772.

      (3) Acton DS, Plat-Sinnige MJ, van Wamel W, de Groot N, & van Belkum A (2009) Intestinal carriage of Staphylococcus aureus: how does its frequency compare with that of nasal carriage and what is its clinical impact? Eur J Clin Microbiol Infect Dis 28(2):115-127.

      (4) Piewngam P, et al. (2023) Probiotic for pathogen-specific Staphylococcus aureus decolonisation in Thailand: a phase 2, double-blind, randomised, placebo-controlled trial. Lancet Microbe 4(2):e75-e83.

      (5) Cummings JH, Pomare EW, Branch WJ, Naylor CP, & Macfarlane GT (1987) Short chain fatty acids in human large intestine, portal, hepatic and venous blood. Gut 28(10):1221-1227.

      (6) Correa-Oliveira R, Fachi JL, Vieira A, Sato FT, & Vinolo MA (2016) Regulation of immune cell function by short-chain fatty acids. Clin Transl Immunology 5(4):e73.

      (7) Hosmer J, McEwan AG, & Kappler U (2024) Bacterial acetate metabolism and its influence on human epithelia. Emerg Top Life Sci 8(1):1-13.

      (8) Carpenter CE & Broadbent JR (2009) External concentration of organic acid anions and pH: key independent variables for studying how organic acids inhibit growth of bacteria in mildly acidic foods. J Food Sci 74(1):R12-15.

      (9) Russell JB (1992) Another explanation for the toxicity of fermentation acids at low pH: anion accumulation versus uncoupling. Journal of Applied Bacteriology 73(5):363-370.

      (10) Russell JB & Diez-Gonzalez F (1998) The effects of fermentation acids on bacterial growth. Adv Microb Physiol 39:205-234.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1: 

      Limitations are that only the cytosolic fragments of the channel were studied, and the current manuscript does not do a good job of placing the results in the context of what is already known about CNBDs from other methods that yield similar information.

      In the revision, we have now added a paragraph in the discussion that addresses why the cytosolic fragment was used and a paragraph putting our results into the context of previous work on CNBD channels where possible. 

      (1) Why do the authors not apply their approach to the full-length channel? A discussion of any limitations that make this difficult would be worthwhile.

      Full-length ion channel protein expression is more challenging, and it was important to start with a simpler system. This is now stated in the discussion.

      (2) …nonetheless a comparison of the conformational heterogeneity and energetics obtained from these different approaches would help to place this work in a larger context.

      We have now added a paragraph in the discussion putting our work in a larger context and addressing the challenges of comparing our results to previous studies. 

      (3) Page 5 - 3:1 unlabeled:labeled subunits in mix => 42% of molecules have 3:1 stoichiometry as desired and 21% of molecules have 2:2 stoichiometry!!! (binomial distribution p=0.25, n=4). So 1/3 of molecules with labels have two labeled subunits. This does not seem like it is at all avoiding the problem of intersubunit FRET…

      From the experimental perspective, the 3:1 molar ratio stated is certainly a low estimate of the actual subunit ratios given our FSEC data in Figure 2D and the higher expression of the WT protein compared to labeled protein. Furthermore, even without the addition of any WT protein, the calculated contribution of intersubunit FRET is negligible given that the FRET efficiency is heavily dominated by the closest donor-acceptor distances (Figure 4). 
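      The reviewer's stoichiometry estimate can be checked with a short calculation. The sketch below assumes that labeled and unlabeled subunits assemble randomly into tetramers, so that the number of labeled subunits per tetramer follows a binomial distribution with n = 4 and p equal to the labeled fraction (p = 0.25 for a 3:1 mix). As the response notes, higher expression of the WT subunit would lower the effective p and shrink the doubly labeled fraction. The function name is illustrative only.

```python
from math import comb


def label_stoichiometry(p_labeled: float, n_subunits: int = 4) -> dict:
    """Probability that a tetramer carries k labeled subunits, assuming random
    (binomial) assembly from a pool in which a fraction p_labeled is labeled."""
    return {
        k: comb(n_subunits, k) * p_labeled**k * (1 - p_labeled) ** (n_subunits - k)
        for k in range(n_subunits + 1)
    }


# 3:1 unlabeled:labeled mix -> p = 0.25
dist = label_stoichiometry(0.25)
labeled = 1 - dist[0]  # tetramers carrying at least one label
print(f"exactly one label: {dist[1]:.3f}")  # 0.422
print(f"exactly two labels: {dist[2]:.3f}")  # 0.211
print(f"two labels among labeled tetramers: {dist[2] / labeled:.3f}")  # 0.309
```

      For example, label_stoichiometry(0.1) gives the expected distribution for a 9:1 mix.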

      (4) Figure 2E - Some monomers appear to still be present in the collected fraction. The authors should discuss any effect this might have on their results.

      We now describe in the text that, at the low concentrations (~10 nM) used for mass photometry, a second small peak of ~30 kDa was observed, which is below the analytical range of this method. This would not affect our results, since all tmFRET experiments used higher protein concentrations to ensure tetramerization.

      (5) page 4 - "Time-resolved tmFRET, therefore, resolves the structure and relative abundance of multiple conformational states in a protein sample." - structure is not resolved, only a single distance.

      We have reworded this sentence.  

      Reviewer #2:

      Regarding cyclic nucleotide-binding domain (CNBD)-containing ion channels, I disagree with the authors when they state that "the precise allosteric mechanism governing channel activation upon ligand binding, particularly the energetic changes within domains, remains poorly understood". On the contrary, I would say that the literature on this subject is rather vast and based on a significantly large variety of methodologies…

      Despite this vast literature on the energetics of CNBD channels, there is no consensus about the energetics and coupling of domains that underlie the allosteric mechanism in any CNBD channel. We have added a separate paragraph in the discussion to clarify our meaning.

      In light of the above, I suggest the authors better clarify the contribution/novelty that the present work provides to the state-of-the-art methodology employed (steady-state and time-resolved tmFRET) and of CNBD-containing ion channels…

      …In light of the above, what is the contribution/novelty that the present work provides to the SthK biophysics?

      This work is the first use of the time-resolved tmFRET method to obtain the intrinsic ΔG (of an apo conformation) and ΔG values for different ligands. It is also the first application of this approach to SthK or, indeed, to any protein other than MBP. This is mentioned in the introduction.
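      The step from measured state populations to free energies can be illustrated with a minimal sketch. It assumes a simple two-state equilibrium between resting and active CNBD conformations, in which the fraction of the active state (for example, a relative abundance recovered from time-resolved tmFRET) gives K = f_active / (1 - f_active) and ΔG = -RT ln K. The function name and the example fractions are hypothetical and are not taken from the manuscript; the actual analysis may use a more elaborate model.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def delta_g_kj(fraction_active: float, temperature_k: float = 298.15) -> float:
    """Free energy (kJ/mol) of the resting -> active transition in a two-state model.

    Negative values mean the active conformation is favored.
    """
    if not 0.0 < fraction_active < 1.0:
        raise ValueError("fraction_active must lie strictly between 0 and 1")
    k_eq = fraction_active / (1.0 - fraction_active)  # [active] / [resting]
    return -R * temperature_k * math.log(k_eq) / 1000.0


# Hypothetical fractions of the active state, for illustration only.
dg_apo = delta_g_kj(0.2)
dg_ligand = delta_g_kj(0.9)
print(f"apo: {dg_apo:+.2f} kJ/mol, ligand-bound: {dg_ligand:+.2f} kJ/mol")
print(f"ligand-induced change: {dg_ligand - dg_apo:+.2f} kJ/mol")
```

      Because ΔG depends on the logarithm of the population ratio, the estimate becomes very sensitive to error when one state is only sparsely populated.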

      …On the basis of the above-cited work (Evans et al., PNAS, 2020) the authors should clarify why they have decided to work on the isolated Clinker/CNBD fragment and not on the full-length protein…

      We chose to start on the C-terminal fragment to provide a technically more tractable system for validating our approach using time-resolved tmFRET before moving to the more challenging full-length membrane protein. This is now addressed in a new paragraph in the discussion. 

      What is the advantage of using the Clinker/CNBD fragment of a bacterial protein and not one of HCN channels, as already successfully employed by the authors (see above citations)?

      We chose to perform these studies in SthK rather than in a mammalian CNBD channel because SthK is a useful model system that will later allow us to express full-length channels in bacteria. In addition, the efficiency of noncanonical amino acid incorporation is much higher in bacteria than in mammalian cells.

      Reviewer #3: 

      While the use of a truncated construct of SthK is justified, it also comes with certain limitations…

      We agree that the truncated channel comes with limitations, but we still think that there is relevant energetic information from studies of the isolated CNBD. This is now addressed in the discussion. 

      I recommend the authors carefully assess their statements on allostery. …The authors also should consider discussing the discrepancies between their truncated construct and full-length channels in more detail.

      We added a paragraph in the introduction that now puts the conformational change of the CNBD in the context of the allosteric mechanism of the full-length channel. We also added a paragraph discussing in more detail the relationship between the energetics of the C-terminal fragment and the full-length channel.  

      Regarding the in silico predictions, it is unclear to me why the authors chose the closed state of SthK Y26F and the 'open' state of the isolated C-linker CNBD construct…

      The active cAMP-bound structure (4D7T) is a high-resolution X-ray crystal structure and was chosen because it is the only model with a fully resolved C-helix. The resting-state structure (7RSH) was selected because it is the only resting-state structure that resolves the acceptor residue studied here (V417).

      Previously it has been shown that SthK (and CNG) goes through multiple states during gating. This may be discussed in more detail, especially when it comes to the simplified four-state model…

      As stated above, we added paragraphs to the introduction and discussion placing the conformational change of the CNBD in the context of the full-length channel.  

      It would be interesting to see how the conformational distribution of the C-helix position integrates with available structural data on SthK. In general, putting the results more into the context of what is known for SthK and CNG channels, could increase the impact.

      We now discuss the relationship between existing structures and energetics in the introduction.  

      This may be semantics, but when working with a truncated construct that is missing the transmembrane domains using 'open' and 'closed' state is questionable. I recommend the authors consider a different nomenclature.

      We now refer to the conformational states of the CNBD as ‘resting’ and ‘active’ and use ‘closed’ and ‘open’ only for the conformational states of the pore.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The sample size of the in-house dataset used for training the model was relatively small (34 patients), which might limit the generalizability of the findings.

      (2) The authors did not perform functional experiments to directly validate the roles of the identified key genes in radiotherapy sensitivity, relying instead on associations with immune features and signaling pathways.

      (3) The study did not discuss the potential limitations of using machine learning algorithms, such as the risk of overfitting and the need for larger, diverse datasets for more robust model development and validation.

      (1) We are currently expanding the dataset by incorporating additional patient samples to enhance the model's robustness and generalizability. Furthermore, we implemented advanced statistical techniques, including cross-validation, during model development to mitigate the effect of the small sample size on our results. This limitation has been comprehensively addressed in the discussion section of our manuscript.

      (2) Given the current resource limitations, our study predominantly employed bioinformatics analyses. We acknowledge the critical importance of experimental validation and are actively pursuing additional funding and collaborative opportunities to facilitate future experimental studies. Concurrently, we have enhanced the discussion section to comprehensively address the limitations of our approach and emphasize the necessity for future experimental validation.

      (3) We appreciate the reviewers' insightful comments regarding the potential limitations of machine learning algorithms, particularly the risk of overfitting. In response, we have incorporated a comprehensive discussion of these concerns, detailing the measures implemented to mitigate such risks, including the application of regularization techniques and the adoption of more rigorous cross-validation methodologies. We further acknowledge the necessity for larger and more diverse datasets to enhance model validity and generalizability, a concern we intend to address in our future research endeavors. The revised manuscript includes an expanded discussion on these critical points.
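      As a generic illustration of the cross-validation mentioned above (not the authors' pipeline, whose fold count and model are not specified here), the sketch below splits a cohort of 34 patients into k folds so that every patient is held out exactly once. The choice of k = 5 and the function name are assumptions. In practice, model fitting and the choice of regularization strength would be carried out on the training indices of each fold only, so that held-out patients never influence the fitted model.

```python
import random


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # reproducible shuffle
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(34, k=5))
print([len(test) for _, test in folds])  # [7, 7, 7, 7, 6]
```

      With only about seven patients per test fold, performance can vary considerably between folds, so repeating the split with several seeds and reporting the spread is a common safeguard.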

      Here is the limitation section in the revised Manuscript:

      “This study primarily focuses on specific subtypes of nasopharyngeal carcinoma (NPC), potentially limiting its direct generalizability to other NPC subtypes or related head and neck malignancies. Furthermore, the limited sample size of our dataset may impact the model's generalizability and extrapolation capabilities. To mitigate the potential limitations associated with the small sample size, we employed advanced statistical methodologies, including cross-validation, to enhance the robustness and reliability of our findings. Nevertheless, we acknowledge the necessity for larger datasets and are actively collaborating with other research institutions to expand our sample size, thereby enhancing the robustness and broader applicability of our findings. Additionally, while our study utilizes bioinformatics approaches to identify and analyze key genes, we recognize that the absence of direct experimental functional validation represents a significant limitation. To address this limitation, we are actively pursuing additional funding and establishing collaborations with specialized laboratories to conduct crucial functional validation experiments, which will further elucidate the specific roles of these genes in radiotherapy response. Moreover, we acknowledge the potential risk of overfitting inherent in the application of machine learning algorithms to biomedical data analysis. To mitigate this risk, we implemented regularization techniques during model development and adopted a rigorous cross-validation strategy for model validation. These methodological approaches aim to ensure that our models maintain robust predictive performance on unseen data. Notwithstanding these limitations, our study offers novel insights into the molecular mechanisms underlying radiotherapy sensitivity in NPC and indicates promising avenues for future investigation. 
Future research endeavors will prioritize expanding the dataset, conducting comprehensive experimental validation, and refining our predictive model to enhance its accuracy and clinical applicability.”

      Reviewer #2 (Public Review):

      (1) The study focuses on a specific type of nasopharyngeal carcinoma (NPC) and may not be generalizable to other subtypes or related head and neck cancers. The applicability of NPC-RSS to a broader range of patients and tumor types remains to be determined.

      (2) The study does not account for potential differences in radiotherapy protocols, doses, and techniques between the training and validation cohorts, which could influence the performance of the predictive model. Standardization of treatment parameters would be important for future validation studies.

      (3) The binary classification of patients into radiotherapy-sensitive and resistant groups may oversimplify the complex spectrum of treatment responses. A more granular stratification system that captures intermediate responses could provide more nuanced predictions and better guide personalized treatment decisions.

      (4) The study does not address the potential impact of other relevant factors, such as tumor stage, histological subtype, and concurrent chemotherapy, on the predictive performance of NPC-RSS. Incorporating these clinical variables into the model could enhance its accuracy and clinical utility.

      (1) We appreciate the reviewers' interest in the applicability of our study. This study specifically focuses on a particular subtype of nasopharyngeal carcinoma (NPC), which may limit its direct generalizability to other NPC subtypes or related head and neck malignancies. We have incorporated a detailed discussion of this limitation in the Discussion section and intend to investigate the applicability of NPC-RSS across a broader spectrum of tumor types and subtypes in subsequent studies.

      (2) We acknowledge the reviewers' emphasis on the significance of potential variations in radiotherapy regimens, doses, and techniques. In the current study, we did not sufficiently account for these factors, potentially impacting the model's generalizability and accuracy. We aim to improve data consistency and strengthen model validation by standardizing treatment parameters in future investigations.

      (3) We concur with the reviewers' assessment that binary categorization may oversimplify the intricate nature of treatment responses. Indeed, radiotherapy responses likely exist on a continuous spectrum. Consequently, we intend to develop more refined stratification systems to capture intermediate responses, thereby enhancing the accuracy of treatment outcome predictions and facilitating personalized treatment decisions.

      (4) We appreciate the reviewers' recommendation to incorporate clinical variables, including tumor stage, histological subtype, and concurrent chemotherapy, into the model. We acknowledge that these factors are crucial for enhancing the accuracy and clinical applicability of predictive models. We are presently compiling these additional data and intend to integrate these variables into subsequent model iterations.

      Reviewer #1 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more comprehensive comparison of the NPC-RSS with existing prognostic models or biomarkers for nasopharyngeal carcinoma. This would help highlight the unique value and potential superiority of the NPC-RSS in predicting radiotherapy sensitivity.

      (2) The authors should consider expanding their discussion on the potential molecular mechanisms underlying the association between the key NPC-RSS genes and radiotherapy response. They could explore whether these genes have been previously implicated in radiotherapy resistance in other cancer types and discuss the potential functional roles of these genes in the context of nasopharyngeal carcinoma.

      (1) We appreciate your thorough review and valuable suggestions concerning our study. In response to the suggestion of comparing the Nasopharyngeal Carcinoma Radiotherapy Sensitivity Score (NPC-RSS) with existing prognostic models or biomarkers, we have carefully considered this proposal and determined that such a comparison is beyond the scope of our current study. The primary focus of our research is on the development and internal validation of the NPC-RSS model's accuracy and reliability. At present, we do not have access to the necessary external data to conduct a valid comparison, and the integration of such data extends beyond the parameters of this study. We intend to incorporate this comparative analysis in future studies to further validate the efficacy and explore the clinical application potential of the NPC-RSS model. We appreciate your understanding and continued support for our research endeavors.

      (2) In the revised manuscript, we have incorporated a comprehensive review of the functions of these key genes in various cancer types and explored their potential mechanisms of action in nasopharyngeal carcinoma (NPC). Through the citation of pertinent studies, we have elucidated the impact of these genes on radiotherapy sensitivity and resistance. Furthermore, we have proposed future research directions to elucidate the specific roles of these genes in the radiotherapy response of NPC.

      The following are new additions to the revised draft:

      “Previous studies have demonstrated that SMARCA2 significantly influences the radiotherapy response in non-small cell lung cancer (NSCLC). Depletion of SMARCA2 has been shown to enhance radiosensitivity, suggesting its potential as a therapeutic target for radiosensitization [30478150]. Additionally, the DMC1 gene has been incorporated into the radiosensitivity index (RSI) to evaluate radiotherapy sensitivity and prognosis, particularly in endometrial cancers. This inclusion provides valuable insights into the DNA damage repair process [38628740]. Studies on CD9 in glioblastoma multiforme (GBM) have revealed that post-radiotherapy increases in CD9 and CD81 levels in extracellular vesicles (EVs) are strongly correlated with the cytotoxic response to treatment. This finding suggests the potential of CD9 as a novel biomarker for monitoring radiotherapy efficacy [36203458]. In contrast, the association of PSG4 and KNG1 with radiotherapy resistance remains unexplored in the current literature.

      Future research should focus on analyzing the expression patterns of SMARCA2 in NPC patients and its correlation with radiotherapy efficacy using clinical samples. This analysis could elucidate its potential as a target for radiosensitization therapy. Investigating the correlation between DMC1 expression levels and radiotherapy sensitivity in NPC could potentially aid in predicting treatment efficacy and optimizing therapeutic regimens. Furthermore, analysis of extracellular vesicles, particularly those containing CD9, in post-radiotherapy NPC patients could assess their feasibility as biomarkers for monitoring treatment response. These proposed studies would not only contribute to a deeper understanding of the mechanisms underlying the role of these genes in NPC radiotherapy but could also potentially lead to the development of novel strategies for enhancing radiotherapy efficacy.”

      Minor Recommendations:

      (1) It is recommended that the author share the code for the article on Github or a similar open source platform.

      (2) The manuscript would benefit from a thorough review of the punctuation and sentence structure to improve readability and clarity.

      (1) You suggest sharing the code utilized in this study on GitHub or a comparable open-source platform to enhance the transparency and reproducibility of the research. I fully recognize the significance of this suggestion. However, due to the sensitivity of the data involved and the existing intellectual property agreement with my research team, we are unable to make the code publicly available at this time. We are actively seeking a method to safeguard the intellectual property of the project while also planning to share our tools and methodologies in the future. At this stage, we are open to collaborating with other researchers under appropriate frameworks and conditions to validate and replicate our findings by providing essential code execution snippets or assisting with data analysis.

      (2) Your suggestions are vital for enhancing the quality of the manuscript. I will perform a comprehensive linguistic and structural review of the manuscript to ensure that statements flow coherently and punctuation is employed correctly. We also intend to engage a professional scientific and technical writing editor to ensure that the manuscript adheres to the high standards required for academic publishing.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more in-depth discussion of the potential clinical implications of the NPC-RSS. The authors should elaborate on how this score could be integrated into clinical decision-making and patient management.

      (2) The authors should consider including a section discussing the limitations of their study and potential areas for future research. This could include the need for prospective validation of the NPC-RSS in larger patient cohorts and the exploration of additional biological mechanisms.

      (1) We concur that a more comprehensive discussion regarding the application of the NPC-RSS in clinical decision-making would significantly enhance the practical value of this study. In the revised draft, we will include a section that elaborates on the integration of the NPC-RSS scoring system into daily clinical practice, detailing how it can assist physicians in developing individualized treatment plans and optimize patient management by predicting treatment responses.

      The following are new additions to the revised draft:

      “The incorporation of the NPC-RSS scoring system into clinical decision-making and patient management involves several key steps: first, establishing genetic testing as a standard component of nasopharyngeal cancer diagnosis and ensuring that physicians have prompt access to scoring results to guide treatment planning. Second, physicians should utilize the scoring results to tailor individualized treatment plans and engage in multidisciplinary discussions to optimize decision-making. Concurrently, physicians should explain the clinical significance of the scores and communicate effectively with patients to facilitate shared decision-making. Furthermore, continuous monitoring of the relationship between scores and treatment outcomes, refinement of the scoring model based on empirical data, and integration of technological platforms together with regulatory compliance are essential for the effective operation of the scoring system and the protection of patient information.”

      (2) In light of the reviewers' valuable suggestions, we acknowledge the significance of prospective validation of the NPC-RSS scoring system in a broader patient population and the necessity for thorough exploration of the underlying biological mechanisms. Accordingly, we are incorporating a new section in the revised manuscript that elaborates on the limitations of the current study and outlines potential directions for future research. This encompasses plans to increase the sample size for validation and further investigations into the biological basis of the scoring system to enhance its predictive validity and clinical applicability. We believe that these additions will significantly enrich the depth and breadth of the study, thereby serving the scientific community and clinical practice more effectively.

      Minor Recommendations:

      (1) The authors should ensure that all abbreviations are defined at their first mention in the text.

      (2) The figure legends should be more descriptive and self-explanatory, allowing readers to understand the main findings without referring back to the main text.

      (1) You pointed out the need to define all acronyms at the first mention in the text and suggested that a comprehensive list of acronyms be included in the revised draft. We fully concur and have included a comprehensive list of acronyms in the revised text. Additionally, to enhance clarity, we have included the full name and definition of each acronym alongside its first occurrence in the text. This will assist readers in comprehending the study without the need to repeatedly refer to the glossary.

      (2) You recommended enhancing the descriptive quality of the figure legends to enable readers to discern the key findings from the figures without consulting the text. We have redesigned and refined all charts and legends to ensure they provide adequate information and are more descriptive. Each legend now outlines the experimental conditions, the variables employed, and the primary conclusions, ensuring that the charts themselves sufficiently convey the key findings of the study.

    1. Author response:

      We want to thank the reviewers for their positive and constructive comments on the manuscript. We already addressed some of their concerns and are planning the following revisions to both BEHAV3D-TP and the corresponding manuscript to address the reviewers’ comments. Below, we provide a response to the most significant comments, followed by a detailed, point-by-point response:

      (1) We acknowledge the reviewer's suggestion to incorporate open-source segmentation and tracking functionalities, increasing its accessibility to a wider user base; however, these additions fall outside the primary scope of our current work and represent a substantial undertaking in their own right. This topic has been comprehensively explored in other studies (e.g. https://doi.org/10.4049/jimmunol.2100811 ; https://doi.org/10.7554/eLife.60547 ; https://doi.org/10.1016/j.media.2022.102358 ; https://doi.org/10.1038/s41592-024-02295-6), which we will cite in our revised manuscript as indicated in our responses to the reviewers’ comments. Instead, the goal of our manuscript is to provide an analytical framework for processing data generated by existing segmentation and tracking pipelines. In our analyses, we used data processed with Imaris, a commercial software that, despite its limitations, is widely used by the intravital microscopy community due to its user-friendly platform for 3D image visualization and analysis. Nevertheless, to enhance compatibility with tracking data from various pipelines, we have modified our tool to accept data formats, such as those generated by open-source Fiji plugins like TrackMate (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). These updates are available in our GitHub repository, and we will describe this feature in the revised manuscript to emphasize compatibility with segmented and tracked data from diverse open-source platforms.

      (2) We appreciate the reviewer’s suggestion to incorporate additional features into our analytical pipeline. In response, we have already updated the GitHub repository to allow users to input and select which features (dynamic, morphological, or spatial) they wish to include in the analysis (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#feature-selection ) . In the revised manuscript, we will highlight this new functionality and provide examples using alternative datasets to demonstrate the application of these features.

      (3) We appreciate the constructive feedback of reviewers #1 and #2 regarding the statistical analysis and interpretation of the data presented in Figures 3 and 4. We understand the importance of clarity and rigor in data analysis and presentation, and we are committed to addressing the concerns raised in the revised version of the manuscript.

      (4) We appreciate Reviewer #1's suggestion regarding the inclusion of demo data, as we believe it would greatly enhance the usability of our pipeline. We acknowledge that this was an oversight on our part. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the revised manuscript, we will also reference this addition. Additionally, we will provide both original and processed IVM movie samples to help users navigate the complete pipeline effectively.

      (5) Finally, we will make several minor changes to the manuscript in response to the reviewers' feedback.

      Below we provide a point-by-point response to the reviewers’ comments, along with proposed revisions.

      Reviewer #1:

      Comment: A key limitation of the pipeline is that it does not overcome the main challenges and bottlenecks associated with processing and extracting quantitative cellular data from timelapse and longitudinal intravital images. This includes correcting breathing-induced movement artifacts, automated registration of longitudinal images taken over days/weeks, and accurate, automated segmentation and tracking of individual cells over time. Indeed, there are currently no standardised computational methods available for IVM data processing and analysis, with most laboratories relying on custom-built solutions or manual methods. This isn't made explicit in the manuscript early on (described below), and the researchers rely on expensive software packages such as IMARIS for image processing and data extraction to feed the required parameters into their pipeline. This limitation unfortunately reduces the likely impact of BEHAV3D-TP on the IVM field.

      As highlighted above, the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence, and displacement) from intravital images. Indeed, to use the tool researchers must first extract dynamic cellular parameters from their IVM datasets, requiring access to expensive software (e.g. IMARIS as used here) and/or above-average computational expertise to develop and use custom-made open-source solutions. This limitation is not made explicit or discussed in the text.

      As mentioned previously, we agree with the reviewer that image processing steps, such as segmentation, tracking, and motion correction, present significant challenges in intravital microscopy (IVM) data processing. While these aspects are being addressed by other researchers, our publication centers on the analysis of acquired data rather than on the image processing itself. Our motivation, as outlined in the manuscript, arises from our own experience: despite the substantial effort invested in image processing, researchers often rely on simplistic analytical approaches, such as averaging single parameters and comparing them across conditions. These approaches tend to overlook potential tumor heterogeneity.

      Our work aimed to develop an analytical tool that provides a comprehensive framework for extracting more insights from processed IVM data, with a focus on two key aspects: capturing the heterogeneity of tumor behavior and examining the spatial distribution of these behaviors within the tumor microenvironment. In the revised manuscript, we will clarify the scope of our study, emphasizing that it is an analytical tool rather than an image-processing solution. Additionally, we will provide references to relevant literature on available (open-source) software options for image processing (e.g., Pizzagalli DU et al., J Immunol (2022); Joseph A et al., eLife (2020); Molina-Moreno M et al., Medical Image Analysis (2022); Hidalgo-Cenalmor I et al., Nat Methods (2024); Ershov D et al., Nat Methods (2022)).

      Regarding the reviewer’s comment on our use of Imaris, we acknowledge that Imaris is a costly commercial software. However, based on our experience, it is widely used by the intravital microscopy community due to its user-friendly interface for 3D image visualization and analysis. Despite its limitations in accuracy and the fact that it is not open-source, we believe that including data processed with Imaris will be valuable to the IVM community.

      However, to improve compatibility with data from other segmentation and tracking pipelines, we have already updated our tool to support formats generated by open-source Fiji plugins like TrackMate. These updates are available in our GitHub repository, and we will describe this functionality in detail in the revised manuscript to ensure compatibility with segmented and tracked data from various open-source platforms.
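      To illustrate the kind of kinetic parameters that must be extracted upstream of BEHAV3D-TP, the sketch below summarizes tracks from a TrackMate-style spots table, using TrackMate's TRACK_ID and POSITION_X/Y/Z/T feature keys. It is a hypothetical helper, not part of BEHAV3D-TP, and it does not reproduce the tool's actual input schema. It computes mean speed, net displacement, and straightness (net displacement divided by path length, a simple directionality measure).

```python
import math
from collections import defaultdict


def track_kinetics(spots):
    """Summarize tracks from a TrackMate-style spots table (hypothetical helper).

    `spots` is an iterable of dicts with keys TRACK_ID, POSITION_X, POSITION_Y,
    POSITION_Z and POSITION_T in physical units. Returns per-track mean speed,
    net displacement and straightness (net displacement / path length).
    """
    tracks = defaultdict(list)
    for s in spots:
        tracks[s["TRACK_ID"]].append(
            (float(s["POSITION_T"]), float(s["POSITION_X"]),
             float(s["POSITION_Y"]), float(s["POSITION_Z"]))
        )
    summary = {}
    for track_id, pts in tracks.items():
        pts.sort()  # order spots by time
        if len(pts) < 2:
            continue  # a single spot carries no motion information
        path = sum(math.dist(a[1:], b[1:]) for a, b in zip(pts, pts[1:]))
        net = math.dist(pts[0][1:], pts[-1][1:])
        duration = pts[-1][0] - pts[0][0]
        summary[track_id] = {
            "mean_speed": path / duration if duration > 0 else float("nan"),
            "displacement": net,
            "straightness": net / path if path > 0 else float("nan"),
        }
    return summary
```

      Rows can be read, for example, with csv.DictReader from a spots CSV export; depending on the TrackMate version, the export may contain extra descriptive header rows that must be skipped before the values are converted to float.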

      Comment: The number of cells (e.g. per behavioural cluster), and the number of independent mice, represented in each result figure, is not included in the figure legends and are difficult to ascertain from the methods.

      We appreciate the reviewer's constructive feedback regarding the clarity of the number and type of replicates used in our analyses. In the revised manuscript, we will include detailed information in the figure legends regarding the number of cells (e.g., per behavioral cluster) and the number of independent mice represented in each result figure to ensure transparency.

      Comment: The data used to test the pipeline in this manuscript is currently not available, making it difficult to assess its usability. It would be important to include this for researchers to use as a 'training dataset'.

      As stated above, we acknowledge that this was an oversight on our part and thank the reviewer for pointing this out. To address this, we have now added demo data to our GitHub repository (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0/demo_datasets). In the upcoming revised manuscript, we will also make sure to reference this addition. Additionally, we intend to provide both original and processed IVM movie samples to support users in navigating the complete pipeline effectively.

      Comment: Precisely how the BEHAV3D-TP large-scale phenotyping module can map large-scale spatial phenotyping data generated using LSR-3D imaging data and Cytomap to 3D intravital imaging movies is unclear. Further details in the text and methods would be beneficial to aid understanding.

      We appreciate the reviewer’s comment and will provide additional details in the text and methods of the revised manuscript to clarify how the BEHAV3D-TP module maps LSR-3D and Cytomap data to 3D intravital imaging movies.

      Comment: The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. Conclusions should therefore be tempered in the absence of additional experiments and controls.

      We appreciate the reviewer’s comment and acknowledge that our conclusions should be tempered due to the preliminary nature of our evidence. To be able to directly analyze the impact of the brain tumor microenvironment on cancer cell behavior, we will include a new set of analyses in the revised manuscript. Specifically, we will utilize BEHAV3D-TP to analyze existing IVM data from adult gliomas with and without macrophage depletion (Alieva et al, Scientific Reports, 2017; https://doi.org/10.1038/s41598-017-07660-4 ) to evaluate the differences in heterogeneous cell populations under these conditions. Since this analysis pertains to a different tumor type, we will revise our conclusions accordingly and emphasize the necessity for additional experiments and controls to further validate our findings on DMG cell migratory behaviors and their relationship with the tumor microenvironment.

      Reviewer #2:

      Comment: The strength of democratizing this kind of analysis is undercut by the reliance upon Imaris for segmentation, so it would be nice if this was changed to an open-source option for track generation.

      As noted in our previous response to Reviewer #1, we would like to point out that although Imaris is a commercial software, it is widely used in the intravital microscopy (IVM) community due to its user-friendly interface. One of its key advantages, which we also utilized, is semi-automated data tracking that allows for manual corrections in 3D—a process that can be more challenging in other open-source software with less effective data visualization.

      However, we recognize that enhancing our pipeline's compatibility with open-source options is important. To this end, we have already updated our tool to support data formats generated by open-source Fiji plugins like TrackMate, improving compatibility with various segmentation and tracking pipelines (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ). We will describe these updates in the revised manuscript to clarify our study's scope and the available image processing options.

      Comment: The main issue is with the interpretation of the biological data in Figure 3 where ANOVA was used to analyse the proportional distribution of different clusters. Firstly the n is not listed so it is unclear if this represents an n of 3 where each mouse is an individual or whether each track is being treated as a test unit. If the latter this is seriously flawed as these tracks can't be treated as independent. Also, a more appropriate test would be something like a Chi-squared test or Fisher's exact test. Also, no error bars are included on the stacked bar graphs making interpretation impossible. Ultimately this is severely flawed and also appears to show very small differences which may be statistically different but may not represent biologically important findings. This would need further study.

      We appreciate the reviewer’s insightful comments regarding the interpretation of the biological data in Figure 3. To clarify, each mouse serves as an independent unit in this analysis. We believe that ANOVA is the appropriate test for comparing the proportions of different behavioral signatures across the tumor microenvironment (TME) regions identified by large-scale phenotyping. However, we acknowledge that using a stacked bar plot may have been misleading. While a Chi-squared test could show differences in the distribution of behavioral signatures, it would not indicate which specific signatures are responsible for those differences. Therefore, in the revised manuscript, we will retain the ANOVA analysis but will represent the proportions using a bar chart that clearly illustrates multiple conditions for each behavioral cluster. We also appreciate the reviewer’s concern regarding the transparency of our data. In the revised manuscript, we will include the number of replicates for all figures to enhance clarity and understanding.

      Comment: Figure 4 has similar statistical issues in that the n is not listed and, again, it is unclear whether they are treating each cell track as independent which, again, would be inappropriate. The best practice for this type of data would be the use of super plots as outlined in Lord et al. (2020) JCI - SuperPlots: Communicating reproducibility and variability in cell biology.

      We appreciate the reviewer’s comments and suggestions regarding Figure 4. In the revised manuscript, we will clarify the number of replicates used and our approach to treating cell tracks as independent units. We will implement super-plots where appropriate, to enhance the communication of reproducibility and variability in our data.
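      As a minimal sketch of the SuperPlots principle (Lord et al., 2020), track-level measurements can be collapsed to one mean per mouse before any statistical test, so that n equals the number of mice rather than the number of tracks. The tuple layout, condition labels and values below are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(tracks):
    """Collapse track-level values to one mean per (condition, mouse) pair.

    `tracks` is an iterable of (condition, mouse_id, value) tuples; this
    layout is a hypothetical choice made for illustration only.
    """
    grouped = defaultdict(list)
    for condition, mouse, value in tracks:
        grouped[(condition, mouse)].append(value)
    return {key: mean(vals) for key, vals in grouped.items()}


def means_by_condition(rep_means):
    """Per-condition lists of per-mouse means; n is the number of mice."""
    out = defaultdict(list)
    for (condition, _mouse), m in sorted(rep_means.items()):
        out[condition].append(m)
    return dict(out)


# Hypothetical track-level speeds: uneven numbers of tracks per mouse.
tracks = [
    ("condition_1", "m1", 1.0), ("condition_1", "m1", 3.0),
    ("condition_1", "m2", 2.0), ("condition_1", "m2", 2.0), ("condition_1", "m2", 5.0),
    ("condition_1", "m3", 4.0),
    ("condition_2", "m4", 6.0), ("condition_2", "m4", 8.0),
    ("condition_2", "m5", 5.0),
    ("condition_2", "m6", 7.0), ("condition_2", "m6", 9.0),
]
summary = means_by_condition(replicate_means(tracks))
print(summary)  # {'condition_1': [2.0, 3.0, 4.0], 'condition_2': [7.0, 5.0, 8.0]}
```

      In a SuperPlot, the individual tracks would still be drawn (for example, colored by mouse), with these per-mouse means overlaid and used as the units for the statistical test.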

      Comment: The main issue that this raises is that the large-scale phenotyping module and the heterogeneity module appear designed to produce these statistical analyses that are used in these figures and, if they are based on the assumption that each track is independent, then this will produce inappropriate analyses as a default.

      We appreciate the reviewer’s comment, though we find ourselves unsure about the specific concern being raised. To clarify, each mouse is treated as an independent unit in our analyses. For each large-scale phenotyping region, we measure the proportion of tumor cells displaying a specific behavioral phenotype independently for each mouse. These proportions are then used for statistical analysis. We hope this explanation provides clarity, and we will adjust the manuscript to better convey this methodology.
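      The per-mouse procedure described above can be sketched as follows, assuming each track carries a mouse identifier, a large-scale phenotyping region and a behavioral cluster label (a hypothetical layout). For one cluster, the proportion of tracks is computed separately for every mouse and region, and a one-way ANOVA F statistic is then calculated on those per-mouse proportions. The counts are invented for illustration, and the sketch treats regions as independent groups; it does not model that the same mouse contributes to several regions, which a repeated-measures or mixed-effects model would account for.

```python
from collections import Counter, defaultdict


def make_tracks(counts, cluster="invading", other="static"):
    """Expand hypothetical {(mouse, region): (n_in_cluster, n_total)} counts
    into (mouse_id, region, cluster_label) track tuples."""
    tracks = []
    for (mouse, region), (k, n) in counts.items():
        tracks += [(mouse, region, cluster)] * k
        tracks += [(mouse, region, other)] * (n - k)
    return tracks


def per_mouse_proportions(tracks, cluster):
    """Proportion of tracks in `cluster` for each region, one value per mouse."""
    totals, hits = Counter(), Counter()
    for mouse, region, label in tracks:
        totals[(region, mouse)] += 1
        if label == cluster:
            hits[(region, mouse)] += 1
    by_region = defaultdict(list)
    for (region, mouse), n in sorted(totals.items()):
        by_region[region].append(hits[(region, mouse)] / n)
    return dict(by_region)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic; every value is one mouse, not one track."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical counts: three mice, two regions, 2-4 tracks per mouse and region.
counts = {
    ("m1", "region_1"): (1, 4), ("m2", "region_1"): (1, 2), ("m3", "region_1"): (0, 4),
    ("m1", "region_2"): (3, 4), ("m2", "region_2"): (2, 2), ("m3", "region_2"): (2, 4),
}
props = per_mouse_proportions(make_tracks(counts), cluster="invading")
f_stat = one_way_anova_f(list(props.values()))
print(props)   # {'region_1': [0.25, 0.5, 0.0], 'region_2': [0.75, 1.0, 0.5]}
print(f_stat)  # 6.0
```

      Where SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.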

      Reviewer #3:

      Comment: The most challenging task of analyzing 3D time-lapse imaging data is to accurately segment and track the individual cells in 3D over a long time duration. BEHAV3D Tumor Profiler did not provide any new advancement in this regard, and instead relies on commercial software, Imaris, for this critical step. Imaris is known to have a very high error rate when used for analyzing 3D time-lapse data. In the Methods section, the authors themselves stated that "Tumor cell tracks were manually corrected to ensure accurate tracking". Based on our own experience of using Imaris, such manual correction is tedious and often required for every time step of the movie. Therefore, Imaris is not a satisfactory tool for analyzing 3D time-lapse data. Moreover, Imaris is expensive and many research labs probably can't afford to buy it. The fact that BEHAV3D Tumor Profiler critically depends on the faulty ImarisTrack module makes it unclear whether the BEHAV3D tool or the results are reliable.

      If the authors want to "democratize the analysis of heterogeneous cancer cell behaviors", they should perform image segmentation and tracking using open-source codes (e.g., Cellpose, StarDist & 3DCellTracker) and not rely on the expensive and inaccurate ImarisTrack Module for the image analysis step of BEHAV3D.

      We appreciate the reviewer’s comments on the challenges of segmenting and tracking individual cells in 3D time-lapse imaging data. As mentioned previously, our primary focus is to develop an analytical tool for comprehensive data analysis rather than developing tools for image processing. To enhance accessibility, we have updated our tool to support data formats from open-source Fiji plugins, such as TrackMate, which will benefit users without access to commercial software (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler?tab=readme-ov-file#data-input ).

      While we recognize the limitations of Imaris, it remains widely used in the intravital microscopy community due to its user-friendly interface for 3D visualization and semi-automated segmentation capabilities. Since no perfect tracking method currently exists, we utilized Imaris for its ability to allow manual corrections of faulty tracks, ensuring the reliability of our results. This approach was the best available option when we began our analysis, allowing us to obtain accurate results efficiently.

      In the revised manuscript, we will clarify our methodology and provide information on both Imaris and alternative processing options to strengthen the reliability of our findings.

      Comment: The authors developed a "Heterogeneity module" to extract distinctive tumor migratory phenotypes from the cell tracks quantified by Imaris. The cell tracks of the individual tumor cells are all quite short, indicating relatively low motility of the tumor cells. It's unclear whether such short migratory tracks are sufficient to warrant the PCA analysis to identify the 7 distinctive migratory phenotypes shown in Figure 2d. It's also unclear whether these 7 migratory phenotypes correspond to unique functional phenotypes.

      For the 7 distinctive motility clusters, the authors should provide a more detailed analysis of the differences between them. It's unclear whether the difference in retreating, slow retreating, erratic, static, slow, slow invading, and invading correspond to functional difference of the tumor cells.

      While some tumor cells exhibit limited motility, indicated by short tracks, others demonstrate significant migratory capabilities. This variability in tumor cell behavior is a central focus of our analysis, and our tool is specifically designed to identify and distinguish these differences. Our PCA captures this variability, as illustrated in Figure 2d-f. It differentiates between cells exhibiting varying degrees of migratory behavior, including both highly migratory and less migratory phenotypes, as well as their directionality relative to the tumor core and the persistence of their movements. Thus, we believe that our approach provides valuable insights into the distinct migratory phenotypes within the tumor microenvironment. We will clarify these aspects further in the revised manuscript to enhance the reader's understanding of our findings.

      While our current manuscript does not provide explicit evidence linking each motility cluster to functional differences among the tumor cells, it is important to note that the state of the field supports the idea that cell dynamics can predict cell states and phenotypes. Research conducted by ourselves (Dekkers, Alieva et al., Nat Biotech, 2023) and others, such as Crainiciuc et al. (Nature, 2022) and Freckmann et al. (Nat Comm, 2022), has shown that variations in cell motility patterns are indicative of underlying functional characteristics. For instance, cell morphodynamic features have been shown to reflect differences in cell types, T cell targeting states, tumor metastatic potential, and drug resistance states. In the revised manuscript, we will reference relevant studies to underscore the biological significance of these behaviors. By doing so, we hope to clarify the potential implications of our findings and strengthen the overall narrative of our research.

      Comment: Using only motility to classify tumor cell behaviours in the tumor microenvironment (TME) is probably not sufficient to capture the tumor cell difference. There are also other non-tumor cell types in the TME. If the authors aim to develop a computational tool that can elucidate tumor cell behaviors in the TME, they should consider other tumor cell features, e.g., morphology, proliferation state, and tumor cell interaction with other cell types, e.g., fibroblasts and distinct immune cells.

      The authors should expand the scale of tumor behavior features to classify the tumor phenotype clusters, e.g., to include tumor morphology, proliferation state, and tumor cell interaction with other TME cell types.

      We believe that using dynamic features alone is sufficient to capture differences in tumor behavior, as demonstrated by our results in Figure 2. However, we appreciate the reviewer’s suggestion to consider additional features, such as cell morphology and interactions with other cell types, to fine-tune our analyses. To this end, we have adapted our pipeline to be compatible with various features present in the data (https://github.com/imAIgene-Dream3D/BEHAV3D_Tumor_Profiler/tree/BEHAV3D_TP-v2.0?tab=readme-ov-file#feature-selection ). We will emphasize this in the revised manuscript. However, we would like to point out that not all features provide informative insights, and that a wide range of features can instead introduce biologically irrelevant noise, making interpretation more challenging. For instance, in 3D microscopy the z-axis resolution is typically lower, which can lead to artifacts such as elongation in that direction; adding morphological features that capture this may skew the analysis. Therefore, we believe that incorporating additional features should be approached with caution. We will clarify these considerations in the revised manuscript to better guide users in utilizing our computational tool effectively. We will also reference the use of unbiased feature selection techniques, such as bootstrapping methods, to identify biologically relevant features based on the conditions provided (D.G. Aragones et al, Computers in Biology and Medicine (2024)).

      Comment: The authors have already published two papers on BEHAV3D [Alieva M et al. Nat Protoc. 2024 Jul;19(7): 2052-2084; Dekkers JF, et al. Nat Biotechnol. 2023 Jan;41(1):60-69]. Although the previous two papers used BEHAV3D to analyze T cells, the basic pipeline and computational steps are similar, in particular regarding cell segmentation and tracking. The addition of a "Heterogeneity module" based on PCA analysis does not make a significant advancement in terms of image analysis and quantification.

      We want to emphasize that we have no intention of duplicating our previous publications. In this manuscript, we have consistently cited our foundational papers, where BEHAV3D was first developed for T cell migratory analysis in in vitro settings. In the introduction, we clearly state that our earlier work inspired us to adopt a similar approach for analyzing cell behavior in intravital microscopy (IVM) data, addressing the specific needs and complexities of analyzing tumor cell behaviors in the tumor microenvironment.

      Importantly, our new work provides several key advancements: 1) a pipeline specifically adapted for intravital microscopy (IVM) data; 2) integration of spatial characteristics from both large-scale and small-scale phenotyping; and 3) a zero-code approach designed to empower researchers without coding skills to effectively utilize the tool. We believe that these enhancements represent meaningful progress in the analysis of cell behaviors within the tumor microenvironment which will be valuable for the IVM community. We will ensure that these points are clearly articulated in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article identifies ADGRA3 as a candidate GPCR for mediating beige fat development. The authors use human expression data from the Human Protein Atlas and GTEx databases and combine this with experiments performed in mice and a murine cell line. They refer to the GPCR bioactivity screening tool PRESTO-Salsa, with which it was found that hesperetin activates ADGRA3. From their experiments, the authors conclude that hesperetin activates ADGRA3, inducing a Gs-PKA-CREB axis that results in adipose thermogenesis.

      Strengths:

      The authors analyze human data from public databases and perform functional studies in mouse models. They identify a new GPCR with a role in the thermogenic activation of adipocytes.

      Weaknesses:

      (1) Selection of ADGRA3 as a candidate GPCR relevant for mediating beiging in humans:

      The authors identify genes upregulated in iBAT compared to iWAT in response to cold, and among these differentially expressed genes, they identify highly expressed GPCRs in human white adipocytes (visceral or subcutaneous). Finally, among these genes, they select a GPCR not previously studied in the literature.

      If the authors are interested in beiging, why do they not focus on genes upregulated in iWAT (the depot where beiging is described to occur in mice), comparing thermoneutral to cold-induced genes? I would expect that genes induced in iWAT in response to cold would be extremely relevant targets for beiging. With their strategy, the authors exclude receptors that are induced in the tissue where beiging is actually described to occur.

      Furthermore, the authors are comparing genes upregulated in cold in BAT (but not WAT) to highly expressed genes in human white adipocytes during thermoneutrality. Overall, the authors fail to discuss the logic behind their strategy and the obvious limitations of it.

      Thank you for your valuable advice. In this study, we focused on genes that were more highly expressed in BAT than in iWAT under cold stimulation, as these genes might play a role in adipose thermogenesis. Among the cold-induced iWAT genes you mention, we have identified other intriguing targets in a separate ongoing study, which falls outside the scope of the present work. Moreover, rather than making a comparison, we intersected the 27 GPCR-coding genes that were more highly expressed in BAT than in iWAT with genes that were highly expressed in human adipocytes (Figure 1C).

      Your suggestions made us realize that the description of the screening strategy in the manuscript was not sufficiently clear, so we have added the following:

      “…dataset obtained from the Gene Expression Omnibus (GEO) database. Additionally, we utilized the human subcutaneous adipocytes dataset (Figure 1C, red) and human visceral adipocytes dataset (Figure 1C, purple) from the human protein atlas database to obtain genes that are highly expressed in human white adipocytes. The GSE118849 dataset comprises samples of brown adipose tissue (BAT) and inguinal white adipose tissue (iWAT) obtained from mice subjected to a 72-hour cold exposure at a temperature of 4℃.

      A total of 1134 differentially expressed genes (DEGs) that exhibited up-regulation in BAT compared to iWAT under cold stimulation were identified in the analysis, which might play a role in adipose thermogenesis. These DEGs were further screened to identify highly…”

      (2) Relevance of ADGRA3 and comparison to established literature:

      There has been a lot of literature and discussion about which receptor should be targeted in humans to recruit thermogenic fat. The current article unfortunately does not discuss this literature nor explain how it relates to their findings. For example, O'Mara et al (PMID: 31961826) demonstrated that chronic stimulation with the B3 adrenergic agonist, Mirabegron, resulted in the recruitment of thermogenic fat and improvement in insulin sensitivity and cholesterol. Later, Blondin et al (PMID: 32755608), highlighted the B2 adrenergic receptor as the main activation path of thermogenic fat in humans. There is also a recent report on an agonist activating B2 and B3 simultaneously (PMID: 38796310). Thus, to bring the literature forward, it would be beneficial if the current manuscript compared their identified activation path with the activation of these already established receptors and discussed their findings in relation to previous studies.

      Thank you for your suggestion. We have added a discussion of the relevant human adipose thermogenic receptors to the Discussion section, as follows:

      “The induction of beige fat has been investigated as a potentially effective therapeutic approach in combating obesity [23]. A clinical trial revealed that chronic treatment with the β3-AR agonist mirabegron increases human brown fat, HDL cholesterol, and insulin sensitivity [24]. Subsequently, Blondin et al discovered that oral mirabegron increases BAT thermogenesis only when administered at the maximal allowable dose, indicating that human brown adipocyte thermogenesis is primarily driven by β2-adrenoceptor (β2-AR) stimulation [11]. Consistent with this finding, we found much higher levels of ADRB2 expression than ADRB3 expression in human white adipose tissue (Figure S1E). Furthermore, a recent study has demonstrated that simultaneous activation of β2-AR and β3-AR enhances whole-body metabolism through beneficial effects on skeletal muscle and BAT [25].”

      In Figures 1d and e, the authors show the expression of ADGRA3 in comparison to the expression of ADRB3. In human brown adipocytes, ADRB2 has been shown to be the main receptor through which adrenergic activation occurs (PMID: 32755608), thus authors should show the relative expression of this gene as well.

      We agree that ADRB2 expression should be shown alongside the data in Figures 1D and E. Unfortunately, the relevant datasets (PRJNA66167 and PRJEB4337) do not contain ADRB2 expression data. The GTEx database does include ADRB2 expression, so we have added these data to Figure S1E.

      (3) Strategy to investigate the role of ADGRA3 in WAT beiging:

      Having identified ADGRA3 as their candidate receptor, the authors proceed with investigations of this receptor in mouse models and the murine inguinal adipocyte cell line 3T3.

      First of all, in Figure 1D, the authors show a substantially lower expression of ADGRA3 compared to ADRB3. It could thus be argued that a mouse would not be the best model system for studying this receptor. It would be interesting to see data from experiments in human adipocytes.

      Thank you for your helpful advice. We differentiated human adipose-derived mesenchymal stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      Moreover, if the authors are interested in inducing beiging, why do they show expression in iBAT and not iWAT?

      This may not have been described clearly enough in the article, but we did show the expression and effects of ADGRA3 in both iWAT and BAT (Author response image 1, Figure 3F-J and Figure 4F-J).

      Author response image 1.

      The authors perform in vivo experiments using intraperitoneal injections of shRNA or overexpression CMV-driven vectors and report effects on body temperature and glucose metabolism. It is important to note here that ADGRA3 is not uniquely expressed in adipocytes. A major advantage of databases like the Human Protein Atlas and GTEx is that they give an overview of gene expression across tissues and cell types. These databases show that ADGRA3 is expressed in subcutaneous and visceral adipocytes. However, other cell types and tissues show even higher expression. In the Human Protein Atlas, the enhanced cell types are astrocytes and hepatocytes. In the GTEx database, the tissues with the highest expression are brain, liver, and thyroid.

      With this information in mind, IP injections for modification of ADGRA3 receptor expression could be expected to affect any of these tissues and cells.

      The manuscript reports changes in body temperature. However, body temperature is regulated by the brain and also affected by thyroid activity. Did the authors measure the levels of circulating thyroid hormones? Gene expression changes in the brain? The authors report that Adgra3 overexpression decreased TG levels in serum and liver. The liver could be the primary target organ here, and the adipose effects might be secondary. The data would be easier to interpret if the authors reported the effects on the liver, thyroid, and brain, and the gene expression across tissues should be discussed in the article.

      Thank you for your valuable advice. We have added results on the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H), on circulating thyroid hormone levels (Figures S2H, S4F and S5B), and on Adgra3 expression levels in multiple tissues after Adgra3 overexpression/knockdown (Figures S2A-B and S4B-C), and we discuss these results in the article as follows:

      “Given that the non-targeted nanoparticle approach used in this study to modulate Adgra3 expression in vivo alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, the construction of adipose tissue-specific Adgra3 knockout/overexpression mouse models is imperative for a more nuanced understanding of the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. We will employ more sophisticated models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings highlight a potential therapeutic feature of…”

      Finally, the identification of Hesperetin using the PRESTO-Salsa tool, and how specific the effect of Hesperetin is on ADGRA3, is currently unclear. This should be better discussed, and authors should consider measuring the established effects of Hesperetin in their model systems, including apoptosis.

      Thank you for your suggestion. We have expanded on this point in the Discussion section as follows:

      “The effect of hesperetin on ADGRA3 had not been reported previously. In this study, we identified hesperetin as a potential ADGRA3 agonist using the PRESTO-Salsa tool and confirmed its agonist effect on ADGRA3 through a series of experiments. This study focuses on the regulatory effect of hesperetin on adipose thermogenesis and on whether this effect depends on ADGRA3. We therefore did not investigate other potential effects of hesperetin, such as its antioxidant or apoptotic activities.”

      Reviewer #2 (Public Review):

      Based on bioinformatics and expression analysis using mouse and human samples, the authors claim that the adhesion G-protein coupled receptor ADGRA3 may be a valuable target for increasing thermogenic activity and metabolic health. Genetic approaches to deplete ADGRA3 expression in vitro resulted in reduced expression of thermogenic genes including Ucp1, reduced basal respiration, and reduced metabolic activity as reflected by lower glucose uptake and triglyceride accumulation. In line with this, nanoparticle delivery of shAdgra3 constructs is associated with increased body weight, reduced thermogenic gene expression in white and brown adipose tissue (WAT, BAT), and impaired glucose and insulin tolerance. On the other hand, ADGRA3 overexpression is associated with an improved metabolic profile in vitro and in vivo, which can be explained by increased activity of the well-established Gs-PKA-CREB axis. Notably, a computational screen suggested that ADGRA3 is activated by hesperetin. This metabolite is a derivative of the major citrus flavonoid hesperidin and has been described to promote metabolic health. Using appropriate in vitro and in vivo studies, the authors show that hesperetin supplementation is associated with increased thermogenesis, UCP1 levels in WAT and BAT, and improved glucose tolerance, an effect that was attenuated in the absence of ADGRA3 expression.

      Overall, the data suggest that ADGRA3 is a constitutively active Gs-coupled receptor that improves metabolism by activating adaptive thermogenesis in WAT and BAT. The conclusions of the paper are partly supported by the data, but some experimental approaches need further clarification.

      (1) The in vivo approaches to modulate Adgra3 expression in mice are carried out using non-targeted nanoparticle-based approaches. The authors do not provide details of the composition of the nanomaterials, but it is highly likely that other metabolically active organs such as the liver are targeted. This is critical because Adgra3 is expressed in many organs, including the liver, adrenal glands, and gastrointestinal system. Therefore, many of the observed metabolic effects could be indirect, for example by modulating bile acids or corticosterone levels. Consistent with this, after digestion in the gastrointestinal tract, hesperetin is rapidly metabolized in intestinal and liver cells. Thus, hesperetin levels in the systemic circulation are likely to be insufficient to activate Adgra3 in thermogenic adipocytes/precursors. Overall, the authors need to repeat the key metabolic experiments in adipose-specific Adgra3 knockout/overexpression models to validate the reliability of the in vivo results. In addition, to validate the relevance of hesperetin supplementation for adaptive thermogenesis in BAT and WAT in vivo, the levels of hesperetin present in the systemic circulation should be quantified.

      Thank you for your valuable advice. Unfortunately, we were unable to quantify the hesperetin concentration in the systemic circulation because the serum from hesperetin-treated mice had already been used to measure serum insulin, fT4 and TG. Following your other suggestions, we have added results on the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H), on circulating thyroid hormone levels (Figures S2H, S4F and S5B), and on Adgra3 expression levels in multiple tissues after Adgra3 overexpression/knockdown (Figures S2A-B and S4B-C), and we discuss these results in the article as follows:

      “Given that the non-targeted nanoparticle approach used in this study to modulate Adgra3 expression in vivo alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, the construction of adipose tissue-specific Adgra3 knockout/overexpression mouse models is imperative for a more nuanced understanding of the precise mechanisms underlying the influence of ADGRA3 on adipose thermogenesis. We will employ more sophisticated models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings highlight a potential therapeutic feature of…”

      (2) Standard measurements for energy balance are not presented. Quantitative data on energy expenditure, e.g. by indirect calorimetry, and food intake are missing and need to be included to validate the authors' claims.

      We fully agree with your suggestion. Regrettably, owing to the constraints of our experimental facilities, we are presently unable to obtain quantitative data on the energy expenditure of the animals. However, we believe that the present results partially support the idea that ADGRA3 promotes energy metabolism, and the effects of ADGRA3 on food intake are shown in Figures S2C and S5A.

      (3) The thermographic images used to determine the BAT temperature are not very convincing. The distance and angle between the thermal camera and the BAT have a significant effect on the determination of the temperature, which is not taken into account, at least in the images presented.

      Thank you very much for pointing out this gap in our method description. Following the methods of previous studies (Xia, Bo et al. PLoS Biology. 2020. doi:10.1371/journal.pbio.3000688) and (Warner, Amy et al. PNAS. 2013. doi:10.1073/pnas.1310300110), all infrared images within a batch were captured with the same thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. We have added this description to the Materials and Methods section, as shown below:

      “2.20. Infrared Thermography.

      BAT temperature was measured at room temperature by infrared thermography according to previous publications [22, 23]. All representative infrared images of mice from the same batch were captured with a thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. To quantify interscapular temperature, the average surface temperature of a region of the interscapular BAT was obtained with FLIR Tools software.”
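The region-of-interest averaging described in this Methods text can be illustrated with a minimal Python sketch. It assumes the thermal frame has been exported as a 2D array of per-pixel temperatures in °C and that the ROI is a rectangle; the function name `roi_mean_temperature` and its parameters are hypothetical, and the sketch is not meant to reproduce FLIR Tools' own computation.

```python
from statistics import fmean


def roi_mean_temperature(frame, top, left, height, width):
    """Return the mean temperature inside a rectangular region of interest.

    frame: 2D list of per-pixel temperatures in deg C (rows of floats),
           assumed to be exported from the radiometric thermal image.
    top, left: row/column index of the ROI's upper-left pixel.
    height, width: ROI size in pixels.
    """
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("ROI origin must be non-negative and size positive")
    rows = frame[top:top + height]
    # Reject ROIs that fall partly outside the image instead of silently truncating.
    if len(rows) != height or any(len(row) < left + width for row in rows):
        raise ValueError("ROI extends beyond the image")
    pixels = [t for row in rows for t in row[left:left + width]]
    return fmean(pixels)


if __name__ == "__main__":
    # Hypothetical 3x3 frame: a warm 2x2 patch (interscapular area) in the corner.
    frame = [
        [30.0, 31.0, 25.0],
        [32.0, 33.0, 25.0],
        [24.0, 24.0, 24.0],
    ]
    print(roi_mean_temperature(frame, 0, 0, 2, 2))  # 31.5
```

Keeping the ROI size fixed across animals, like keeping the camera distance fixed, is what makes the averaged values comparable within a batch.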

      (4) The 3T3-L1 cell line is not an adequate cell culture model to study thermogenic adipocyte differentiation. To validate their results, the key experiments showing that ADGRA3 expression modulates thermogenic marker expression in a hesperetin-dependent manner need to be performed in a reliable model, e.g. primary murine adipocytes.

      Inducing the 3T3-L1 cell line into white adipocytes is indeed not suitable for studying thermogenic adipocyte differentiation. However, following previous studies (Wei, Gang et al. Cell Metabolism. 2021. doi:10.1016/j.cmet.2021.08.012) and (Bae IS, Kim SH. Int J Mol Sci. 2019. doi:10.3390/ijms20246128), in this study the 3T3-L1 cell line was differentiated into beige-like adipocytes, a method that many studies consider suitable for studying the thermogenic effects of adipocytes in vitro. In addition, we have provided a more detailed description of the induction of beige-like adipocytes from 3T3-L1 cells in the Materials and Methods section, and we induced human adipose-derived stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      “…supplemented with 10% FBS. Confluent 3T3-L1 pre-adipocytes were induced into mature beige-like adipocytes with 0.5 mM isobutylmethylxanthine (IBMX), 1 μM dexamethasone, 5 μg/ml insulin, 1 nM 3,3',5-triiodo-L-thyronine (T3), 125 μM indomethacin and 1 μM rosiglitazone in high-glucose DMEM containing 10% FBS for 2 days, then treated with high-glucose DMEM containing 5 μg/ml insulin, 1 nM T3, 1 μM rosiglitazone and 10% FBS for 6 days, and finally cultured in high-glucose DMEM containing 10% FBS for 2 days. hADSCs were seeded on plates coated with 0.1% gelatin and grown to confluence in specialized human mesenchymal stem cell (hMSC) culture medium (ZQ-1320). Confluent hADSCs were induced into mature human adipocytes with adipogenic induction medium (PCM-I-004) according to the manufacturer’s instructions.”

      (5) The experimental setup only allows the measurement of basal cellular respiration. More advanced approaches are needed to define the contribution of ADGRA3 versus classical adrenergic receptors to UCP1-dependent thermogenesis.

      Thanks for your suggestion. The maximum oxygen consumption rate of the cells was also measured (Figures 2G and 2N) by adding FCCP, an uncoupler of oxidative phosphorylation (OXPHOS) in mitochondria.
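For readers less familiar with this readout, the sketch below shows how maximal respiration is conventionally derived from an OCR trace. It is a hedged illustration that assumes an Agilent Seahorse-style mito stress test in which rotenone/antimycin A is injected after FCCP to measure non-mitochondrial OCR; the response above mentions only FCCP, so that final step, the names `summarize_ocr` and `RespirationSummary`, and the example readings are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RespirationSummary:
    basal: float           # last pre-injection OCR minus non-mitochondrial OCR
    maximal: float         # peak OCR after FCCP minus non-mitochondrial OCR
    spare_capacity: float  # maximal minus basal


def summarize_ocr(basal_ocr, fccp_ocr, rot_aa_ocr):
    """Derive respiration parameters from OCR readings (e.g. pmol O2/min).

    basal_ocr:  readings before any injection
    fccp_ocr:   readings after the uncoupler FCCP
    rot_aa_ocr: readings after rotenone/antimycin A (assumed final injection)
    """
    if not (basal_ocr and fccp_ocr and rot_aa_ocr):
        raise ValueError("each phase needs at least one OCR reading")
    # Residual OCR once complexes I and III are blocked is non-mitochondrial.
    non_mito = min(rot_aa_ocr)
    basal = basal_ocr[-1] - non_mito
    maximal = max(fccp_ocr) - non_mito
    return RespirationSummary(basal, maximal, maximal - basal)


if __name__ == "__main__":
    # Hypothetical readings for one well.
    print(summarize_ocr([95.0, 100.0], [180.0, 210.0, 200.0], [20.0, 18.0, 19.0]))
    # RespirationSummary(basal=82.0, maximal=192.0, spare_capacity=110.0)
```

Subtracting non-mitochondrial OCR from both basal and FCCP-stimulated rates keeps the two parameters on the same mitochondrial baseline, so their difference (spare capacity) is not inflated by oxygen consumption outside the mitochondria.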

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Zhao et al. explored the function of adhesion G protein-coupled receptor A3 (ADGRA3) in thermogenic fat biology.

      Strengths:

      Through both in vivo and in vitro studies, the authors found that the gain function of ADGRA3 leads to browning of white fat and ameliorates insulin resistance.

      Weaknesses:

      There are several methodological weaknesses, such as the use of 3T3-L1 adipocytes and intraperitoneal (i.p.) injection of virus. Moreover, as the authors state that ADGRA3 is constitutively active, how could the authors then identify a chemical ligand?

      (1) Primary cultured cells, instead of 3T3-L1 cells, should be used to perform gain- and loss-of-function analyses of ADGRA3. It is impossible to detect Ucp1 expression in 3T3-L1 cells.

      When the 3T3-L1 cell line is induced into white adipocytes, UCP1 expression is indeed difficult to detect. However, following previous studies (Wei, Gang et al. Cell Metabolism. 2021. doi:10.1016/j.cmet.2021.08.012) and (Bae IS, Kim SH. Int J Mol Sci. 2019. doi:10.3390/ijms20246128), in this study the 3T3-L1 cell line was differentiated into beige-like adipocytes, a method that many studies consider suitable for studying the thermogenic effects of adipocytes in vitro. In addition, we have provided a more detailed description of the induction of beige-like adipocytes from 3T3-L1 cells in the Materials and Methods section, and we induced human adipose-derived stem cells (hADSCs) into adipocytes to evaluate the effect of ADGRA3 on human adipocytes (Figure 8).

      “…supplemented with 10% FBS. Confluent 3T3-L1 pre-adipocytes were induced into mature beige-like adipocytes with 0.5 mM isobutylmethylxanthine (IBMX), 1 μM dexamethasone, 5 μg/ml insulin, 1 nM 3,3',5-triiodo-L-thyronine (T3), 125 μM indomethacin and 1 μM rosiglitazone in high-glucose DMEM containing 10% FBS for 2 days, then treated with high-glucose DMEM containing 5 μg/ml insulin, 1 nM T3, 1 μM rosiglitazone and 10% FBS for 6 days, and finally cultured in high-glucose DMEM containing 10% FBS for 2 days. hADSCs were seeded on plates coated with 0.1% gelatin and grown to confluence in specialized human mesenchymal stem cell (hMSC) culture medium (ZQ-1320). Confluent hADSCs were induced into mature human adipocytes with adipogenic induction medium (PCM-I-004) according to the manufacturer’s instructions.”

      (2) For virus treatment, the authors should consider performing local tissue injection, rather than IP injection. If it is IP injection, have the authors checked other tissues to validate whether the phenotype is fat-specific?

      Thank you for your valuable advice. We have added results on the effect of local BAT injection of Adgra3 OE on thermogenic genes (Figures S5G-H) and on Adgra3 expression levels in other tissues after Adgra3 overexpression/knockdown (Figures S2A-B and S4B-C).

      (3) The authors should clarify why a constitutively active GPCR would need additional ligands.

      Thank you for your suggestion. In fact, we only identified hesperetin as a potential agonist of ADGRA3 rather than a ligand. The results also indicate that overexpression of ADGRA3 without additional hesperetin is sufficient to activate the downstream PKA signaling pathway through constitutive activity (Figure 5). Recently, Chen et al. identified oleoylethanolamide (OEA) as a potential endogenous agonist of GPR3, which is also a constitutively active GPCR. Overall, the high constitutive activity of such GPCRs arises from the combined effects of stimulation by endogenous agonists and their basal coupling with Gs.

      We screened for and identified potential agonists of ADGRA3 because we hope to find a more convenient route to clinical application than gene overexpression, as described in the article:

      “Considering the difficulty of overexpressing ADGRA3 in clinical application, hesperetin was screened as a potential agonist of ADGRA3 by PRESTO-Salsa database (Figure 6A). The…”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The title appears to be overstated as no clinical trials were performed and experiments were not even performed in human brown adipocytes.

      Thank you for this critical suggestion. We have therefore added experimental results in human adipocytes (Figure 8) and revised the title to “Constitutively active receptor ADGRA3 signaling induces adipose thermogenesis”.

      Please specify n-number and what are replicates or independent experiments. Please also state if any outliers were excluded and why.

      Thank you for this valuable suggestion. We have added the n-numbers to the figure legends, and the number of independent experiments and the exclusion criteria for outliers to the Materials and Methods section, as follows:

      “…of tissue samples. Cohorts of ≥4 mice per genotype or treatment were assembled for all in vivo studies. All in vivo studies were repeated 2-3 independent times. All procedures related to…”

      “…μM H-89) was added to 3T3-L1 mature beige-like adipocytes for 48 hours. All in vitro studies were repeated 2-3 independent times.”

      “All data are presented as mean ± SEM. In this study, outliers identified by the three-sigma rule were excluded from analysis, with the exception of those presented in Figure S1E. Because the outliers in Figure S1E may represent extreme values arising from the inherent variability within the population sample, we chose to retain them for further analysis. Student’s t-test was used to compare two groups. One-way analysis of…”
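The three-sigma exclusion rule quoted above can be sketched in Python as follows. The manuscript text does not specify the details, so this sketch assumes a single pass per group, with mean and sample standard deviation (n - 1 denominator) computed from all values in that group; the function name `three_sigma_filter` is hypothetical.

```python
from statistics import fmean, stdev


def three_sigma_filter(values, k=3.0):
    """Split values into (kept, excluded) using a single-pass k-sigma rule.

    A point is excluded when |x - mean| > k * SD, where the mean and the
    sample SD are computed once from all values, including the candidate.
    """
    values = list(values)
    if len(values) < 2:
        return values, []
    mu = fmean(values)
    sd = stdev(values)
    if sd == 0:
        # All values identical: nothing can be an outlier.
        return values, []
    kept, excluded = [], []
    for x in values:
        (excluded if abs(x - mu) > k * sd else kept).append(x)
    return kept, excluded


if __name__ == "__main__":
    # Hypothetical group of 12 measurements with one extreme value.
    data = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0, 10.1, 9.9, 10.0, 100.0]
    kept, excluded = three_sigma_filter(data)
    print(excluded)  # [100.0]
```

One property of this single-pass form is worth knowing: because the candidate point contributes to its own mean and SD, no value can lie more than (n - 1)/√n sample SDs from the mean, which first exceeds 3 at n = 11. For example, `three_sigma_filter([1.0, 2.0, 3.0, 50.0])` excludes nothing.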

      The authors use infrared thermography to measure body temperature. Because the reading depends on the distance between the mouse and the camera, the mouse needs to be at the same spot.

      Thank you very much for pointing out this gap in our method description. Following the methods of previous studies (Xia, Bo et al. PLoS Biology. 2020. doi:10.1371/journal.pbio.3000688) and (Warner, Amy et al. PNAS. 2013. doi:10.1073/pnas.1310300110), all infrared images within a batch were captured with the same thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. We have added this description to the Materials and Methods section, as shown below:

      “2.20. Infrared Thermography.

      BAT temperature was measured at room temperature by infrared thermography according to previous publications [22, 23]. All representative infrared images of mice from the same batch were captured with a thermal imaging camera (FLIR ONE PRO) at the same distance, perpendicular to the plane on which the mice were located. To quantify interscapular temperature, the average surface temperature of a region of the interscapular BAT was obtained with FLIR Tools software.”

      Please discuss the limitations of the experiments and discuss the relevant literature.

      Thanks for your recommendations. We discussed the limitations of the experiments and the relevant literature in the discussion section, as follows:

      “The induction of beige fat has been investigated as a potentially effective therapeutic approach for combating obesity [23]. A clinical trial revealed that chronic treatment with the β3-AR agonist mirabegron increases human brown fat, HDL cholesterol, and insulin sensitivity [24]. Subsequently, Blondin et al. found that oral mirabegron increases BAT thermogenesis only when administered at the maximal allowable dose, indicating that human brown adipocyte thermogenesis is primarily driven by β2-adrenoceptor (β2-AR) stimulation [11]. Consistent with this finding, we found much higher expression of ADRB2 than of ADRB3 in human white adipose tissue (Figure S1E). Furthermore, a recent study has demonstrated that simultaneous activation of β2-AR and β3-AR enhances whole-body metabolism through beneficial effects on skeletal muscle and BAT [25].”

      “Given that the non-targeted nanoparticle approach used in this study to modulate Adgra3 expression in vivo also alters Adgra3 expression in tissues beyond adipose tissue (Figures S2A-B and S4B-C), notably the liver and skeletal muscle, adipose tissue-specific Adgra3 knockout/overexpression mouse models are needed for a more nuanced understanding of the precise mechanisms by which ADGRA3 influences adipose thermogenesis. We will employ such models in subsequent studies to further elucidate the effects of ADGRA3 on adipose thermogenesis and metabolic homeostasis. Nevertheless, our findings underscore a potential therapeutic feature of…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary

      The mammalian Shieldin complex, consisting of REV7 (aka MAD2L2, MAD2B) and SHLD1-3, affects pathway usage in DSB repair, favoring non-homologous end-joining (NHEJ) at the expense of homologous recombination (HR) by blocking resection and/or priming fill-in DNA synthesis to maintain or generate near-blunt ends suitable for NHEJ. While the budding yeast Saccharomyces cerevisiae has no homologs of SHLD1-3, it does have Rev7, which functions with Rev3 in the translesion DNA polymerase zeta. Testing the hypothesis that Rev7 also affects DSB resection in budding yeast, the work identified a direct interaction between Rev7 and the Rad50-Mre11-Xrs2 complex by two-hybrid and direct protein interaction experiments. Deletion analysis showed that the 42-amino-acid C-terminal region was necessary and sufficient for the two-hybrid interaction. Direct biochemical analysis of the 42 aa peptide was not possible. Rev7-deficient cells were sensitive to HU only in combination with G4 tetraplex-forming DNA. Importantly, the 42 aa peptide alone suppressed this phenotype. Biochemical analysis with full-length Rev7 and a C-terminal truncation lacking the 42 aa region shows G4-specific DNA binding that is abolished in the C-terminal truncation and with a substrate containing mutations that prevent G4 formation. Rev7 lacks nuclease activity but inhibits the dsDNA exonuclease activity of Mre11. The C-terminal truncation protein lacking the 42 aa region also showed some inhibition, suggesting the involvement of additional binding sites besides the 42 aa region. The Mre11 ssDNA endonuclease activity is also inhibited by Rev7, but the degradation of linear ssDNA is not. Rev7 does not affect ATP binding by Rad50 but inhibits the Rad50 ATPase activity in a concentration-dependent manner. The C-terminal truncation protein lacking the 42 aa region also showed some inhibition, but significantly less than the full-length protein.

      Using an established plasmid-based NHEJ assay, the authors provide strong evidence that Rev7 affects NHEJ, showing a four-fold reduction in this assay. The mutations in the other Pol zeta subunits, Rev3 and Rev1, show a significantly smaller effect (~25% reduction). A strain expressing only the Rev7 C-terminal 42 aa peptide showed no NHEJ defect, while the truncation protein lacking this region exhibited a smaller defect than the deletion of REV7. The conclusion that Rev7 supports NHEJ mainly through the 42 aa region was validated using a chromosomal NHEJ assay. The effect on HR was assessed using a plasmid:chromosome system containing G4-forming DNA. The rev7 deletion strain showed an increase in HR in this system in the presence and absence of HU. Cells expressing the 42 aa peptide were indistinguishable from the wild type, as were cells expressing the Rev7 truncation lacking the 42 aa region. The authors conclude that Rev7 suppresses HR, but the context appears to be system-specific, and the conclusion that Rev7 abolishes HR repair of DSBs is unwarranted and overly broad.

      Strength

      This is a well-written manuscript with many well-executed experiments that suggest that Rev7 inhibits MRX-mediated resection to favor NHEJ during DSB repair. This finding is novel and provides insight into the potential mechanism by which the human Shieldin complex might antagonize resection.

      We thank Reviewer 1 for their comprehensive summary of our work. The Reviewer's recognition that our manuscript is “well-written” with “many well-executed experiments” and that our findings are “novel” is greatly appreciated.

      Weaknesses

      The nuclease experiments were conducted using manganese as a divalent cation, and it is unclear whether there is an effect with the more physiological magnesium cation. Additional controls for the ATPase and nuclease experiments to eliminate non-specific effects would be helpful. Evidence for an effect on resection in cells is lacking. The major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified, as only a highly specialized assay is used that does not warrant the broad conclusion drawn. Specifically, the result that the Rev7 C-terminal truncation lacking the 42 aa region still suppresses HR is unexpected and unexplained. The effect of Rev7 on G4 metabolism is underdeveloped and distracts from the main result that Rev7 modulates MRX activity. The authors should consider removing this part and developing a more complete story on it later.

      We have addressed each point identified as “Weaknesses” by the reviewer, as described below:

      The nuclease experiments were conducted using manganese as a divalent cation, and it is unclear whether there is an effect with the more physiological magnesium cation.

      We acknowledge the Reviewer’s concern and apologize for not having been clear in our first submission. Several studies have demonstrated that Mre11 exhibits all three of its DNase activities, namely single-stranded endonuclease, double-stranded exonuclease and DNA hairpin opening, only in the presence of Mn²⁺ and not with other divalent cations such as magnesium or calcium (Paull and Gellert, Mol. Cell 1998; 2000; Usui et al., Cell 1998; Ghosal and Muniyappa, JMB, 2007; Arora et al., Mol Cell Biol. 2017). For this reason, Mn²⁺ was used as the cofactor in the Mre11 nuclease assays. We have clarified this in the revised manuscript. As a side note, Mg²⁺ serves as the cofactor for Rad50’s ATPase activity.

      Additional controls for the ATPase and nuclease experiments to eliminate non-specific effects would be helpful.

      We thank the Reviewer for raising this important point, as it led us to evaluate and confirm the specificity of Rev7 and exclude its potential non-specific effects. To this end, we have performed additional experiments, which showed that (a) the S. cerevisiae Dmc1 ATPase activity was not affected by Rev7, contrary to its inhibitory effect on Rad50 and (b) Rev7 had no discernible impact on the endonucleolytic activity of S. cerevisiae Sae2, whereas it inhibits DNase activities of Mre11. Thus, the lack of inhibitory effects on the ATPase activity of Dmc1 and nuclease activity of Sae2 confirm the specificity of Rev7 for Mre11 and Rad50 subunits. We have included this new data in Figure 6H and 6J and in Figure 5 –figure supplement 1, respectively, in the revised manuscript.

      Evidence for an effect on resection in cells is lacking. The major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified, as only a highly specialized assay is used that does not warrant the broad conclusion drawn.

      We agree with the Reviewer that in vivo evidence demonstrating the inhibitory effect of REV7 on DNA end resection was lacking in the first submission. Reviewers 2 and 3 have also raised this point. We have now measured the rate of DNA end resection using a qPCR-based assay (Mimitou and Symington, EMBO J. 2010; Gnugge et al., Mol. Cell 2023). The results revealed that deletion of REV7 enhanced the rate of DNA end resection at a DSB site inflicted by HO endonuclease (Figure 9—figure supplement 3), providing direct evidence that loss of REV7 increases DNA end resection at DSBs.

      Specifically, the results that the Rev7 C-terminal truncation lacking the 42 aa region still suppresses HR is unexpected and unexplained.

      This is a fair point, and we thank the reviewer for raising it. Although Rev7-C1 showed no apparent interaction in the yeast two-hybrid assays, it surprisingly suppressed HR partially (Figure 9). In line with this, biochemical assays showed that it exerts a partial inhibitory effect on the Mre11 nuclease (Figure 5) and Rad50 ATPase (Figure 6) activities compared with full-length Rev7. Consistent with the in vitro data, the AF2 models revealed that, in addition to the C-terminal 42-aa region, residues in the N-terminal region of Rev7 also interact with the Mre11 and Rad50 subunits (Figure 2—figure supplement 2).

      The effect of Rev7 on G4 metabolism is underdeveloped and distracts from the main results that Rev7 modulated MRX activity. The authors should consider removing this part and develop a more complete story on this later.

      We agree with the reviewer that the effect of Rev7 on G4 DNA metabolism is underdeveloped and distracts from the central theme of the present paper, and with the suggestion that we develop this part into a complete story later. Reviewers 2 and 3 raised the same point; therefore, the corresponding figures and text have been removed from the revised manuscript.

      Reviewer 2 (Public Review):

      In this study, Badugu et al. investigate the roles of Rev7 in regulating the Mre11-Rad50-Xrs2 complex and in the metabolism of G4 structures. The authors also attempt to conclude that REV7 regulates the DSB repair choice between homologous recombination and non-homologous end joining.

      The major observations of this study are:

      (1) Rev7 interacts with the individual components of the MRX complex in a two-hybrid assay and in a protein-protein interaction assay (microscale thermophoresis) in vitro.

      (2) Modeling using AlphaFold-Multimer also indicated that Rev7 can interact with Mre11 and Rad50.

      (3) Using a two-hybrid assay, a 42-amino-acid C-terminal domain of Rev7 responsible for the interaction with MRX was identified.

      (4) Rev7 inhibits Mre11 nuclease and Rad50 ATPase activities in vitro.

      (5) Rev7 promotes NHEJ in a plasmid cutting/religation assay.

      (6) Rev7 inhibits recombination between the chromosomal ura3-1 allele and a plasmid ura3 allele containing a G4 structure.

      (7) Using an assay developed in V. Zakian's lab, it was found that rev7 mutants grow poorly when both G4 is present in the genome and yeast are treated with HU.

      (8) In vitro, purified Rev7 binds to G4-containing substrates.

      In general, a lot of experiments have been conducted, but the major conclusion about the role of Rev7 in regulating the choice between HR and NHEJ is not justified.

      We thank Reviewer 2 for their comprehensive assessment of our manuscript and their insightful comments. However, we believe that the data in our manuscript (Figures 7-9), together with the new data in the revised manuscript (Figure 9-figure supplements 2 and 3), clearly demonstrate that Rev7 regulates the choice between HR and NHEJ.

      (1) Two stories that do not overlap (regulation of MRX by Rev7 and Rev7's role in G4 metabolism) are brought under one umbrella in this work. There is no connection unless the authors demonstrate that Rev7 inhibits the cleavage of G4 structures by the MRX complex.

      We agree with the reviewer that the regulation of MRX subunit functions by Rev7 and its role in G4 DNA metabolism are non-overlapping themes. Reviewers 1 and 3 expressed the same concern. Following their suggestion, we have deleted all figures and text describing the role of Rev7 in G4 DNA metabolism from the revised manuscript.

      (2) The authors cannot conclude based on the recombination assay between G4-containing 2-micron plasmid and chromosomal ura3-1 that Rev7 "completely abolishes DSB-induced HR". First of all, there is no evidence that DSBs are formed at G4. Why is there no induction of recombination when cells are treated with HU? Second, as the authors showed, Rev7 binds to G4, therefore it is not clear if the observed effects are the result of Rev7 interaction with G4 or its impact on HR. The established HO-based assays where the speed of resection can be monitored (e.g., Mimitou and Symington, 2010) have to be used to justify the conclusion that Rev7 inhibits MRX nuclease activity in vivo.

      We thank the Reviewer for the insightful comments and for drawing our attention to the inference "completely abolishes DSB-induced HR". We have rephrased this conclusion and replaced it with “REV7 gene product plays an anti-recombinogenic role during HR”. The reviewer also notes the lack of “evidence that DSBs are formed at G4”. Unfortunately, our attempts to detect DSBs at the G4 DNA site in the 2-micron plasmid did not provide a clear answer to this question. This might be related to the presence of myriad DNases in the cell and to technical issues associated with isolating low-abundance, linearized 2-micron plasmid molecules. For these reasons, we cannot provide data on DSBs at the G4 site in the 2-micron plasmid.

      The reviewer then asks, “Why is there no induction of recombination when cells are treated with HU?” These findings are consistent with previous studies showing that Mre11-deficient cells are sensitive to HU, resulting in cell death (Tittel-Elmer et al., EMBO J. 28, 1142-1156, 2009; Hamilton and Maizels, PLoS One, 5, e15387, 2010). However, a novel finding of our study is that ura3-1 rev7Δ cells and, to a limited extent, ura3-1 cells expressing the Rev7 42-amino-acid peptide produce Ura3+ papillae. We have included this information in the Results section and adjusted the text to make this point clear to the reader.

      In the same paragraph, the Reviewer expresses a concern about the interaction of Rev7 with G4 DNA substrates and its impact on HR. As discussed above in response to your comment (1) and to similar comments from Reviewers 1 and 3, we have deleted all figures and text describing the role of Rev7 in G4 DNA metabolism from the revised manuscript. The reviewer specifically refers to a study by Mimitou and Symington, 2010, in which the speed of DNA end resection at an HO endonuclease-inflicted DSB was quantified. We have carried out the suggested experiment, and the results are presented in Figure 9─figure supplement 3.

      Reviewer 3 (Public Review):

      Summary:

      REV7 facilitates the recruitment of the Shieldin complex and thereby inhibits end resection and controls DSB repair choice in metazoan cells. Puzzlingly, Shieldin is absent in many organisms, and it is unknown whether and how Rev7 regulates DSB repair in these cells. The authors surmised that yeast Rev7 physically interacts with Mre11/Rad50/Xrs2 (MRX), the short-range resection nuclease complex, and tested this premise using yeast two-hybrid (Y2H) and microscale thermophoresis (MST). The results convincingly showed that the individual subunits of MRX interact robustly with Rev7. AlphaFold Multimer modelling followed by Y2H confirmed that the carboxy-terminal 42 amino acids are essential for interaction with MR and for G4 DNA binding by REV7. The rev7 mutant lacking the MR-binding interface (Rev7-C1) shows moderate inhibition of the nuclease and ATPase activities of Mre11/Rad50 in biochemical assays. Deletion of REV7 also causes a mild reduction in NHEJ in both plasmid- and chromosome-based assays and increases mitotic recombination between chromosomal ura3-1 and the plasmid ura3 allele interrupted by G4. The authors concluded that Rev7 facilitates NHEJ and antagonizes HR even in budding yeast, and that it achieves this by blocking Mre11 nuclease and Rad50 ATPase.

      Weaknesses

      There are many strengths to the studies, and a broad range of well-established assays was used to reach the conclusions. Nevertheless, I have several concerns about the validity of the experimental settings owing to the lack of several key controls essential for interpreting the results. The manuscript also needs a few additional functional assays to reach the conclusions as proposed.

      We are happy that the Reviewer has found “many strengths” in our manuscript and further noted that “results convincingly showed that the individual subunits of MRX interact robustly with Rev7”. We greatly appreciate the Reviewer for these encouraging words, and for specific suggestions that helped us to improve the manuscript. As suggested, we have performed additional experiments including key controls and the data is presented in the revised manuscript.

      (1) The AlphaFold model predicts that the Mre11-Rev7 and Rad50-Rev7 binding interfaces overlap, so Rev7 might bind only to Mre11 or to Rad50 at a time. Interestingly, however, Rev7 appears dimerized (Figure 1). Since the MR complex contains two Mre11 and two Rad50 subunits, it should still be possible for REV7 to interact with both M and R in the MR complex. The authors should perform MST using the MR complex instead of the individual MR components. The authors should also use MST to analyze whether Rev7-C1 is indeed deficient in interaction with M and R individually and with the complex.

      Thank you for the valuable suggestion. As requested, MST titration experiments have been performed to examine the affinity of purified GFP-tagged Rev7-C1 for the Mre11, Rad50 and MR complex. The results revealed that Rev7-C1 binds to the Mre11 and Rad50 subunits with about 3- and 8.8-fold reduced affinity, respectively; whereas it binds to the MR complex with ~5.6-fold reduced affinity compared with full-length Rev7. The data is shown in Figure 1─figure supplement 4A-C.

      (2) The nuclease and ATPase assays require additional controls. Does Rev7 inhibit other nucleases or ATPases non-specifically? Are these outcomes due to non-specific or promiscuous activity of Rev7? In Figure 6, the effect of REV7 on ATP binding by Rad50 could be hard to assess because the maximum Rad50 level (1 mM) was used in the experiments. The authors should use a suboptimal level of Rad50 to check whether REV7 still does not influence ATP binding by Rad50.

      We thank the Reviewer for these valuable comments (Reviewer 1 has raised similar issues). Thus, we performed additional control experiments and the results indicate that (a) the ATPase activity of S. cerevisiae Dmc1 was not affected by Rev7 and (b) Rev7 does not inhibit the endonucleolytic activity of S. cerevisiae Sae2. The results are depicted in Figure 6H and 6J and Figure 5 –figure supplement 1A-D, respectively.

      As suggested by the Reviewer, using a suboptimal level of Rad50 (0.2 mM), we carried out experiments to test the effect of varying concentrations of Rev7 on the ability of Rad50 to bind ATP and catalyse its hydrolysis. The results showed that Rev7 had no discernible effect on ATP binding, even at concentrations 30 times higher than the concentration of Rad50 (Figures 6B and 6D). However, Rev7 suppresses the ATPase activity of Rad50, but not that of Dmc1, in a concentration-dependent manner (Figures 6G and 6J).

      (3) The moderate deficiency in NHEJ in the plasmid-based assay in REV7-deleted cells could be attributed to an aberrant cell cycle or mating type in rev7-deleted cells. The authors should demonstrate that rev7-deleted cells retain largely normal cell cycle patterns and mating type phenotypes. The authors should also analyze the breakpoints in the plasmid-based NHEJ assays in all mutants, especially in rev7 and rev7-C1 cells.

      We appreciate the Reviewer's critical and insightful comment. We monitored cell-cycle progression of wild-type and rev7Δ cells over time using FACS. The results revealed that the cell cycle profiles and mating type phenotypes of rev7Δ cells were similar to those of wild-type cells. The data are presented in Figure 7-figure supplement 1. This indicates that rev7Δ cells do not have cell cycle or mating type defects compared with wild-type cells.

      Although the second point raised by the Reviewer is intriguing, its relevance to the current study is unclear. In our view, identifying breakpoints in plasmid-based NHEJ assays for all the mutants would require a significant amount of time, and the resulting insight is unlikely to add to the central theme of this paper. Moreover, Rev7 has no DNA cleavage/nicking activity.

      (4) It is puzzling why the authors did not analyze end resection defects in rev7-deleted cells after a DSB. The authors should employ the widely used resection assay after an HO break in rev3, rev7, and mre11 rev7 cells, as described previously.

      Thank you for the suggestion. Reviewer 1 also raised this point. As suggested, we have analysed end resection in rev7Δ cells at an HO-induced DSB site using a qPCR assay (Mimitou and Symington, EMBO J. 2010; Gnugge et al., Mol. Cell 2023). The results revealed that deletion of REV7 led to an increase in the rate of DNA end resection at a DSB induced by HO endonuclease (Figure 9—figure supplement 3).

      (5) Is it possible that Rev7 also contributes to NHEJ as part of the TLS polymerase complex? Although NHEJ largely depends on Pol4, the authors should not rule out that the observed NHEJ defect in rev7 cells is due at least partially to its TLS defect. In fact, both rev3 and rev1 cells are partially defective in NHEJ (Figure 7). Rev7-C1 is less deficient in NHEJ than the REV7 deletion. These results predict that rev7-C1 rev3 cells should be as defective as the rev7 deletion. Additionally, the authors should examine whether Rev7-C1 is deficient in TLS. In this regard, does rev7-C1 reduce TLS and TLS-dependent mutagenesis? Is it dominant? The authors should also check whether Rev3 or Rev1 are stable in rev7-deleted or rev7-C1 cells by immunoblot assays.

      We agree with the possibility that Rev7 may play a role in translesion DNA synthesis and TLS-dependent mutagenesis. Accordingly, Rev7-C1 might be deficient in TLS. While we do not rule out such scenarios, we respectfully suggest that this is outside the scope of the current manuscript. This manuscript focuses on the role of Rev7 in NHEJ and HR pathways, not on translesion DNA synthesis. Nevertheless, we recognise the importance of this line of investigation, and we will certainly consider this suggestion in our future work. Thank you.

      (6) Due to the G4 DNA and G4 binding activity of REV7, it is not clear which class of events the authors are measuring in plasmid-chromosome recombination assay in Figure 9. Do they measure G4 instability or the integrity of recombination or both in rev7 deleted cells? Instead, the effect of rev7 deletion or rev7-C1 on recombination should be measured directly by more standard mitotic recombination assays like mating type switch or his3 repeat recombination.

      We thank the Reviewer for highlighting this important point and would like to take this opportunity to clarify the rationale behind the plasmid-chromosome recombination assay, as previously described (Paeschke et al., Cell 145, 678, 2011). In this assay, we measured the rate of Ura+ papilla formation arising from integration of the targeting plasmid into the genome at the ura3-1 locus of wild-type and rev7Δ cells. Analysis of PCR-generated DNA fragments indicates that the pFAT10-G4 plasmid integrates at the ura3-1 genomic locus of rev7Δ cells, but not of wild-type cells (Figure 9-figure supplement 2). We also measured the stability of G4 DNA, and the results indicate that it is stable in rev7Δ cells.

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      (1) Title: The word 'choice' implies a regulator. Is that the model here? Alternatively, is it pathway properties that define the preference of usage?

      This is an excellent suggestion. In the revised submission, we have rephrased the title as “Saccharomyces cerevisiae Rev7 promotes non-homologous end-joining by inhibiting Mre11 nuclease and Rad50 ATPase activities and homologous recombination.”

      (2) Line 83, Introduction: Titia De Lange proposed an alternative/complementary model for Shieldin and REV7 to support fill-in by DNA polymerases including Pol alpha. This should be discussed.

      We thank the reviewer for pointing out that we have not discussed the work from Titia De Lange’s research group. We have now added new sentences to the Introduction to describe the alternative model involving Polα-primase fill-in synthesis (p3.2.7).

      (3) Line 131: The paragraph title needs to change. 2-hybrid assays cannot establish direct interaction especially when analyzing yeast proteins by yeast 2-hybrid. I agree that direct interaction is established by other means later.

      Per the Reviewer’s suggestion, we have deleted the word “directly” from the title of the paragraph.

      (4) Figure 1 D-F: The purity of the Rev7-GFP fusion is shown in Figure S1, and the purity of the Rad50, Mre11, and Xrs2 subunits as assessed by PAGE should be shown as well.

      Following this suggestion, we have included images of Coomassie blue-stained SDS-polyacrylamide gels (Figure 1-figure supplement 1), which show the purity and size of GFP tagged Rev7, Rad50, Mre11, Xrs2, Rev1, Sae2 and Dmc1 proteins.

      (5) Please check the Kd values. In the graph in D, the differences between Rad50, Mre11, and Xrs2 look much larger than the values in F suggest.

      This is a fair point, and we thank the reviewer for highlighting it. The differences between the binding profiles of Rad50, Mre11 and Xrs2 with Rev7 shown in the previous version of the manuscript were not obvious because the binding curves were cluttered. Therefore, the binding profiles of each interacting pair of proteins were plotted separately to highlight the differences (Figure 1—figure supplement 3). Further, we rigorously analysed the dataset to ascertain the binding affinities and found that the Kd values obtained were in good agreement with the values shown in Figure 1D.

      (6) Figure 1S3: Please label the bands.

      In the revised manuscript, the protein bands in Figure1-figure (previously Figure 1S3) are identified with their names.

      (7) Line 195: Change Figure 1 to Figure 1S4.

      We have introduced the correction in the revised manuscript.

      (8) Line 202: The minimal interaction domain of 42 aa is only described in the next paragraph. The description anticipates a result about the 42 aa fragment that has not been shown to this point. Please reorder results or descriptions to make this coherent.

      We have implemented the change, as per the Reviewer’s suggestion.

      (9) Figure 2: The two-hybrid analysis in Figures 1 and 2 also identifies Rev7 self-interaction, which is not discussed. This serves as another control against the artifact of the truncation proteins and should be discussed.

      We have now discussed the significance of Rev7 self-interaction in the Y2H experiments wherever relevant in the text.

      (10) Is the 42 aa fragment sufficient to elicit a two-hybrid signal?

      We thank the reviewer for this insightful comment. To test this premise, we expressed the C-terminal 42-amino-acid sequence of Rev7 using the bait vector pGBKT7. The results revealed that the 42-residue fragment of ScRev7 alone is sufficient for a two-hybrid interaction with the MRX subunits (Figure 2-figure supplement 1).

      (11) Line 289: Why are the EMSA conditions described as physiological? As per Material and Methods, the reaction mixtures contain 20 mM Tris-HCl (pH 7.5), 0.1 mM DTT, 0.2 mg/ml BSA, and 5% glycerol, which is far from physiological.

      As suggested by all three reviewers, the data showing the interaction of Rev7 and its truncation derivative Rev7-C1 with G4 DNA has been deleted in the revised version of the manuscript.

      (12) Figure 4C: The figure needs to increase in size. The plotting symbols are not all visible, and it is undefined what the black squares represent.

      Following the reviewer's suggestion, Figure 4C has been omitted in the revised version of the manuscript.

      (13) Figure 5: The MRX nuclease assays were conducted in the presence of Manganese. Has the more physiological divalent cation magnesium been tested?

      This has been addressed in response to the query of Reviewer 1 (Public Review). As noted above, Mre11 exhibits DNase activities only in the presence of Mn²⁺.

      (14) In Figure 5D, lane 2: What is the concentration of Rev7?

      We appreciate the reviewer for catching this. The concentration of ScRev7 used for the reaction shown in Figure 5D, lane 2 was 2 μM, as specified in the Figure legend.

      (15) Figure 6 legend, line 1620: "same as in lane". Is there a "1" missing?

      We thank the reviewer for pointing out the typographical error, which has been corrected in the revised manuscript.

      (16) Figure 9: Rev7-C1 lacks the 42 aa peptide that is postulated to mediate anti-resection but shows normal HR here. This seems unexpected based on the premise that the 42 aa fragment supports end-joining. Rev7 seems to suppress HR independently of the function of the 42 aa peptide.

      This has been addressed in response to the query posed by Reviewer 1 in the Public Review. We do see that the Rev7-C1 lacking the 42 aa peptide suppresses HR, but the suppression was only partial as compared with the wild type. This is consistent with biochemical assays suggesting that Rev7-C1 exerts partial inhibition on the Mre11 nuclease (Figure 5) and Rad50 ATPase (Figure 6) activities. Further, the AF2 models indicate that, in addition to the C-terminal 42-aa region, other regions of Rev7 also interact with the Mre11 and Rad50 subunits (Figure 2—figure supplement 2), consistent with biochemical and genetic data.

      (17) Line 478: The conclusion that "these findings are consistent with the idea that REV7 completely abolishes DSB-induced HR in S. cerevisiae." is overly broad as the assay

      We agree with the reviewer's assessment. Accordingly, we have rephrased the sentence to soften the claim.

      Line 483ff: Based on the comments on Figure 9, the introductory sentences of the discussion do not seem to be supported by the data, as Rev7 appears to regulate HR independent of the 42 aa peptide.

      Please refer to our response to comment #16 above.

      (18) Line 536: Similarly to above 17, the conclusion about the effect of the 42 aa peptide on HR appears unwarranted.

      We have revised the statement to moderate the previously exaggerated claims.

      (19) In all figures, please list in the legend, which exact strains have been used referring to Table S5.

      We have now included mentions of the strains in the figure legend wherever applicable.

      (20) Line 351: linear.

      It is corrected in the revised manuscript.

      Reviewer 2 (Recommendations For The Authors):

      (1) It is very strange and unusual that Rev7 independently binds to all three subunits of the MRX complex, raising the question of how specific these interactions are. At a minimum, the Y2H assay and the in vitro protein-protein interaction assays should include a negative control showing that Rev7 does not bind some other protein. For example, Sae2 and Rev7 interactions can be tested.

      The reviewer is right that it is important to validate the specificity of the Y2H interactions as well as of the in vitro enzyme assays. As suggested by the Reviewer, we included SAE2 in the Y2H and MST assays, and Dmc1 and Sae2 in the in vitro enzyme assays. In Y2H assays, Sae2 did not interact with the MRX subunits (Figure 1A-C), and Rev7 did not inhibit Sae2's nuclease or Dmc1's ATPase activity in vitro (Figure 6 and Figure 5-figure supplement 1).

      (2) It is surprising that in the Discussion the authors speculate that Rev7 might recruit Mus81 nuclease for cleavage, completely ignoring their own publication on the cleavage of G4 by MRX.

      We agree with the reviewer and have added a discussion of MRX-mediated G4 DNA cleavage (the publication mentioned by the reviewer) to the revised version.

      (3) How does the AlphaFold-Multimer modeling predict the interaction between Rev7 and MRX as a complex? Are the same regions of MRX accessible for the interaction with Rev7 in this case? Similarly, how are the activities of the MRX complex and phosphorylated Sae2 (see P. Cejka's work) affected by Rev7?

      Thank you for pointing this out. In this study, we investigated the interactions between Rev7 and the Mre11 and Rad50 subunits using the AF2 algorithm. However, a three-dimensional structure of the S. cerevisiae MRX-Rev7 complex could not be constructed due to the size limits imposed by the AF2 algorithm. Therefore, we are unable to comment on whether the same regions of the MRX subunits are accessible for interaction with Rev7 within the complex. That said, AF2 has recently been used for structural modelling of the S. cerevisiae Mre11 (1–533)-Rad50 (1–260 + 1,057–1,312) complex (Nicolas et al., Mol. Cell 84, 2223, 2024). Even so, no AF2 structural model yet covers the full length of the Mre11 and Rad50 proteins.

      Regarding the second point raised by the Reviewer, our results suggest that Rev7 does interact with Sae2 in Y2H assays. However, whether phosphorylated Sae2 could potentially affect the interaction between MRX subunits and Rev7 warrants further studies.

      Minor points:

      (1) Figure 1. The labeling of the strains in A and B is genes and in C is proteins.

      The reviewer is correct. We have now corrected the error in Figures 1 and 2.

      (2) Abstract. Carefully check English grammar.

      We thank the Reviewer for spotting this, which has been corrected in the revised manuscript.

      (3) Line 322 "Further, it has been demonstrated that Mre11 cleaves non-B DNA structures such as DNA hairpins, cruciforms and intra- and inter-molecular G-quadruplex structures)." It has not been shown that Mre11 cuts cruciform structures.

      We thank the referee for spotting this error. Mre11 does not cleave cruciform DNA structures. This error is corrected in the revised manuscript.

      (4) Page 14. Lines 452-455. What does "selective and non-selective media" mean? Is it without and with HU treatment?

      Thanks very much for the comment. In our manuscript, selective medium is composed of SC/-Leu with HU and non-selective medium is without HU. We have clarified this point in the revised version.

      (5) Page 15, line 472: "To assess whether increased frequency of HR is due to the instability of G-quadruplex DNA in rev7Δ cells, we examined the length of G4 DNA inserts in the plasmids carrying sequences during HR assay". It is not clear what "during HR assay" means. Did you examine the presence of G4 in Ura+ recombinants? If not, this analysis is not meaningful.

      The reviewer is correct. We measured the presence of G4 DNA insert in Ura+ recombinants. The text has been appropriately edited to reflect these necessary modifications.

      (6) What is the nature of the ura3-1 allele? Can it revert to URA3 in rev7 mutants?

      The ura3-1 allele (glycine-to-glutamate substitution) reverts to Ura3+ at a low rate of ~2.5 × 10⁻⁹ in both orientations (Johnson et al., Mol. Cell 59, 163, 2015).

      (7) From the way the recombination process is depicted, it seems that the authors believe that the plasmid should integrate into the chromosome. In reality, in most cases it should be a gene conversion in which the G4 sequence (if it indeed induces DSBs) is replaced by the wild-type segment from ura3-1; integration is not required since it is a 2-micron plasmid.

      We apologize for not having made this clearer. The recombination assay with targeting plasmids containing G4 DNA-forming sequences was performed as previously described (Paeschke et al., Cell 145, 678, 2011). In this assay, Ura+ recombinants arise from integration of the targeting plasmid bearing the ura3G4 allele (with a G4 DNA-forming insert) into the genome at the ura3-1 locus. As shown in Author response image 1B, this is confirmed by PCR amplification of the insert in the genomic DNA of wild-type and rev7Δ cells.

      Reviewer 3 (Recommendations For The Authors):

      (1) All Y2H experiments were performed with REV7 fusion to pGBKT7 and MRX to pGADT7. It will be helpful to test if pGAD-Rev7 also interacts with pGBK-Mre11 or Rad50 by Y2H.

      Following the reviewer's suggestion, we performed Y2H experiments in wild-type PJ69-4a cells co-transformed with the pGBKT7 vector expressing MRX subunits and the pGADT7 vector expressing Rev7. The results showed that Rev7 interacts with each of the Mre11, Rad50 and Xrs2 subunits, indicating that the interactions are vector-independent.

      Author response image 1.

      Yeast two-hybrid analysis suggests interaction between Rev7 and MRX subunits. PJ69-4A cells were co-transformed with a bait vector expressing Rev7 or the Mre11, Rad50 or Xrs2 subunits and a prey vector expressing Rev7. Equal numbers of cells were spotted onto –Trp –Leu and –Trp –Leu –His dropout plates containing 3-AT, and images were obtained after 48 h of incubation at 30°C. The data are representative of three independent experiments.

      (2) G4 studies are under-developed and do not add much or even negatively to the manuscript. The author might consider revising the manuscript to improve their integration with better rationales or logic. Alternatively, the authors should consider removing the G4 part for another paper.

      This concern was also raised by Reviewers 1 and 2. Following the suggestions of all reviewers, figures and text related to the G4 DNA studies have been deleted from the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1: IPA analysis was performed after scRNA-seq. Although it is knowledge-based software with convenient graphic utilities, it is questionable whether an unbiased genome-level analysis was performed. Therefore, it is not convincing that WNT is the only and best signal for the branching-off marker. Perhaps independent approaches, such as GO, pathway, or module analyses, should be performed to validate the finding.

      Thanks for your comment. We agree with the reviewer that IPA is a knowledge-based and hypothesis-driven method. Our hypothesis was that WNT/BMP pathways, among others, are heavily involved in the development of mesenchymal tissues in general and in tendon differentiation specifically. Therefore, we examined differentially expressed genes between clusters across a broad array of pathways featured in IPA that could point us towards molecular functions distinguishing these clusters. We further tested this hypothesis using WNT inhibitors in subsequent experiments. To address this point, we have supplemented the Discussion section with the following remark:

      “This study is not without limitations. The IPA network analysis is a knowledge-based and hypothesis-driven platform. We specifically targeted pathways known to be involved in syndetome differentiation. However, WNT signaling stood out with very specific affinity to the off-target populations, and we verified our findings with experiments supporting this hypothesis.”

      Per the reviewer’s suggestion, we also performed an unbiased GO analysis (Supp. Fig. 6). Multiple pathways were detected in the three clusters of interest (Supp. Fig. 6A-C), including integrin-related and TGFβ-related pathways. Notably, WNT signaling was also detected as a prominent pathway in these three clusters. Therefore, we concluded that it plays a pivotal role in the differentiation process, a hypothesis that was later corroborated by the WNT inhibitor experiments.

      Comment 2: According to the method section, two iPSC lines were used for the study. However, throughout the manuscript, it is not clearly described which line was used for which experiment. Did they show similar efficiency in differentiation and in responses to WNTi? It is also worrisome if using only two lines is the norm in the stem cell field. Please provide a rationale for using only two lines, which will restrict the observation of individual-specific differential responses throughout the study.

      Thanks for your comment. This proof-of-concept study is the first to compare data from an in vitro tenogenic induction protocol tested in more than one human iPSC line. We agree that line-specific phenomena are difficult to interpret and reproduce. Therefore, it is critical to provide data showing that the findings can be reproduced in more than one line. Some early studies used a single line as proof of concept; however, we now recognize the need to show that a protocol works in at least one additional line.

      Here we used the GMP-ready iPSC line CS0007iCTR-n5 for all optimization experiments. This newer, low-passage, feeder-free line was generated from PBMCs and was designated as GMP-ready in the manuscript because it has been derived and cultured using cGMP xeno-free components (mTESR plus medium and rhLaminin-521 matrix substrate instead of Matrigel). We then wanted to confirm the application of the optimized protocol using the reference control line CS83iCTR-22n1, which has already been more widely used by our group1-5 and others.6 This line was derived from fibroblasts and was grown and expanded using Matrigel™ and mTESR1, followed by mTESR plus media.

      The question of how many lines are needed is stage-dependent. In our opinion, at the proof-of-concept level, two lines, one of which was generated under GMP-like conditions, are sufficient. Confirmation with multiple lines becomes more pertinent as we move towards scale-up/manufacturing, where considerations regarding robustness and consistency are raised. However, at this stage, it is crucial to understand the developmental processes involved in cell differentiation so that a more robust protocol can be modified and adapted later. In future studies, as we move towards clinical translation, the approach presented in this work will need to be further optimized and subsequently evaluated using at least 3 different cell lines generated from various sources.

      Comment 3: How similar are syndetome cells with or without WNTi? It would be interesting to check if there are major DEGs that differentiate these two groups of cells.

      Thanks for your comment. Single-cell RNA-seq analysis revealed that treatment with WNTi upregulated tenogenic markers. In SYNWNTi, the expression levels of stage-specific markers COL1A1, COL3A1, SCX, MKX, DCN, BGN, FN1, and TNMD were higher than in the untreated SYN group, as shown in Figure 5C. Density plots depicted an increase in the number of cells expressing COL1A1, COL3A1, SCX and TNMD in SYNWNTi compared to the SYN group, as illustrated in Figure 5D. Trajectory analysis of the WNTi-treated group revealed the absence of the bifurcations observed in the untreated group (Fig. 5E). Therefore, it can be inferred that syndetome cells with and without WNTi are different.

      Comment 4: Please discuss the improvement of the current study compared to previous ones (e.g., PMID 36203346 my study, 35083031- Tsutsumi, 35372337- Yoshimoto).

      Thanks for your comment. In Papalamprou et al (2023)3, we differentiated iPSCs to mesenchymal stromal-like cells (iMSCs), which were then cultured in a 2D dynamic bioreactor for 7 days. In that study, we examined the impact of simultaneous overexpression of the tendon transcription factor Scleraxis (SCX) using a lentiviral vector and mechanical stimulation on the process of tenogenic differentiation. Following 7 days of uniaxial cyclic loading, we observed notable modifications in the morphology and cytoskeleton organization of iPSC-derived MSCs (iMSCs) overexpressing SCX. Additionally, there was an increase in extracellular matrix (ECM) deposition and alignment, along with upregulation of early and late tendon markers. This proof-of-concept study showed that iPSC-derived MSCs could be a viable cell candidate for cell therapy applications and that mechanical stimulation contributes to the differentiation of iMSCs towards the tenogenic lineage.

      Similarly, Tsutsumi et al7 overexpressed the tendon transcription factor Mohawk (MKX) stably in iPSC-derived MSCs using lentiviral vectors. These cells were then used to seed collagen hydrogels which were mechanically stimulated in a cyclic stretch 3D culture bioreactor for 15 days to create artificial tendon-like tissues, which the authors termed “bio-tendons”. Bio-tendons were then decellularized to remove cellular remnants from the xenogeneic human iPSC-derived cells and were subsequently transplanted in an in vivo Achilles tendon rupture mouse model. The authors reported improved histological and biomechanical properties in the Mkx-bio-tendon mice vs. the GFP-bio-tendon controls, providing another proof-of-concept study in favor of the utilization of iPSC-derived MSCs for tendon cell therapies, while also addressing the immunogenicity of cells of allogeneic/xenogeneic origin. Therefore, the above two studies used tendon transcription factor overexpression and mechanical loading either in 2D or 3D to differentiate MSCs towards the tendon/ligament lineage.

      Yoshimoto et al8 optimized a stepwise iPSC-to-tenocyte induction protocol using an SCX-GFP transgenic mouse iPSC line, monitoring GFP expression over time. The group performed scRNA-seq to characterize the induction of mesodermal progenitors towards the tenogenic lineage and to shed light on their developmental trajectory. That study found that activation of Retinoic Acid (RA) signaling enhanced chondrogenic differentiation, in contrast to the study of Kaji et al (2021), which also used an SCX-GFP mouse iPSC line. Kaji et al inhibited TGF and BMP signaling during mesodermal induction and reported that RA signaling eliminated SCX induction entirely and promoted a switch to a neural fate. Yoshimoto et al suggested that variations in mesodermal cell identity could be due to the different methods used for mesodermal differentiation. In contrast to Kaji et al, Yoshimoto et al opted to stimulate WNT and block the Hedgehog pathway during mesoderm induction. Loh et al (2016) identified the branchpoint from the primitive streak to either paraxial mesoderm (PSM) or lateral plate mesoderm (LPM) as the result of two mutually exclusive signaling conditions. Specifically, they reported that PSM induction was achieved through BMP suppression and WNT stimulation, whereas lateral mesoderm specification was accomplished by BMP stimulation and WNT suppression, both with concurrent TGFβ suppression/FGF stimulation. Lastly, a similar approach to PSM induction from primitive streak (TGF off/BMP off/WNT on/FGF on) has been used by many subsequent studies, including Matsuda et al (2020),9 Wu et al (2021)10 and Nakajima et al (2021).11 The diversity of these approaches points to the plasticity of mesodermal progenitors and the need for additional studies to better understand mesodermal specification and subsequent induction towards sclerotome and syndetome.

      In the current study, we optimized a stepwise differentiation protocol using xeno-free cGMP-ready media and two different cell lines, one of which was cGMP-ready. We used scRNA-seq to characterize the differentiation, which led us to identify off-target cells that were closer to a neural phenotype. We performed pathway analyses and hypothesized that WNT signaling activity might have contributed to the emergence of the off-target cells. To test this, we used a WNT inhibitor targeting PORCN to block WNT activity at the SCL and SYN stages. We found that blockade of WNT signaling at the end of the SM stage and during SCL and SYN induction resulted in a more homogeneous population, while eliminating the neural-like cell cluster. This is the first study to use scRNA-seq to shed light on the developmental trajectory of stepwise tendon differentiation of human iPSCs, and it provides a proof of concept for generating a more homogeneous syndetome population. Further studies are needed to fine-tune both the process and the final product, and to elucidate the functionality of iPSC-derived syndetome cells in vitro and in vivo.

      Reviewer 2:

      General concerns: The authors demonstrated the efficiency of syndetome induction solely by scRNA-seq data analysis before and after pathway inhibition, without using e.g. FACS analysis or immunofluorescence (IF)-staining based assessment. A functional assessment and validation of the induced cells is also completely missing.

      We appreciate and agree with the reviewer’s critique regarding further analyses of differentiated iPSC-derived syndetome-like cells, including functional assessment of the differentiated cells. Immunofluorescence was used at all timepoints of induction for phenotype confirmation (Fig. 2,4). Flow cytometry for DLL1 was used to benchmark efficient differentiation to PSM (Loh et al,12 Nakajima et al11). Specifically, DLL1 expression was assessed by flow cytometry after 4 days of induction and was used to optimize the initial iPSC aggregate seeding density, a parameter previously found to be crucial for in vitro differentiation protocols (Loh et al12). Unfortunately, this parameter is usually not reported, although it could be critical for replicating protocols across different lines.

      The function of tendon progenitors is usually assessed by their response to mechanical cues and their ability to regenerate tendon injuries. In future studies, we intend to assess the functionality of the generated syndetome and tendon progenitors and their response to in vitro biomechanical stimulation, as previously reported for iMSCSCX+ cells,3, 13 and in vivo in a critical tendon defect, similarly to what has been previously reported.2

      Comment 1: Notably, in Figure 1D, certain PSM markers (TBXT, MSGN1, WNT3A) show higher expression on day 3. If the authors initiate SM induction on day 3 instead of day 4, could this potentially enhance the efficiency of syndetome-like cell induction?

      Thanks for your comment. In the current work, we initially optimized differentiation to PSM via expression of DLL1, whose gene expression peaked at d4. We found that this was influenced by the initial iPSC aggregate seeding density. We wanted to generate a homogeneous DLL1+ population, which we assessed via gene expression, flow cytometry, IF and scRNA-seq (Fig. 1D, 2C, 3C and Suppl. Fig.1). Given that different lines might display different developmental timelines, we also confirmed reproducibility of the protocol with a second cell line. We appreciate the reviewer’s suggestion to investigate additional protocol iterations, such as the proposed one at the PSM stage, as we move towards a better understanding of key developmental events during in vitro induction.

      Comment 2:  In the third paragraph of the result section the authors note, "Interestingly, SCX, a prominent tenogenic transcription factor, was significantly downregulated at the SCL stage compared to iPSC, but upregulated during the differentiation from SCL to SYN." Despite this increase, the expression level of SCX in SYN remains lower than that in iPSCs in Fig.1G and Fig.3C. Can the authors provide an explanation for this? Can the authors provide IF data using iPSCs and compare it with in vitro-induced SYN cells? Can the authors provide e.g. additional scRNA-seq data which could support this statement?

      Thank you for your comment. In Fig. 1G, SCX expression in SYN was upregulated compared to SCL but was similar to that in iPSCs. This suggests baseline stochastic SCX expression in iPSCs, possibly stemming from spontaneous differentiation in culture (Fig. 3C). Previous research has shown that tenogenic marker gene expression tends to decrease during postnatal tendon maturation (Yin et al., 2016; Grinstein et al., 2019) [14, 15]. Yoshimoto et al. (2022) [8] used a transgenic mouse iPSC-SCX-GFP line to track SCX expression and showed that SCX expression peaked after 7 days of tenogenic induction and then decreased by day 14, which marked the end of tenogenic induction. The authors postulated that this pattern could indicate either further maturation of tenocytes at later time points or an increase in the number of non-tenogenic cells from T7 to T14.

      In the present work, we showed upregulation of SCX gene expression in SYN compared to SCL, as well as significant upregulation of TNMD, EGR1, COL1A1 and COL3A1 (Fig. 1G). Supp. Fig. 8 has been added to show feature plots of SCX and TNMD expression in SCL, SYN and SYNWNTi. The significant upregulation of later markers of tenogenic differentiation suggests that the 21 days of tenogenic induction may have matured the cells. Because gene expression analysis conveys only a snapshot of the transcriptional profile of a cell population, we may have missed the peak of SCX upregulation (Supp. Fig. 5). Following treatment with the WNT inhibitor, the SYNWNTi group displayed increased SCX expression (% cells expressing SCX) compared to SYN, which might also reflect a more homogeneous population of syndetome-like cells after WNTi treatment. In the SYNWNTi group, TNMD was expressed in the SYN cluster, whereas SCX was found mostly in the cluster labelled fibrocartilage (FC) based on expression of the COL2A1, SOX9, FN1, BGN and COL1A1 markers. Because SCX+/SOX9+ progenitor cells can give rise to both tendon and cartilage (Sugimoto et al., 2013) [16], it could be postulated that this cluster contains tendon progenitors. Interestingly, the FC cluster was not observed in the second iPSC line that we tested, which showed a more homogeneous induction to syndetome (78.5% vs. 66.9% SYN cells, Supp. Table 1 & Supp. Fig. 3). This discrepancy between the two lines, specifically the presence of the FC cluster only in the 007i line, warrants further investigation. Taken together, these data indicate that the tenogenic induction could likely be shortened. Further work assessing the time course of SCX expression over the entire tenogenic induction could be used to optimize the protocol further. For instance, a gene-edited human iPSC SCX-GFP reporter line could be generated and used to track SCX expression throughout the induction.

      Comment 3: In the fourth paragraph of the Results section the authors state, "SM markers (MEOX1, PAX3) and SCL markers (PAX1, PAX9, NKX3.2, SOX9) were upregulated in a stepwise manner." However, the data for MEOX1 and NKX3.2 seem to be missing from Figure 3B-C. The authors should provide these data and/or additional support for their claim.

      Thanks for your comment. Feature plots for MEOX1 and NKX3.2 have been added to the Supplemental information (Supp. Fig. 9).

      Comment 4: In Figures 2B and 2E, the background of the red channel seems extremely high. Are there better images available, particularly for MEOX1? Given the expected high expression of MEOX1 in SM cells, the authors should observe a strong signal in the nucleus of the stained somitic mesoderm-like cells, but that is not the case in the shown figure. The authors should provide separate channel images instead of merged ones for clarity. The antibody which the authors used might not be specific. Can the authors provide images using an antibody which has been shown to work previously e.g. antibody by ATLAS (Cat#: HPA045214)?

      As requested by the reviewer, we have provided separate channel images in the Supplement (Supp. Fig. 7). The images show relatively high expression of these markers in SM cells.

      Comment 5: In Fig. 2C and Supplementary Fig. 1, the authors present data from immunofluorescence (IF) staining and FACS analysis using a DLL1 antibody. While FACS analysis indicates an efficiency of 96.2% for DLL1+ cells, this was not clearly observed in their IF data. How can the authors explain this discrepancy? Could the authors quantify their IF data and compare it with the corresponding FACS data?

      Thanks for your comment. We performed flow cytometric analysis of DLL1 expression to optimize cell seeding density using the 007i line. In the present study, we used IF only qualitatively, that is, to confirm protein expression of selected markers. Note that the poly-lysine-coated coverslips required for IF might have slightly altered cell density on the coverslip relative to the plate. In addition, we cannot rule out that the different substrate influenced the cells' phenotype through matrix interactions and signaling. Flow cytometry, by contrast, is by nature a quantitative, single-cell approach, whereas IF staining is qualitative. Therefore, for the purpose of this proof-of-concept work, we place more weight on the quantitative flow cytometry data than on the semi-quantitative confirmation achieved through IF staining on coverslips.

      Comment 6: In Fig. 2G, PAX9 is expected to be expressed in the nucleus, but the shown IF staining does not appear to be localized to the nucleus. Could the authors provide improved or alternative images to clarify this? The authors should use antibodies shown to work with high specificity as already reported by other groups.

      Thanks for your comment. Indeed, the staining appears to be mostly cytoplasmic. We used antibodies that were previously reported [3] and repeated the staining, but obtained the same results. We can speculate that this transcription factor has an additional role in the iPSC-derived cells and might translocate to the cytoplasm; however, we have no evidence for this phenomenon.

      Comment 7: Why did the authors choose to display day 10 data for SYN induction in Fig. 4A? Could they provide information about the endpoint of their culture at day 21?

      Thank you for your comment. In Fig. 1G we provided gene expression results for several selected early and later tendon markers at the endpoint of our culture, that is, day 21. Following scRNA-seq at each stage of the differentiation (iPSC at d0, PSM at d4, SM at d8, SCL at d11 and SYN at the endpoint, day 32), we performed DEG analysis using the IPA platform and identified activation of genes associated with the WNT signaling pathway in the off-target clusters. We hypothesized that WNT pathway inhibition might block the formation of unwanted fates and produce a more homogeneous differentiation outcome. We therefore tested a WNT inhibitor, compared the inhibitor-treated group with a non-treated group, and assessed selected neural markers during the course of inhibitor application. In Fig. 4A we presented qPCR gene expression of key selected markers at day 21 of the overall differentiation (day 10 of syndetome induction), approximately the middle of the syndetome induction. Because the inhibitor downregulated the selected neural markers, we then applied it until the endpoint of the original induction and analyzed the results using scRNA-seq (Fig. 5). Lastly, we acknowledge that this was a proof-of-concept study and that further optimization of inhibitor application (timing, duration, concentration, etc.) is needed.

      Comment 8: In Supplementary Fig. 5, the authors depicted the expression level of SCX, a SYN marker, which peaked at day 14 and then decreased. By day 21, it reached a level comparable to that of iPSCs. Given this observation, could the authors provide a characterization of the cells at day 21 during SYN induction using IF? What was the rationale behind selecting 21 days for SYN induction? The authors also need to show 'n numbers'; how many times were the experiments repeated independently (independent experiments)?

      Thanks for your comment. During the optimization process, we initially used RT-qPCR to track gene expression of selected tenogenic markers in the 007i line. We found that after 21 days of tenogenic induction, several established tendon markers (COL1A1, COL3A1 and EGR1) were upregulated, as was, importantly, the later and more definitive tendon marker TNMD. We therefore proceeded with this protocol before testing other compounds, including the WNT inhibitor WNT-C59. However, as discussed in the manuscript, this extended tenogenic induction resulted in cell attrition in the absence of the WNT inhibitor, a phenomenon that was ameliorated by WNT inhibition. Thus, it could be postulated that the protocol could be further optimized by shortening tenogenic induction to less than 21 days.

      The experiments conducted to optimize the differentiation process were repeated independently at least three times (n = 3) by qPCR and IF in two lines, 007i and 83i, as described in the manuscript. The scRNA-seq analysis represents a population of cells from an in vitro differentiation originating from a single donor line; it was therefore performed on n = 1 sample at each stage. However, the effects of inhibitor application (sample SYNWNTi) were also confirmed in a second cell line (83i), for a total of n = 2 independent samples.

      Comment 9: Overall the shown immunofluorescence (IF) data does not appear convincing. Could the authors please provide clearer images, including separate channel images, a bright field image, and magnified views of each staining?

      Thanks for your comment. Separate channel images have been added to the supplemental data (Supp. Fig. 7). We agree with the reviewer regarding the limitations of IF staining, especially given the added confounding factor of poly-lysine-coated coverslips. We would like to point out that in the current work IF staining is neither the main finding nor the primary outcome measure; it is used only to support the differentiation results with a qualitative assessment of protein presence and localization. In the manuscript we discuss the limitations of IF and the need for higher-throughput, unbiased approaches to quantification when using IF staining. For instance, spatial transcriptomics combined with mass cytometry or flow cytometry could provide a more unbiased approach. Thus, in the present manuscript we based our conclusions on quantitative gene expression, single-cell sequencing and flow cytometry data.

      Comment 10: As stated by the authors in the manuscript, another research group performed FACS analysis to assess the efficiency of syndetome induction using SCX antibody, and/or quantification of immunofluorescence (IF) with SCX, MKX, COL1A1, or COL2A1 antibodies. Could the authors conduct a comparative analysis of syndetome induction efficiency both before and after protocol optimization, utilizing FACS analysis in conjunction with an SCX reporter line or antibody staining, e.g. quantifying induction efficiency via immunofluorescence (IF) staining with syndetome-specific marker genes?

      Thank you for your comment. As discussed in response to a previous comment, we agree with the reviewer that a human iPSC-SCX-GFP line would shed light on SCX expression over the entire course of induction. In the current work, we used IF as qualitative confirmation of specific marker expression, showing the presence of SCX, MKX, COL1 and COL3 in SYNWNTi as well as the absence of neuronal markers. As we also point out in the manuscript, unless performed in an unbiased manner, IF can only be considered a semi-quantitative assessment, burdened with several technical limitations, operator bias, and lower sensitivity and accuracy compared with flow cytometry or scRNA-seq. To clarify this point further: first, using poly-lysine-coated coverslips for IF staining results in a different substrate environment from the Geltrex-coated plates used for the induction. Second, we noticed that cells grew overconfluent at the edges of the coverslips. This is important because, as we observed in this work, seeding density is critical for the reproducibility of the protocol. It could further be postulated that differences in substrate stiffness might also affect this process. In our opinion, IF in this context should be used qualitatively, and a combination of flow cytometry and scRNA-seq should be used to draw quantitative conclusions such as the induction efficiency of a given cell type. Since we also observed inconsistencies with the SCX antibodies we tested, generating edited human iPSC reporter lines (such as SCX-GFP, MKX-GFP and TNMD-GFP) would be the preferred approach to further explore differentiation efficiency.

      Comment 11: To enhance the paper's significance, the authors should conduct functional validation experiments and proper assessment of their induced syndetome-like cells. They could perform e.g. xeno-transplantation experiments with syndetome cells into SCID-mice or injury models. They could also assess whether the in vitro induced cells could be applied for in vitro tendon/ligament formation.

      Thanks for your comment. For the purpose of this proof-of-concept in vitro study, our primary goal was to evaluate a stepwise tenogenic induction protocol using GMP-ready cell lines and chemically defined media. We then used the analytical power of scRNA-seq to characterize and optimize the protocol, focusing on a developmental stage that is not well understood, namely syndetome specification from sclerotome, and hypothesized that by fine-tuning the WNT pathway we could generate a more homogeneous syndetome cell population. We fully agree with the reviewer that the next steps should include functional validation experiments, such as in vitro 2D/3D tendon/ligament formation and in vivo transplantation in allogeneic or xenogeneic injury models.

      Comment 12: The authors should also compare their scRNA-seq data with actual human embryo data sets, something which could be done given the recent increase in available human embryo scRNA-seq data sets.

      This is an excellent suggestion for an intriguing study. Unfortunately, not all data sets are currently available, and embryonic musculoskeletal (MSK) scRNA-seq data in particular remain very scarce, although growing. We do not have access to data sets on human tendon development and will therefore have to leave this comparison for future studies.

      Reviewer 3:

      Comment 1: The data outlining the differences between the differentiation outcome of the two tested iPSCs is intriguing, but the authors fail to comment on potential differences between the two iPSC lines that could result in drastically different cell outputs from the same differentiation protocol. This is a critically important point, as the majority of the SCX+ cells generated from the 007i cells using their WNTi protocol were found in the FC subpopulation that failed to form from the 83i line under the same protocol. From the analysis of only these 2 cell lines in vitro, it is difficult to assess whether this WNTi protocol can be broadly used to generate tenogenic cells.

      Thanks for your comment. This proof-of-concept study is the first investigation to compare data from an in vitro tenogenic induction protocol tested in more than one cell line. Using unsupervised clustering, we identified 11 clusters, which were classified into 6 cell subpopulations. The only observed difference between the two lines was a small subset labelled fibrocartilage (FC), which expressed both tenogenic and chondrogenic markers. This subpopulation was observed in the 007i line but not in the 83i line at the end of SYN induction. Importantly, DEG analysis also showed that it was enriched for SCX. SCX+/SOX9+ progenitors have been shown to be a distinct multipotent cell group responsible for the development of SCX−/SOX9+ chondrocytes and SCX+/SOX9− tenocytes/ligamentocytes (Sugimoto et al., 2013) [16]. As noted in a previous comment (Comment 2 from Reviewer 1), we might have missed SCX upregulation during the 21-day syndetome induction. This is further supported by the trajectory analysis in Fig. 5E, which shows that the FC subpopulation precedes the SYN subpopulation. The presence of this subpopulation in one line but not the other might indicate that the 83i line yielded a more mature tendon population. We would therefore posit that in the 83i line the FC subpopulation did not fail to form, but rather was missed by our scRNA-seq endpoint analysis, which showed that a more homogeneous SYN population had formed (8.7% in 007i vs. 0.26% in 83i, Supp. Table 1 & Supp. Fig. 3B). Future studies are warranted to characterize the SYN induction timeline with respect to SCX expression and the subsequent maturation from tenogenic progenitors to tenocytes.

      Comment 2: The authors make claims to changes in protein expression but fail to quantify either fluorescence intensity or percent cell expression from their immunofluorescence analyses to substantiate these claims. These claims are not fully supported by the data as presented as it is unclear whether there is increased expression of tendon markers at the protein level or more cells surviving the protocol. Additionally, in images where 3 channels are merged, it would be helpful to show individual channels where genes are shown in similar spectra (ie. Fig 2I SCX/MKX). Furthermore, the current layout and labelling scheme of Figure 4 makes it very difficult to compare conditions between SYN and SYNWNTi protocols.

      Thanks for your comment. Protein expression at each stage was verified by immunocytochemistry, for which cells were cultured on poly-lysine-coated coverslips, then fixed, stained and imaged (Fig. 2). However, prior to WNT inhibitor application, we noticed gradual cell attrition in the cultures at the end of differentiation (Fig. 1B, 2I). The images show qualitative differences with and without the WNT inhibitor. This could be attributed to the heterogeneity of the cell population at the SCL stage, which was confirmed by scRNA-seq (Fig. 3A). As discussed previously (Reviewer 2, Comments 5 & 9), we did not provide quantitative IF analysis in the current paper because of the qualitative nature of the staining technique. In future work, we will consider other high-resolution single-cell modalities, such as single-cell proteomics, flow cytometry or mass cytometry, to perform a more unbiased quantitative single-cell analysis across stages and samples. Furthermore, we have added single-channel images to the supplemental information.

      Comment 3: Individual data points should also be presented for all qPCR experiments (ie. Fig 4A). Biological replicate information is missing from several experiments, particularly the immunofluorescence data, and it is unclear whether the qPCR data was generated from technical or biological replicates.

      Thanks for your comment. We have added additional information regarding replicates in each figure legend. We have also changed Fig. 4A.

      (1) Glaeser JD, Bao X, Kaneda G, et al. iPSC-neural crest derived cells embedded in 3D printable bio-ink promote cranial bone defect repair. Sci Rep. Nov 4 2022;12(1):18701. https://www.ncbi.nlm.nih.gov/pubmed/36333414

      (2) Kaneda G, Chan JL, Castaneda CM, et al. iPSC-derived tenocytes seeded on microgrooved 3D printed scaffolds for Achilles tendon regeneration. J Orthop Res. Oct 2023;41(10):2205-2220. https://www.ncbi.nlm.nih.gov/pubmed/36961351

      (3) Papalamprou A, Yu V, Chen A, et al. Directing iPSC differentiation into iTenocytes using combined scleraxis overexpression and cyclic loading. J Orthop Res. Jun 2023;41(6):1148-1161. https://www.ncbi.nlm.nih.gov/pubmed/36203346

      (4) Sheyn D, Ben-David S, Tawackoli W, et al. Human iPSCs can be differentiated into notochordal cells that reduce intervertebral disc degeneration in a porcine model. Theranostics. 2019;9(25):7506-7524. https://www.ncbi.nlm.nih.gov/pubmed/31695783

      (5) Später T, Kaneda G, Chavez M, et al. Retention of Human iPSC-Derived or Primary Cells Following Xenotransplantation into Rat Immune-Privileged Sites. Bioengineering. 2023;10(9):1049. https://www.mdpi.com/2306-5354/10/9/1049

      (6) Sareen D, O'Rourke JG, Meera P, et al. Targeting RNA foci in iPSC-derived motor neurons from ALS patients with a C9ORF72 repeat expansion. Sci Transl Med. Oct 23 2013;5(208):208ra149. https://www.ncbi.nlm.nih.gov/pubmed/24154603

      (7) Tsutsumi H, Kurimoto R, Nakamichi R, et al. Generation of a tendon-like tissue from human iPS cells. J Tissue Eng. Jan-Dec 2022;13:20417314221074018. https://www.ncbi.nlm.nih.gov/pubmed/35083031

      (8) Yoshimoto Y, Uezumi A, Ikemoto-Uezumi M, et al. Tenogenic Induction From Induced Pluripotent Stem Cells Unveils the Trajectory Towards Tenocyte Differentiation. Front Cell Dev Biol. 2022;10:780038. https://www.ncbi.nlm.nih.gov/pubmed/35372337

      (9) Matsuda M, Yamanaka Y, Uemura M, et al. Recapitulating the human segmentation clock with pluripotent stem cells. Nature. Apr 2020;580(7801):124-129. https://www.ncbi.nlm.nih.gov/pubmed/32238941

      (10) Wu CL, Dicks A, Steward N, et al. Single cell transcriptomic analysis of human pluripotent stem cell chondrogenesis. Nat Commun. Jan 13 2021;12(1):362. https://www.ncbi.nlm.nih.gov/pubmed/33441552

      (11) Nakajima T, Nakahata A, Yamada N, et al. Grafting of iPS cell-derived tenocytes promotes motor function recovery after Achilles tendon rupture. Nat Commun. Aug 18 2021;12(1):5012. https://www.ncbi.nlm.nih.gov/pubmed/34408142

      (12) Loh KM, Chen A, Koh PW, et al. Mapping the Pairwise Choices Leading from Pluripotency to Human Bone, Heart, and Other Mesoderm Cell Types. Cell. Jul 14 2016;166(2):451-467. https://www.ncbi.nlm.nih.gov/pubmed/27419872

      (13) Yu V, Papalamprou A, Sheyn D. Generation of Induced Pluripotent Stem Cell-Derived iTenocytes via Combined Scleraxis Overexpression and 2D Uniaxial Tension. J Vis Exp. Mar 1 2024;(205):e65837. https://app.jove.com/65837

      (14) Yin Z, Hu JJ, Yang L, et al. Single-cell analysis reveals a nestin(+) tendon stem/progenitor cell population with strong tenogenic potentiality. Sci Adv. Nov 2016;2(11):e1600874. https://www.ncbi.nlm.nih.gov/pubmed/28138519

      (15) Grinstein M, Dingwall HL, O'Connor LD, Zou K, Capellini TD, Galloway JL. A distinct transition from cell growth to physiological homeostasis in the tendon. Elife. Sep 19 2019;8. https://www.ncbi.nlm.nih.gov/pubmed/31535975

      (16) Sugimoto Y, Takimoto A, Akiyama H, et al. Scx+/Sox9+ progenitors contribute to the establishment of the junction between cartilage and tendon/ligament. Development. Jun 2013;140(11):2280-2288. https://www.ncbi.nlm.nih.gov/pubmed/23615282

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors aim to elucidate the mechanisms that regulate the immune response under physiological conditions during cortical development. To achieve this goal, the authors used a wide range of mutant mice to analyse the consequences of immune activation for the formation of cortical ectopia.

      Strengths:

      The authors demonstrated that Abeta monomers are anti-inflammatory and inhibit microglial activation. This is a novel result that demonstrates the physiological role of APP in cortical development.

      Weaknesses:

      - On the other hand, cortical ectopia has already been described in mouse models in which amyloid signalling has been disrupted (Herms et al., 2004; Guenette et al., 2006), making the current study less novel.

      We agree that these previous studies have implicated amyloid precursor protein in cortical ectopia. However, because these studies used whole-body knockouts, they did not reveal the functional roles of specific cell types, nor did they identify the specific mechanisms underlying the formation of this unique class of cortical ectopia. In contrast, our studies show that disruption of a novel Abeta-regulated signaling pathway in microglia is the primary cause of ectopia formation in this class of ectopia mutants. This is the first time that microglia have been specifically implicated in the development of cortical ectopia. We further show that elevated MMP activity and the resulting degradation of the cortical basement membrane is the underlying mechanism leading to ectopia formation. This is also the first time that MMP activity and basement membrane degradation (instead of maintenance) have been implicated in cortical ectopia development. As such, our results provide novel insights into the diverse mechanisms underlying cortical ectopia formation in developmental brain disorders.

      One of the molecules analysed is Ric8a, a GTPase activator involved in neuronal development. The authors used conditional Emx1-Ric8a mutant mice to delete Ric8a from early progenitors and glutamatergic neurons in the pallium. Emx1-Ric8a mutant mice present cortical ectopias, and the authors attributed this malformation to an increased inflammatory response due to Ric8a deletion in microglia. Several discrepancies do not fit this interpretation:

      - The role of Ric8a in cortical development and function has already been described in several papers, none of which is cited in the current manuscript (Kask et al., 2015, 2018; Ruisu et al., 2013; Tonissoo et al., 2006).

      We have now cited the published work on Ric8a in cortical development in the revision.

      - Ectopia formation in the cortex has already been described in Nestin-Ric8a cKO mice (Kask et al., 2015). In the current manuscript, the authors analyzed the same mutant mice (Nestin-Ric8a) but did not detect any ectopia. The authors should discuss this discrepancy.

      The expression pattern of nestin-cre is known to vary depending on factors including transgene insertion site, genetic background, and sex. Early studies showed, for example, that the nestin gene promoter drives cre expression in many non-neural tissues in another transgenic line on the FVB/N genetic background (Dubois et al., Genesis. 2006 Aug;44(8):355-60. doi: 10.1002/dvg.20226). The specific nestin-cre line used in Kask et al. 2015 has also been shown to be active in brain microglia and to increase microglial pro-inflammatory activity when bred to a conditional allele of a cholesterol transporter gene (Karasinska et al., Neurobiol Dis. 2013 Jun;54:445-55; Karasinska et al., J Neurosci. 2009 Mar 18;29(11):3579–3589; Takampri et al., Brain Res. 2009 May 13;1270:10-8). These factors may in part underlie the apparent discrepancy. We have now incorporated this discussion into the revision.

      - The authors claim that microglia express Emx1 and that, therefore, Ric8a is deleted in microglia. However, the arguments for this assumption are very weak, and the evidence suggests that this is not the case. This is an important point, considering that the authors want to emphasise the role of Ric8a in microglial activation; therefore, additional experiments should demonstrate that Ric8a is deleted in microglia in Emx1-Ric8a mutant mice.

      We have observed altered mRNA expression of several genes in purified microglia cultured from the emx1-cre mutants (Supplemental Fig. 8), which indicates that ric8a is deleted from microglia and suggests a role of microglial ric8a deficiency in ectopia formation.  This interpretation is further strengthened by the observation that deletion of ric8a from microglia using a microglia-specific cx3cr1-cre results in similar ectopia (Fig. 2). We also have other data supporting this interpretation, including data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a gene expression in microglia cells isolated from emx1-cre mutants. These data have now been incorporated into the text and in revised Supplemental Fig. 8 (new panels c-c” & d).

      Reviewer #2 (Public Review):

      Kwon et al. used several conditional KO mice for the deletion of ric8a or app in different cell types. Some of them exhibited pial basement membrane breaches leading to neuronal ectopia in the neocortex.

      They first investigated ric8a, a guanine nucleotide exchange factor for heterotrimeric G proteins. They observed the above-mentioned phenotype when ric8a is deleted from both microglia and neural cells (ric8a-emx1-cre, or dual deletion with the cre combination cx3cr1 (in microglia) and nestin (in neural cells)), but not when it is deleted from microglia alone or from neural cells alone (whether in CR cells (ric8a-Wnt3a-cre), post-mitotic neurons (nex-cre or dlx5/6-cre), or progenitors and their progeny (nestin-cre or foxg1-cre)). They also show that ric8a KO microglia stimulated in vitro with LPS exhibit increased TNFa, IL6 and IL1b secretion compared to controls (Fig 2). They therefore injected LPS in vivo and observed the neuronal ectopia phenotype in ric8a-cx3cr1-cre (microglial deletion) cortices at P0 (Fig 2). They suggest that ric8a KO in neuronal cells mimics immune stimulation (but it is unclear how ric8a KO in neural cells would induce immune stimulation).

      We agree that we do not currently know the precise mechanisms by which mutant microglia are activated in the mutant brain. However, this does not affect the conclusion that deficiency in the Abeta monomer-regulated APP/Ric8a pathway in microglia is the primary cause of cortical ectopia in these mutants, because we have shown that genetic disruption of this pathway in microglia alone, by targeting different pathway components with cell type-specific cre lines in several different approaches, consistently results in similar cortical ectopia phenotypes. Regarding the source of the immunogens, there are several possibilities that we plan to investigate in future studies. For example, the clearance of apoptotic cells and associated cellular debris is an important physiological process, and deficits in this process have been linked to inflammatory diseases throughout life (Doran et al., Nat Rev Immunol. 2020 Apr;20(4):254-267; Boada-Romero et al., Nat Rev Mol Cell Biol. 2020 Jul;21(7):398-414). In the embryonic cortex, studies have shown that large numbers of cells die starting as early as E12 (Blaschke et al., Development. 1996 Apr;122(4):1165-74; Blaschke et al., J Comp Neurol. 1998 Jun 22;396(1):39-50). Studies have also shown that radial glia and neuronal progenitors play critical roles in the clearance of apoptotic cells and associated cellular debris in the brain (Lu et al., Nat Cell Biol. 2011 Jul 31;13(9):1076-83; Ginisty et al., Stem Cells. 2015 Feb;33(2):515-25; Amaya et al., J Comp Neurol. 2015 Feb 1;523(2):183-96). Moreover, Ric8a-dependent heterotrimeric G proteins have been found to specifically promote the phagocytic activity of both professional and non-professional phagocytic cells (Billings et al., Sci Signal. 2016 Feb 2;9(413):ra14; Preissler et al., Glia. 2015 Feb;63(2):206-15; Pan et al., Dev Cell. 2016 Feb 22;36(4):428-39; Flak et al., J Clin Invest. 2020 Jan 2;130(1):359-373; Zhang et al., Nat Commun. 2023 Sep 14;14(1):5706). Thus, it is probable that the failure of mutant radial glia to promptly clear apoptotic cells and debris plays a role in triggering mutant microglial activation in ric8a-emx1-cre mutants. We have now included these possibilities in the text of the revised manuscript. The precise mechanisms remain to be determined in future studies; however, they do not affect the conclusion of the current study.

      The authors then turned their attention to APP. They observed neuronal ectopia into the marginal zone when APP is deleted in microglia (app-cx3cr1-cre) + intraperitoneal LPS injection (they did not show it, but we have to assume there would not be a phenotype without the injection of LPS) (Fig 3). (The phenotype is similar but not identical to ric8a-cx3cr1-cre + LPS. They suggest that the reason is that they had to inject 3 times less LPS due to enhanced immune sensitivity in this genetic background, but it is only a hypothesis.) After in vitro stimulation by LPS, app mutant microglia show a reduced secretion of TNFa and IL6 but not IL1b (this is the opposite to ric8a-cx3cr1-cre microglia cells) while peritoneal macrophages in culture show increased secretion of TNFa, IL1, IL6 and IL23 (fig 3 and Suppl. Fig 9).

      We have data showing that app-cx3cr1-cre mutants without LPS injection do not show ectopia, which have now been included in the revised supplemental Fig. 9 (new panels c-d).  The reason we employ LPS injection is, in the first place, that we do not see a phenotype without the injection. We agree, and have also stated in the text, that the phenotype of the app mutants is not as severe as that of the ric8a mutant.  Besides the low LPS dosage used, we also suggest that other app family members may compensate, since the ectopia in the app family gene mutants reported previously was only observed in app/aplp1/2 triple knockouts, not in any of the double knockouts (Herms et al., 2004). We have further clarified this point in the text. These possibilities are also not mutually exclusive. Nonetheless, the results clearly show that microglia-specific app mutation causes cortical ectopia upon embryonic immune stimulation. They thus implicate a specific role of microglial APP in cortical ectopia formation.

      The different responses of ric8a and app mutant microglia to LPS result from in vitro culturing of microglia. We have shown that, when acutely isolated macrophages are used, these mutants show changes in the same direction (both increased cytokine secretion) (Fig. 4).  This demonstrates that, without culturing, app mutant microglial lineage cells indeed behave in the same way as ric8a mutant cells.

      The microglia used for analysis in in vitro assays in this study have all been cultured for two weeks before assay. They have thus been under chronic stimulation, exposed to dead cells and debris in the culture dish throughout this period.  Previous studies have shown that, depending on the degree of perturbation to the inflammation-regulating pathways, such exposures can differentially affect microglial cytokine expression, sometimes in the direction opposite to that expected.  For example, under chronic immune stimulation, while trem2+/- microglia, which are heterozygous mutant for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression (as expected), trem2-/- (null) microglia under the same conditions not only fail to show increases but, for some pro-inflammatory cytokines, actually show decreases in expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177).  In several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).  Indeed, the APP cytoplasmic domain is known to also bind to and signal through several other proteins including FE65, Mena, and TIP60 (Cao & Südhof, Science. 2001;293:115-120).  It is likely that in microglia, Ric8a-dependent heterotrimeric G proteins also mediate only a subset of the signaling downstream of APP.  As such, app knockout in microglia may have more severe effects on microglial anti-inflammatory regulation than ric8a knockout.
As a result, upon chronic immune activation, app knockout may lead to a microglial phenotype similar to the trem2 null mutation phenotype discussed above, while ric8a knockout leads to a phenotype similar to the trem2+/- phenotype. This may explain the subdued TNF and IL6 secretion by cultured app (but not ric8a) mutant microglia.

      As amyloid beta (Ab) is one of the molecules binding to APP, the authors showed that Ab40 monomers (they did not test Ab40 oligomers) partially inhibit the secretion of cytokines (TNFa, IL6, IL1b, MCP-1, IL23a, IL10) in vitro by microglia stimulated by LPS, but do not affect secretion by microglia from app-cx3cr1-cre (tested for TNFa, IL6, IL1b, IL23a, IL10) (Fig 4, Suppl fig 10) (but still do so in aplp2-cx3cr1-cre) and do not affect secretion by ric8a-cx3cr1-cre microglia (tested for TNFa and IL6; Ab40 monomers still suppress IL1b) (therefore, here is another difference between app and ric8a KO microglia).

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and have included the data (new panel j in supplemental Fig. 10).  As mentioned above, in several systems, Ric8a-dependent heterotrimeric G proteins have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).  We assume that this is likely also true in microglia and that Ric8a-dependent heterotrimeric G proteins may mediate a subset, and only a subset, of the signaling downstream of APP.  This may explain the difference in the effects of app and ric8a knockout mutations in abolishing the anti-inflammatory effects of Abeta monomers on IL-1b vs TNF/IL-6.  This difference also suggests that TNF/IL-6 and IL-1b secretion must be regulated by different mechanisms in microglia. Indeed, it is well established in immunology that the secretion of IL1b, but not of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit. Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found it suppressed neuronal ectopia (Fig 5, Suppl fig 11). It is not clear whether it suppresses immune stimulation from neuronal cells or immune reaction from microglia cells.

      We agree that at present the pharmacological approaches we have taken are not able to distinguish these possibilities.  However, whichever is the case, our results still implicate a role of excessive microglial activation in the formation of cortical ectopia and support the conclusion of the study.  Thus, while worthy of further investigation, this question does not affect the conclusion of the current study. Furthermore, as mentioned, we plan to determine in future studies the mechanisms by which ric8a mutation in neural cells induces immune activation. These results will likely enable us to address this question more specifically.

      Finally, the authors examined the activities of MMP2 and MMP9 in the developing cortex using gelatin gel zymography. The activity and protein levels of MMP9 but not MMP2 in the ric8a-emx1-cre cortex were claimed to be significantly increased (Fig 5, Suppl fig 12). Unfortunately, they did not show this in the app-cx3cr1-cre + LPS mouse. They make a connection between ric8a deletion and MMP9 but unfortunately do not make the connection between app deletion and MMP9, although APP is at the center of the pathway claimed to be important here. Then they injected BB94, a broad-spectrum inhibitor of MMPs, or an inhibitor specific for MMP9 and 13. Both significantly suppress the number and the size of the ectopia in ric8a mutants (Fig 5).

      For all the gelatin gel zymography analyses, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amounts of protein per lane. The results across lanes are therefore directly comparable. From the quantification, our results clearly show that MMP9 activity levels are increased in the mutants (we have now included whole gel images and quantification in a new supplemental Figure 13).  The similar levels of MMP2 in all lanes also provide an internal control further supporting the observation of a specific change in MMP9.  For this analysis, we focus on the ric8a-emx1-cre mutants since the app-cx3cr1-cre + LPS animals show ectopia in only a subset of mutants and in most cases only in one of the hemispheres.  Experiments examining potential changes in MMP9 are therefore unlikely to yield meaningful results.  On the other hand, we have clearly shown that the administration of different classes of MMP inhibitors significantly suppresses ectopia in ric8a-emx1-cre mutants. This strongly implicates a functional contribution of MMPs.

      After reading the manuscript, I still do not know how ric8a in neural cells is involved in the immune inhibition. Is it through the control of Ab monomers? In addition, the authors did not show in vivo data supporting that Ab monomers are the key players here. As the authors said, this is not the only APP interactor. Finally, I still do not know how ric8a is linked to APP in microglia in the model.

      As detailed above, there are several possibilities, including potential deficits in the clearance of apoptotic cells and associated debris, that may trigger microglial activation in ric8a-emx1-cre mutants. We will investigate these possibilities in future studies.  We have now incorporated these possibilities in the revised text.  As for the role of Abeta monomers, we have indicated that we currently do not have evidence that in the developing cortex Abeta monomers play a role in inhibiting microglia.  We have also indicated in the manuscript that our conclusion is that a microglial signaling pathway that is activated by Abeta monomers in vitro regulates normal brain development in vivo, not that Abeta monomers themselves regulate brain development.  Regarding the link between Ric8a and APP, the reviewer has missed several major lines of supporting evidence. For example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-10).  This inhibition is abolished when either the app or the ric8a gene is deleted from microglia.  This clearly indicates that app and ric8a act in the same genetic pathway (the pathway activated by Abeta monomers) in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokines in microglia.  This inhibition is also abolished when either the app or the ric8a gene is deleted from microglia.  This reinforces the conclusion that app and ric8a act in the same pathway in microglia.  Furthermore, cell-type-specific deletion of app or ric8a from microglia in vivo also results in similar phenotypes of cortical ectopia. Together, these results strongly support the conclusion that app and ric8a act in the same pathway that is activated by Abeta monomers in vitro in microglia.
This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      While several of the findings presented in this manuscript are of potential interest, there are a number of shortcomings. Here are some suggestions that could improve the manuscript and help substantiate the conclusions:

      (1) As the title suggests, the focus is on Ab and APP functions in microglia. However, the analysis is more focused on ric8a. The connection between ric8a and APP in this study is not investigated, besides the fact that their deletion induces somewhat similar but not identical phenotypes. Showing a similar phenotype is not enough to conclude that they are working in the same pathway. The authors should find a way to make that connection between ric8a and app in the cells investigated here.

      As discussed above, the reviewer misses several major lines of evidence showing that APP and Ric8a act in the same pathway in microglia.  Besides the similarity of the ectopia phenotypes, for example, we have shown that Abeta monomers activate a pathway in microglia that inhibits the secretion of several proinflammatory cytokines including TNF, IL-6, IL-10, and IL-23 (Figure 4 and Supplemental Figures 8-11).  These inhibitory effects are abolished when either the app or the ric8a gene is deleted from microglia.  This clearly indicates that app and ric8a act in the same genetic pathway, a pathway that is activated by Abeta monomers in vitro, in microglia. We also show that this Abeta monomer-activated pathway inhibits the transcription of several cytokine genes in microglia.  These effects are again abolished when either the app or the ric8a gene is deleted from microglia.  This further reinforces the conclusion that app and ric8a act in the same pathway in microglia.  Moreover, we show that the same results hold in macrophages.  Thus, these results strongly support the conclusion that app and ric8a act in the same genetic pathway in microglia. This conclusion is also consistent with published findings that Ric8a-dependent heterotrimeric G proteins biochemically bind to APP and mediate subsets of APP signaling across different species (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).

      (2) It would help to show the appearance of breaches in the pial basement membrane leading to neuronal ectopia, and to investigate laminin debris, cell identity, and the Wnt pathway in app-cx3cr1-cre + LPS injection as you did for ric8a-emx1-cre.

      We have now provided further data on pial basement membrane breaches in the app-cx3cr1-cre + LPS animals (new panels e-f” in supplemental Fig 9).  We have not observed any changes in cell identity or Wnt pathway activity in ric8a-emx1-cre mutants.  It is thus of limited value to examine potential changes in these areas in the app-cx3cr1-cre + LPS animals.

      (3) As a control, this would help to show that app-cx3cr1-cre without the LPS injection does not display the phenotype.

      We have the data on app-cx3cr1-cre mutants without LPS injection, which show no ectopia.  We have now included the data in the revised supplemental Fig. 9 (new panels c-d).

      (4) This would help to show the activity and protein levels of MMP9 and MMP2 and perform the rescue experiments with the inhibitors in the app-cx3cr1-cre cortex +LPS.

      As discussed above, we focus our analysis on the ric8a-emx1-cre mutants since app-cx3cr1-cre + LPS animals show ectopia in only a subset of mutants and in most cases only in one of the hemispheres.  Experiments examining potential changes in MMP9 levels and the effects of MMP inhibitors in these animals are therefore not likely to yield meaningful data.  On the other hand, we have shown that MMP9 levels are increased and that administration of different classes of MMP inhibitors eliminates cortical ectopia in ric8a-emx1-cre mutants.  We have also shown a similar break in the basement membrane in app-cx3cr1-cre + LPS animals (new panels e-f” in supplemental Fig 9). Together, these results strongly implicate a role played by MMPs.

      (5) Is MMP9 secreted by microglia cells or neural cells?

      Our in situ hybridization data show MMP9 is most highly expressed in a sparse microglia-like cell population in the embryonic cortex, suggesting that microglia may be a major source of MMP9. We have incorporated these data in a new supplemental Fig. 12 (panel a). The precise identity of these cells, however, requires further validation.

      (6) The in vitro evidence indicates that one of the multiple APP interactors, i.e., Ab40 monomers, is less effective in suppressing the expression of some cytokines by microglia mutant for ric8a (TNFa and IL6, but it still suppresses IL1b) or APP (TNFa, IL6, IL1b, IL23a, IL10) when compared to WT. But there are other interactors for APP. In order to support the claim, it seems crucial to have in vivo data showing that Ab40 monomers are the molecules involved in preventing the breach in the pial basement membrane.

      As addressed in detail above, we have indicated that our conclusion is that a microglial signaling pathway that is activated by Abeta monomers in vitro regulates normal brain development in vivo, not that Abeta monomers themselves regulate brain development in vivo.  We currently do not have evidence that the Abeta monomers play a role in inhibiting microglia during cortical development.  There are candidate ligands for the pathway in the developing cortex, the functional study of which, however, is a major undertaking beyond the scope of the current study.

      (7) In order to claim that this is specific to Ab40 monomers and not oligomers, it is necessary to show that the Ab40 oligomers do not have the same effect in vitro and in vivo. Also, an assay should be done to show that your Ab preparations are pure monomers or oligomers.

      We have tested the effects of Abeta40 oligomers, which induce rather than suppress microglial cytokine secretion, and have included these data in the revision in a new panel j in supplemental Fig. 10. The protocols we use in preparing the monomers and oligomers are standard protocols employed in the field of Alzheimer’s disease research. They have been repeatedly optimized and validated over the past decades.

      (8) Most of the cytokine secretion assays used microglia cells in culture. Two results draw my attention. Ric8a deletion increases TNFa and IL6 secretion after LPS stimulation in vitro in microglia cells, while app deletion decreases their secretion. Later, the paper shows that the decrease in IL1b induced by Ab in microglia cells is prevented by APP deletion but not by ric8a deletion. Those two pieces of data suggest that ric8a and APP might not be in the same pathway. In addition, the phenotypes from app-cx3cr1-cre + LPS injection and ric8a-cx3cr1-cre + LPS injection are not exactly the same. This could be due to the level of LPS, as the authors suggest, or it might not be. More experiments are needed to prove they are in the same pathway.

      As discussed above, the reviewer misses several major lines of evidence, which strongly support the conclusion that APP and Ric8a act in the same pathway activated by Abeta monomers in microglia (see detailed discussion in point 1 above).  The differential response of TNFa/IL-6 of app and ric8a mutant microglia likely results from chronic immune stimulation during in vitro culturing, which is known to alter microglial cytokine response (see detailed discussion in point 9 below). We have demonstrated that this is indeed the case by showing that, without culturing, acutely isolated app and ric8a mutant macrophages both display elevated TNFa/IL-6 secretion (Figure 4). 

      Regarding the different regulation of TNF/IL-6 vs IL-1b by APP and Ric8a, as discussed above, in several systems, Ric8a-dependent heterotrimeric G proteins (which are degraded in ric8a mutant cortices, see new supplemental Fig. 9) have been shown to act downstream of APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9).  This is likely also the case in microglia, and Ric8a-dependent heterotrimeric G proteins may mediate only a subset of the anti-inflammatory signaling activated by APP.  As such, app mutation may abolish all the inhibitory effects of Abeta monomers (both those on TNF/IL-6 and those on IL-1b), whereas ric8a mutation may abolish only a subset (only those on TNF/IL-6 but not those on IL-1b).  This also suggests that the secretion of TNF/IL-6 and IL-1b must be regulated by different mechanisms in microglia.  Indeed, it is well established in immunology that the secretion of IL1b, but not that of TNF or IL6, is regulated by inflammasome-dependent mechanisms (see, for example, Broz & Dixit. Nat Rev Immunol. 2016 Jul;16(7):407-20. doi: 10.1038/nri.2016.58).

      (9) How do the authors reconcile the reduced TNFa and IL6 secretion upon stimulation of app mutant microglia with the model where app is attenuating immune response in vivo? Line 213 says that microglia exhibit attenuated immune response following chronic stimulation but I don't know if 3 hours of LPS in vitro is a chronic stimulation.

      The reviewer has misunderstood.  The microglia used in this study have all been cultured in vitro for approximately two weeks before assay. They have thus been under chronic stimulation, exposed to dead cells and debris in the culture dish.  Depending on the degree of perturbation to the inflammation-regulating pathways, such exposures are known to change microglial cytokine expression, sometimes in the direction opposite to that expected.  For example, under chronic immune stimulation, while trem2+/- microglia, which are heterozygous mutant for the anti-inflammatory Trem2, show elevated pro-inflammatory cytokine expression, trem2-/- (null) microglia under the same conditions not only fail to show increases but, for some pro-inflammatory cytokines, actually show decreases in expression (Sayed et al., Proc Natl Acad Sci U S A. 2018 Oct 2;115(40):10172-10177).  As mentioned, in several systems, Ric8a-dependent heterotrimeric G proteins have also been shown to bind to APP and mediate one of the branches of the signaling activated by APP (Milosch et al., Cell Death Dis. 2014 Aug 28;5(8):e1391; Fogel et al., Cell Rep. 2014 Jun 12;7(5):1560-1576; Ramaker et al., J Neurosci. 2013 Jun 12;33(24):10165-81; Nishimoto et al., Nature. 1993 Mar 4;362(6415):75-9). Thus, it is likely that in microglia, Ric8a-dependent heterotrimeric G proteins also mediate only a subset of the anti-inflammatory signaling activated by APP.  As such, app knockout in microglia may have more severe effects than ric8a knockout on microglial immune activation, resembling the relationship between trem2 null and heterozygous mutations discussed above. Accordingly, it is predicted that chronic immune stimulation, such as in vitro culturing, will result in attenuated pro-inflammatory cytokine expression in app mutant microglia but elevated cytokine expression in ric8a mutant microglia.
This may explain why TNF and IL6 secretion by cultured app mutant microglia is subdued, whereas acutely isolated app mutant macrophages instead show increased cytokine secretion. The latter may be more representative of the response of app mutant microglia in the absence of chronic stimulation.

      (10) Line 119: In their model, the authors suggest that there is a breach in the pial basement membrane but that the phenotype is different from the retraction of the radial fibers due to reduced adhesion. So, could the authors discuss to what substrate the radial fibers are attached in their model, where the pial surface is destroyed?

      Radial glial endfeet normally bind to the basement membrane via cell surface receptors including the integrin and the dystroglycan protein complexes. We observe free radial glial endfeet at the breach sites, apparently without attachment to any basement membrane.  However, we cannot exclude the possibility that there may be residual, broken-off basement membrane components bound to the endfeet that are not detected by the methodology employed. 

      (11) The authors should show that the increased cytokine secretion observed in vitro also occurs in vivo in ric8a-emx1-cre compared to WT mice and compared to ric8a-nestin-cre mice, or when app is deleted in microglia (app-cx3cr1-cre) + LPS injection compared to WT mice + LPS.

      Unfortunately, this is not technically feasible since it is not possible to extract the extracellular (secreted) fractions of cytokines from an embryonic brain without causing cell lysis and the release of the intracellular pool.  This, however, does not affect our conclusion that the Abeta monomer-regulated microglial pathway plays a key role in regulating normal brain development, since its genetic disruption, by different approaches, clearly results in brain malformation.

      (12) The authors injected inhibitors of Akt or Stat3 in the ric8a-emx1-cre cortex and found that it suppressed neuronal ectopia (Fig 5, Suppl fig 11). Does it suppress immune stimulation from neuronal cells or immune reaction from microglia cells?

      As discussed above, we agree that at present the pharmacological approaches we have taken are not able to distinguish these two possibilities.  However, whichever is true, it does not affect our conclusion.  Also, we plan to determine in future studies the mechanisms by which ric8a mutation in neural cells induces immune activation. These results will likely enable us to adopt specific approaches to address this question.

      (13) Fig 5 and Supplementary fig 12: Please show a tubulin loading control in Fig 5i as you did in suppl fig 12 d (gel zymography). Please provide a gel zymography showing side by side Control, mutant and mutant + DM/S3I treatment. The same request applies to the MMP9 staining. Please provide statistics for control vs mutant for suppl fig 12c and d.

      We have now included whole gel zymography images with four control and four mutant individual samples as well as quantification in a new supplemental Fig.13 (panels b-c). This clearly shows increases in MMP9, while the MMP2 levels appear similar between controls and mutants. For all of the experiments of gelatin gel zymography, we quantify protein concentrations in the cortical lysates using the Bio-Rad Bradford assay kit and load the same amounts of proteins per lane. The results across lanes are thus all comparable.  The MMP9 staining images for the controls and mutants have also all been taken with the same parameters on the microscope and can be directly compared.  The statistics have now been provided as suggested.

      (14) Please provide the name and the source of the MMP9/13 inhibitor used in this study.

      This inhibitor is MMP-9/MMP-13 inhibitor I (CAS 204140-01-2), from Santa Cruz Biotechnology. This information has been included in revision.

      (15) The results show that deletion of ric8a in microglia and neural cells induced pial membrane breaches, but no phenotype is apparent with ric8a deletion in microglia or neural cells alone. Then, the results showed that intraperitoneal injection of LPS induced the phenotype in ric8a-cx3cr1-cre mutants. It would be beneficial, as a control supporting the model, to show that the insult induced by LPS injection does not induce the phenotype in the ric8a-foxg1-cre mice.

      We agree it may potentially be useful to show that LPS injection does not induce ectopia in ric8a-foxg1-cre mice.  Unfortunately, since the ric8a-foxg1-cre mutation shows no phenotype, we are no longer in possession of this line.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - The information in the abstract and the introduction is only related to app. So, it is very abrupt how the authors start the manuscript by studying the role of Ric8a, with no information at all about this protein and why the authors want to investigate its role in microglial activation. Later in the manuscript, the authors tried to link Ric8a with app to study the role of app in the inflammatory response and ectopia formation. This link is quite weak as well.

      In the last paragraph of the Introduction, we explain the use of the ric8a mutant and how it led to the discovery of the Abeta monomer-regulated pathway. We have now improved the writing in the revision to make these points, especially the link between APP and Ric8a-regulated G proteins, clearer.  In the Results section, we have also improved the writing on the potential link of Ric8a to APP by highlighting, among others, the fact that ric8a and app pathway mutants are among a unique group of a few mouse mutants (ric8a, app/aplp1/2, and apbb1/2) that show cortical ectopia exclusively in the lateral cortex, while all other cortical ectopia mutants also show severe ectopia at the cortical midline.  This suggests that similar mechanisms may underlie the ectopia formation in this small group of mutants.

      -In order to validate the mouse model, double immunofluorescence or immunofluorescence+in situ hybridization should be performed to show that microglia express ric8a and that is eliminated in the Emx1-Ric8a mutant mice.

      As mentioned above, we have additional lines of evidence showing that ric8a is deleted from microglia in emx1-cre mutants. This includes data showing induction of the expression of a cre reporter in brain microglia by emx1-cre and loss of ric8a mRNA expression in microglia cells isolated from emx1-cre mutants.  These data have now been included in revised supplemental Fig. 8.

      -In Supplemental Fig. 6, the authors claimed that cell proliferation is normal in Ric8a mutant mice without doing any quantification. They also quantified the angle of mitotic division of progenitors in the ventricular zone, but there are no images for the spindle orientation quantification, and no description of how they did it. In addition, this data is contrary to what has already been published in conditional Ric8a mutant mice (Kask et al., 2015). The Vimentin staining should be improved.

      We have provided quantification of cell proliferation (phospho-histone 3 staining at the ventricular surface) in revised supplemental Fig. 6g, which shows no significant differences in the number of positive cells. We have also provided details on the definition of the angle of cleavage plane orientation in revised supplemental Fig. 6h and in the Methods section.  We are not sure why the results differ from those of the other study. We were indeed anticipating deficits in mitotic spindle orientation and spent major efforts in the analysis of this potential deficit.  However, based on the data, we could not draw this conclusion.

      -Analysis of the MMP9 expression should be done by western blot and not by immunofluorescence. In fact, the MMP9 expression shown in Figure 5g,h does not correspond with the RNA expression shown in gene expression atlases like GenePaint or the Allen Brain Atlas, casting doubt on the specificity of the antibody. The expression of Mmp9 is quite low or absent in the cortex at E13.5-E14.5, making this protein very unlikely to be responsible for laminin degradation during development.

      We have performed gelatin gel zymography on MMP2/9, which shows increased MMP9 activity levels in the mutant cortex. This is similar to Western blot analysis (all lanes are loaded with the same amounts of cortical lysates).  We have now included whole gel zymography images with four control and four mutant individual samples as well as quantification in a new supplemental Fig.13 (panels b-c).  The immunofluorescence staining of MMP9, a different type of analysis, was designed as a complementary approach, the results of which also support the interpretation of increases in MMP9 protein.  Regarding MMP9 RNA expression, please also note that MMP9 is secreted, and the protein expression pattern is expected to be different from that of the RNA. We have performed wholemount in situ hybridization using dissected E13.5 mouse forebrains.  Our data (in new supplemental Fig.13a) show that MMP9 mRNA is strongly expressed in a sparse population of cells, many of which appear to align along blood vessels. We suspect these are microglial lineage cells populating the embryonic cortex at this stage (see, for example, Squarzoni et al., Cell Rep. 2014 Sep 11;8(5):1271-9. doi: 10.1016/j.celrep.2014.07.042.).  Our control in situ using a Tnc5 probe also shows that the MMP9 signal is not a result of nonspecific probe binding.  Since the MMP9-expressing cells are very sparse even in the wholemount specimens, while most database RNA in situ expression data are obtained using thin sections, we suspect this may be why the signal has been missed in the databases.  As for functional contributions, we agree that we cannot rule out roles played by other MMPs.  However, based on the ectopia suppression data, our results clearly indicate a critical contribution by MMP9/13.

      For MMP9 activity, the authors should show the whole membrane with a minimum of three control and three mutant individual samples, together with the quantification.

      - The graphs should be improved, including individual values and titles of the Y axes.

      We have included whole membrane zymography images with four control and four mutant individual samples, as well as quantification, in a new supplemental Fig. 13b-c. The graphs have also been improved as suggested.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We are grateful to the reviewers for their positive assessment of the revised version of the article.

      Please find below our answers to the last, minor comments of the reviewers.

      We thank the reviewer for this important comment. In our live imaging experiments, we actually tracked the dorsal and ventral borders of the omp:yfp positive clusters in control and sly mutant embryos. These measurements showed that the omp:yfp positive clusters are more elongated along the DV axis in mutants as compared with control siblings, as seen on fixed samples (data not shown), suggesting that this difference in tissue shape is not due to fixation.

      Reviewer #4 (Public review):

      Summary:

      In this elegant study XX and colleagues use a combination of fixed tissue analyses and live imaging to characterise the role of Laminin in olfactory placode development and neuronal pathfinding in the zebrafish embryo. They describe Laminin dynamics in the developing olfactory placode and adjacent brain structures and identify potential roles for Laminin in facilitating neuronal pathfinding from the olfactory placode to the brain. To test whether Laminin is required for olfactory placode neuronal pathfinding they analyse olfactory system development in a well-established laminin-gamma-1 mutant, in which the laminin-rich basement membrane is disrupted. They show that while the OP still coalesces in the absence of Laminin, Laminin is required to contain OP cells during forebrain flexure during development and maintain separation of the OP and adjacent brain region. They further demonstrate that Laminin is required for growth of OP neurons from the OP-brain interface towards the olfactory bulb. The authors also present data describing that while the Laminin mutant has partial defects in neural crest cell migration towards the developing OP, these NCC defects are unlikely to be the cause of the neuronal pathfinding defects upon loss of Laminin. Altogether the study is extremely well carried out, with careful analysis of high-quality data. Their findings are likely to be of interest to those working on olfactory system development, or with an interest in extracellular matrix in organ morphogenesis, cell migration, and axonal pathfinding.

      Strengths:

      The authors describe for the first time Laminin dynamics during the early development of the olfactory placode and olfactory axon extension. They use an appropriate model to perturb the system (lamc1 zebrafish mutant), and demonstrate novel requirements for Laminin in pathfinding of OP neurons towards the olfactory bulb.

      The study utilises careful and impressive live imaging to draw most of its conclusions, really drawing upon the strengths of the zebrafish model to investigate the role of laminin in OP pathfinding. This imaging is combined with deep learning methodology to characterise and describe phenotypes in their Laminin-perturbed models, along with detailed quantifications of cell behaviours, together providing a relatively complete picture of the impact of loss of Laminin on OP development.

      Weaknesses:

      Some of the statistical tests are performed on experiments where n=2 for each condition (for example the measurements in Figure S2) - in places the data is non-significant, but clear trends are observed, and one wonders whether some experiments are under-powered.

      We initially planned the electron microscopy experiments in order to analyse 3 embryos per genotype per stage. However, because of technical issues we could not perform the measurements in all the cases, explaining why we have n = 2 in some of the graphs. The trends were quite clear, so we chose to keep these data in the article. We believe they nicely complement the immunostaining data assessing basement membrane integrity in control and mutant embryos.


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors describe the dynamic distribution of laminin in the olfactory system and forebrain. Using immunohistochemistry and transgenic lines, they found that the olfactory system and adjacent brain tissues are enveloped by BMs from the earliest stages of olfactory system assembly. They also found that laminin deposits follow the trajectory of axons. They performed a functional analysis of the sly mutant to analyse the function of laminin γ1 in the development of the zebrafish olfactory system. Their study revealed that laminin enables the shape and position of placodes to be maintained late in the face of major morphogenetic movements in the brain, and its absence promotes the local entry of sensory axons into the brain and their navigation towards the olfactory bulb. 

      Strengths: 

      - They showed that in the sly mutants, no BM staining of laminin and Nidogen could be detected around the OP and the brain. The authors then elegantly used electron microscopy to analyse the ultrastructure of the border between the OP and the brain in control and sly mutant conditions. 

      - To analyse the role of laminin γ1-dependent BMs in OP coalescence, the authors used the cluster size of Tg(neurog1:GFP)+ OP cells at 22 hpf as a marker. They found that the mediolateral dimension increased specifically in the mutants. However, proliferation did not seem to be affected, although apoptosis appeared to increase slightly at a later stage. This increase could therefore be due to a dispersal of cells in the OP. To test this hypothesis, the authors then analysed the cell trajectories and extracted 3D mean square displacements (MSD), a measure of the volume explored by a cell in a given period of time. Their conclusion indicates that although brain cell movements are increased in the absence of BM during coalescence phases, overall OP cell movements occur within normal parameters and allow OPs to condense into compact neuronal clusters in sly mutants. The authors also analysed the dimensions of the clusters composed of OMP+ neurons. Their results show an increase in cluster size along the dorso-ventral axis. These results were to be expected since, compared with BM, early neurog1+ neurons should compact along the medio-lateral axis, and those that are OMP+ essentially along the dorso-ventral axis. In addition to the DV elongation of OP tissue, the authors show the existence of isolated and ectopic (misplaced) YFP+ cells in sly mutants. 

      - To understand the origin of these phenotypes, the authors analysed the dynamic behaviour of brain cells and OPs during forebrain flexion. The authors then quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, and proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected. 

      - They then analysed the dynamic behaviour of the axon using live imaging. Thus, olfactory axon migration is drastically impaired in sly mutants, demonstrating that Laminin γ1-dependent BMs are essential for the growth and navigation of axons from the OP to the olfactory bulb. 

      - The authors therefore performed a quantitative analysis of the loss of function of Laminin γ1. They propose that the BM of the OP prevents its deformation in response to mechanical forces generated by morphogenetic movements of the neighbouring brain. 

      Weaknesses: 

      - The authors did not analyse neurog1+ axonal migration at the level of the single cell and instead made a global analysis. An analysis at the cell level would strengthen their hypotheses.  

      - Rescue experiments by locally inducing Laminin expression would have strengthened the paper. 

      - The paper lacks clarity between the two neuronal populations described (early EONs and late OSNs).  

      - The authors quantitatively measured brain versus OPs in the sly mutant and found that the OP-brain boundary was poorly defined in the sly mutant compared with the control. Once again, the methods (cell tracks, brain size, proliferation/apoptosis, and the shape of the brain/OP boundary) are elegant but the results were expected. 

      - A missing point in the paper is the effect of Laminin γ1 on the migration of cranial NCCs that interact with OP cells. The authors could have analysed the dynamic distribution of neural crest cells in the sly mutant. 

      We thank the reviewer for the overall positive assessment of our work, and we carefully responded to all her/his insightful comments below. Live imaging experiments to (1) visualise exit and entry point formation with only a few axons labelled, (2) characterise the behaviour of single neurog1:GFP-positive neurons/axons during OP coalescence and (3) analyse the migration of cranial NCC are now included in the revised manuscript to address the reviewer’s questions, and they reinforce our initial conclusions.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript addresses the role of the extracellular matrix in olfactory development. Despite the importance of these extracellular structures, the specific roles and activities of matrix molecules are still poorly understood. Here, the authors combine live imaging and genetics to examine the role of laminin gamma 1 in multiple steps of olfactory development. The work comprises a descriptive but carefully executed, quantitative assessment of the olfactory phenotypes resulting from loss of laminin gamma. Overall, this is a constructive advance in our understanding of extracellular matrix contributions to olfactory development, with a well-written Discussion with relevance to many other systems. 

      Strengths: 

      The strengths of the manuscript are in the approaches: the authors have combined live imaging, careful quantitative analyses, and molecular genetics. The work presented takes advantage of many zebrafish tools including mutants and transgenics to directly visualize the laminin extracellular matrix in living embryos during the developmental process. 

      Weaknesses: 

      The weaknesses are primarily in the presentation of some of the imaging data. In certain cases, it was not straightforward to evaluate the authors' interpretations and conclusions based on the single confocal sections included in the manuscript. For example, it was difficult to assess the authors' interpretation of when and how laminin openings arise around the olfactory placode and brain during olfactory axon guidance. 

      We thank the reviewer for the overall positive assessment of our work, and we carefully responded to all her/his insightful comments below. To address these comments, live imaging data to visualise exit and entry point formation with a sparse labelling of axons, and z-stacks showing how exit and entry points are organised in 3D, have been added to the revised manuscript.

      Reviewer #3 (Public Review): 

      This is a beautifully presented paper combining live imaging and analysis of mutant phenotypes to elucidate the role of laminin γ1-dependent basement membranes in the development of the zebrafish olfactory placode. The work is clearly illustrated and carefully quantified throughout. There are some very interesting observations based on the analysis of wild-type, laminin γ1, and foxd3 mutant embryos. The authors demonstrate the importance of a Laminin γ1-dependent basement membrane in olfactory placode morphogenesis, and in establishing and maintaining both boundaries and neuronal connections between the brain and the olfactory system. There are some very interesting observations, including the identification of different mechanisms for axons to cross basement membranes, either by taking advantage of incompletely formed membranes at early stages, or by actively perforating the membrane at later ones. 

      This is a valuable and important study but remains quite descriptive. In some cases, hypotheses for mechanisms are stated but are not tested further. For example, the authors propose that olfactory axons must actively disrupt a basement membrane to enter the brain and suggest alternative putative mechanisms for this, but these are not tested experimentally. In addition, the authors propose that the basement membrane of the olfactory placode acts to resist mechanical forces generated by the morphogenetic movement of the developing brain, and thus to prevent passive deformation of the placode, but this is not tested anywhere, for example by preventing or altering the brain movements in the laminin γ1 mutant. 

      We thank the reviewer for the overall positive assessment of our work and for suggesting interesting experiments to attempt in the future, and we carefully responded to all her/his constructive comments below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In general, it would be easier to draw conclusions and compare data if the authors used similar stages throughout the article. 

      Throughout the article we tried to focus on a series of stages that cover both the coalescence of the OP (up to 24 hpf) and later stages of olfactory system development spanning the brain flexure process (28, 32, 36 hpf). However, for technical reasons it was not always possible to stick to these precise stages in some of our experiments. Also, in Fig. 1E-J, we picked in the movies some images illustrating specific cell or axonal behaviours, and thus the corresponding stages could not match exactly the stage series used in Fig. 1A-D and elsewhere in the article. Nevertheless, this stage heterogeneity does not affect our main conclusions.

      It would be useful to schematise the olfactory placode and the brain in an insert to clearly visualise the system in each figure. 

      We hope that the schematic which was initially presented in Fig. 1K already helps the reader to understand how the system is organised. Although we have not added more schematic views to represent the system in each figure (we think this would make the figures overcrowded), we have added additional legends to point to the OP and the brain in the pictures in order to clarify the localisation of each tissue.

      In the Summary, the authors refer to the integrity of the basement membrane. I don't think there is any attempt to affect basement membrane integrity in the article. It would be important to do so to look at the effect on CNS-PNS separation and axonal elongation. 

      In the Summary, we use the term "integrity of the basement membrane" to indicate that we have analysed this integrity in the sly mutant. Given the results of our immunostainings against three main components of the basement membrane (Laminin, Collagen IV and Nidogen), as well as our EM observations, we see the sly mutant as a condition in which the integrity of the basement membrane is strongly affected.

      Rescue experiments by locally inducing Laminin expression would have strengthened the paper. 

      We have attempted to rescue the sly mutant phenotypes by introducing the mutation in the transgenic TgBAC(lamC1:lamC1-sfGFP) background, in which Laminin γ1 tagged with sfGFP is expressed under the control of its own regulatory sequences (Yamaguchi et al., 2022). To do so, we crossed sly+/-;Tg(omp:yfp) fish with sly+/-; Tg(lamC1:LamC1-sfGFP) fish. Surprisingly, while a rescue of the global embryo morphology was observed, no clear rescue of the olfactory system defects could be detected at 36 hpf. This could be due to the fact that the expression level of LamC1-sfGFP obtained with one copy of the transgene is not sufficient to rescue the olfactory system phenotypes, or that the sfGFP tag specifically affects the function of the Laminin γ1 chain during the development of the olfactory system, making it unable to rescue the defects. Given the results of our first attempts, we decided not to continue in this direction.

      (1) Developing OP & brain are surrounded by laminin-containing BM (already described by Torres-Paz & Whitlock in 2014). 

      "we first noticed the appearance of a continuous Laminin-rich BM surrounding the brain from 14-18 hpf, while around the OP, only discrete Laminin spots were detected at this stage (Fig. 1A, A'). " 

      Torres-Paz & Whitlock describe this around the 8-somite stage (8ss, before 14 hpf). Can you modify the text, or show an 8ss stage embryo? As far as I know, the authors do not show images at 14 hpf. Please correct this sentence or show a 14 hpf picture. 

      The reviewer is right, we do not show any 14 hpf stage in the images and thus have removed this stage in the text and replaced it by 17 hpf.

      In Figure 1A, the labelling of laminin 111 does not appear to be homogeneous along the brain.

      Is this true? 

      At this stage the brain’s BM revealed by the Laminin immunostaining appears fairly continuous (while the OP’s BM is clearly punctate and less defined), but indeed very tiny, local interruptions of the signal can be seen along the structure, as detected by the reviewer. We thus modified the text to mention these tiny interruptions.

      How is the Laminin antibody used by the authors specific to laminin 111?  

      We thank the reviewer for raising this important point. The immunogen used to produce this rabbit polyclonal antibody is the Laminin protein isolated from the basement membrane of a mouse Engelbreth-Holm-Swarm (EHS) sarcoma. It is thus likely to recognise several Laminin isoforms and not only Laminin 111. We thus replaced Laminin 111 by Laminin when mentioning this antibody in the text and Figures.

      Please schematise in Figure 1K the stages you have tested and shown here in the article i.e. stages 18 - 22 - 28 -36 hpf using immunohistochemistry and 17-26-27-29-33 and 38 hpf using transgenics for laminin 111 and LamC1 respectively.  

      As suggested by the reviewer, we changed the stages in the schematics for stages we have presented in Figure 1 (analysed either with immunostaining or in live imaging experiments). We chose to represent 17 - 22 - 26 - 33 hpf (and thus adapted some of the schematics for them to match these stages).  

      Please specify in the Figure 1 legend for panels A to D whether this is a 3D projection or a z-section.

      We indicated in the Figure 1 legend that all these images are single z-sections (as well as for panels E-J).

      Furthermore, the schematisation in Fig. 1K does not reflect what the authors show: at 22 hpf laminin 111 labelling appears to be present only near the brain, and no labelling lateral to the olfactory placode and anteriorly and posteriorly. Thus, the schematisation in Figure 1K needs to be modified to reflect what the authors show.

      We agree with the reviewer that the Laminin staining at this stage is observed around the medial region of the OP, but not more laterally. We modified the schematic view accordingly in Figure 1K. Anterior and posterior sides of the OP are not represented in this schematic because we chose to represent a frontal view rather than a dorsal view.

      The authors suggest that" the laminin-rich BM of OP assembles between 18 and 22 hpf, during the late phase of OP coalescence". However, their data indicate that this BM assembles around 28hpf (Figure 1C). Can they clarify this point?

      What we meant with this sentence is that we clearly see two distinct BMs from 22 hpf. However, as noticed by the reviewer, the OP’s BM is only present around the medial/basal regions of the OP and does not surround the whole OP tissue at this stage. We modified the text to clarify this point (in particular by mentioning that the OP’s BM starts to assemble between 18 and 22 hpf), and replaced the image shown in Figure 1B, B’ with a more representative picture (the previous z-section was taken in very dorsal regions of the OP).

      It would be useful to disrupt these cells that have a cytoplasmic expression of Laminin-sfGFP, to analyse their contribution to BM and OP coalescence.

      Indeed, it will be interesting in the future to test specifically the role of the cells expressing cytoplasmic Laminin-sfGFP around and within the OP, as proposed by the reviewer. Laser ablation of these cells could be attempted, but due to their very superficial localisation, close to the skin, we believe these ablations (with the protocol/set-up we currently use in the lab) would impair skin integrity, preventing us from drawing conclusions. We consider the optimisation of this experiment to be out of the scope of the present work.

      Tg(-2.0ompb:gapYFP)rw032 marks ciliated olfactory sensory neurons (OSNs) (Sato et al., 2005). The authors should mention this. 

      Please see our detailed response to the next point below.

      Points to be clarified: 

      -Tg(-2.0ompb:gapYFP)rw032 marks ciliated olfactory sensory neurons (OSNs) (Sato et al., 2005). The authors should mention this here. Moreover, the authors refer to "OP neurons" throughout the article. In the development of the olfactory organ, two types of neurons have been described in the literature: early EONs (12hpf-26hpf) and later OSNs. Each could have a specific role in the establishment and maintenance of the BM described by the authors. The authors need to clarify this point as, in Figure 1 for example, they use a marker for Tg(neurog1:GFP) EONs and a marker for ciliated OSNs without distinction. The distinction between EONs and OSNs comes a little late in the text and should be placed higher up. 

      As mentioned by the reviewer, according to the initial view of neurogenesis in the OP, OP neurons are born in two waves. A transient population of unipolar, dendrite-less pioneer neurons would differentiate first, in the ventro-medial region of the OP and elongate their axons dorsally out of the placode, along the brain wall. These pioneer axons would then be used as a scaffold by later born OSNs located in the dorso-lateral rosette to outgrow their axons towards the olfactory bulb (Whitlock and Westerfield, 1998). 

      Another study further characterised OP neurogenesis and showed that the first neurons to differentiate in the OP (the early olfactory neurons or EONs) express the Tg(neurog1:GFP) transgene (Madelaine et al., 2011). As mentioned by the authors in the discussion of this article, neurog1:GFP+ neurons appear much more numerous than the previously described pioneer neurons, and may thus include pioneers but also other neuronal subtypes.

      We would like to share here additional, unpublished observations from our lab that further suggest that the situation is more complex than the pioneer/OSN and EON/OSN nomenclatures. First, in many of our live imaging experiments, we can clearly visualise some neurog1:GFP+ unipolar neurons, initially located in a medial position in the OP, which intercalate and contribute to the dorsolateral rosette (where OSNs are proposed to be located) at the end of OP coalescence, from 22-24 hpf. Second, in fixed tissues, we observed that most neurog1:GFP+ neurons located in the rosette at 32 hpf co-express the Tg(omp:meRFP) transgene (Sato et al., 2005). These observations suggest that at least a subpopulation of neurog1:GFP+ neurons could incorporate into the dorsolateral rosette and become ciliated OSNs during development. We can share these results with the reviewer upon request. Further studies are thus needed to clarify and describe the neuronal subpopulations and lineage relationships in the OP, but this detailed investigation is out of the scope and focus of the present study. 

      An additional complication comes from the fact that, as shown and acknowledged by the authors in Miyasaka et al., 2005, the Tg(omp:meYFP) line (6 kb promoter) labels ciliated OSNs in the rosette but also some unipolar, ventral neurons (around 10 neurons at 1 dpf, Miyasaka et al. 2005, Figure 3A, white arrowheads). This was also observed using the 2 kb promoter Tg(omp:meYFP) line (see for instance Miyasaka et al., 2007) and in our study, we can indeed detect these ventro-medial neurons labelled in the Tg(omp:meYFP) line (2 kb promoter), see for instance Figure 1C’, D’ or Movie 6. It is unclear whether these unipolar omp:meYFP-positive cells are pioneer neurons or EONs expressing the omp:meYFP transgene, or OSN progenitors that would be located basally/ventrally in the OP at these stages.

      For all these reasons, we decided to present in the text the current view of neurogenesis in the OP but instead of attributing a definitive identity to the neurons we visualise with the transgenic lines, we prefer to mention them in the manuscript (and in the rest of the response to the reviewers) as neurons expressing neurog1:GFP or omp:meYFP transgenes (or cells/axons/neurons expressing RFP in the Tg(cldnb:Gal4; UAS:RFP) background).

      What we also changed in the text to be more clear on this point:

      - we moved higher up in the text, as suggested by reviewer 1, the description of the current model of neurogenesis in the OP,

      - we mentioned that neurog1:GFP+ neurons are more numerous than the initially described pioneer neurons, as discussed in Madelaine et al., 2011,

      - we wrote more clearly that the Tg(omp:meYFP) line labels ciliated OSNs but also a subset of unipolar, ventral neurons (Miyasaka et al., 2005), and pointed to these ventral neurons in Figure 1C’, D’,

      - in the initial presentation of the current view of OP neurogenesis we renamed neurog1:GFP+ into EONs to be coherent with Madelaine et al., 2011.

      - To visualise pioneer axons, the authors should use an EONS marker such as neurog1 because, to my knowledge, OMP only marks OSN axons and not pioneer axons.  

      To visualise neurog1:GFP+ axons during OP coalescence, we performed live imaging upon injection of the neurog1:GFP plasmid (Blader et al., 2003) in the Tg(cldnb:Gal4; UAS:RFP) background (n = 4 mutants and n = 4 controls from 2 independent experiments). We observed some GFP+ placodal neurons exhibiting retrograde axon extension in both controls and sly mutants. In such experiments it is very difficult to quantify and compare the number of neurons/axons showing specific behaviours between different experimental conditions/genetic backgrounds. Indeed, due to the cytoplasmic localisation of GFP, the axons can only be seen in neurons expressing high levels of GFP, and because of the injection the number of such neurons varies considerably between embryos, even within a given condition. Nevertheless, our qualitative observations reinforce the idea that the basement membrane is not absolutely required for mediolateral movements and retrograde axon extension of neurog1:GFP+ neurons in the OP. We added examples of images extracted from these new live imaging experiments in the revised Fig. S5A, B.

      - The authors should analyse the presence of laminin in the OP and forebrain in conjunction with neural crest cell dynamics (using a Sox10 transgenic line for example) to refine their entry and exit point hypotheses. 

      As described in the answer to the next point, we performed new experiments in which we visualised NCC migration in the Tg(neurog1:GFP) background, which allowed us to analyse the localisation of NCC at the forebrain/OP boundary, in ventral and dorsal positions, both in sly mutant embryos and control siblings.

      - A dynamic analysis of the distribution of neural crest cells in the sly mutant over time and during OP coalescence would be important. 

      The dynamics of zebrafish cranial NCC migration in the vicinity of the OP has been previously analysed using sox10 reporter lines (Harden et al., 2012, Torres-Paz and Whitlock, 2014, Bryan et al., 2020). To address the point raised by the reviewer, we performed live imaging from 16 to 32 hpf on sly mutants and control siblings carrying the Tg(neurog1:GFP) and Tg(UAS:RFP) transgenes and injected with a sox10(7.2):KalTA4 plasmid (Almeida et al., 2015). This allows the mosaic labelling of cells that express or have expressed sox10 during their development which, in the head region at these stages, represents mostly NCC and their derivatives. 3 independent experiments were carried out (n = 4 mutant embryos in which 8 placodes could be analysed; n = 6 control siblings in which 10 placodes could be analysed). A new movie (Movie 9) has been added to the revised article to show representative examples of control and mutant embryos.

      From these new data, we could make the following observations:

      - As expected from previous studies (Harden et al., 2012, Torres-Paz and Whitlock, 2014, Bryan et al., 2020), in control embryos a lot of NCC had already migrated to reach the vicinity of the OP when the movies begin at 16 hpf, and were then seen invading mainly the interface between the eye and the OP (10/10 placodes). Surprisingly, in sly mutants, a lot of motile NCC had also reached the OP region at 16 hpf in all the analysed placodes (8/8), and populated the eye/OP interface in 7/8 placodes (10/10 in controls). Counting NCC or tracking individual NCC during the whole duration of the movies was unfortunately too difficult to achieve in these movies, because of the low level of mosaicism (a high number of cells were labelled) and of the high speed of NCC movements (as compared with the 10 min delta t we chose for the movies). 

      - In some of the control placodes we could detect a few NCC that populated the forebrain/OP interface, either ventrally, close to the exit point of the axons (4/10 placodes), or more dorsally (8/10 placodes). By contrast, in sly mutants, NCC were observed in the dorsal region of the brain/OP boundary in only 2/8 placodes, and in the ventral brain/OP frontier in only 2/8 placodes as well. Interestingly, in these 2 last samples, NCC that had initially populated the ventral region of the brain/OP interface were then expelled from the boundary at later stages.

      We reported these observations in a new Table that is presented in revised Fig. S6B. In addition, instances of NCC migrating at the eye/OP or forebrain/OP interfaces are indicated with arrowheads on Movie 9. Previous Figure S6 was split into two parts presenting NCC defects in sly mutants (revised Figure S6) and in foxd3 mutants (revised Figure S7).

      Altogether, these new data suggest that the first postero-anterior phase of NCC migration towards the OP, as well as their migration in between the eye and OP tissues, is not fully perturbed in sly mutants. The subset of NCC that populate the OP/forebrain seem to be more specifically affected, as these NCC show defects in their migration to the interface or the maintenance of their position at the interface. Since the crestin marker labels mostly NCC at the OP/forebrain interface at 32 hpf (revised Fig. S6A), this could explain why the crestin ISH signal is almost lost in sly mutants at this stage.

      (2) Laminin distribution suggests a role in olfactory axon development 

      "Laminin 111 immunostaining revealed local disruptions in the membrane enveloping the OP and brain, precisely where YFP+ axons exit the OP (exit point) and enter the brain (entry point) (Fig. 1C-D')." Can the authors quantify this situation? It would be important to analyse this behaviour on the scale of a neuron and thus axonal migration to strengthen the hypotheses. 

      As suggested by the reviewer, to better visualise individual axons at the exit and entry points, we used mosaic red labelling of OP axons. To achieve this sparse labelling, we took advantage of the mosaic expression of a red fluorescent membrane protein observed in the Tg(cldnb:Gal4; UAS:lyn-TagRFP) background. The unpublished Tg(UAS:lyn-TagRFP) line was kindly provided by Marion Rosello and Shahad Albadri from the lab of Filippo Del Bene. We crossed the Tg(cldnb:Gal4; UAS:lyn-TagRFP) line with the TgBAC(lamC1:lamC1-sfGFP) reporter and performed live imaging on 2 embryos/4 placodes, in a frontal view. A new movie (Movie 3 in the revised article) shows examples of exit and entry point formation in this context. This allowed us to visualise the formation of the exit and entry points in more samples (6 embryos and 12 placodes in total when we pool the two strategies for labelling OP axons) and, by visualising only a small number of axons, to reinforce our initial conclusions. 

      (3) The integrity of BMs around the brain and the OP is affected in the sly mutant 

      Why do the authors analyse the distribution of collagen IV and Nidogen and not proteoglycans and heparan sulphate? 

      We attempted to label more ECM components such as proteoglycans and heparan sulfate, but whole-mount immunostainings did not work in our hands.

      A dynamic analysis of the distribution of neural crest cells in the sly mutant over time and during OP coalescence would be important. 

      See our detailed response to this point above.  

      (4) Role of Laminin γ1-dependent BMs in OP coalescence 

      The authors use the size of the Tg(neurog1:GFP)+ OP cell cluster at 22 hpf as a marker.  The authors should count the number of cells in the OP at the indicated time using a nuclear dye to check that in the sly mutant the number of cells is the same over time. Two time points as analysed in Figure S2 may not be sufficient to quantify proliferation which at these stages should be almost zero according to Whitlock & Westerfield and Madelaine et al.

      Counting the neurog1:GFP+ cell numbers in our existing data was unfortunately impossible, due to the poor quality of the DAPI staining. We are nevertheless confident that the number of cells within neurog1:GFP+ clusters is fairly similar between controls and sly mutants at 22 hpf, since the OP dimensions are the same for AP and DV dimensions, and only slightly different for the ML dimension. In addition, we analysed proliferation and apoptosis within the neurog1:GFP+ cluster at 16 and 21 hpf and observed no difference between controls and mutants.

      (5) Role of Laminin γ1-dependent BMs during the forebrain flexure 

      In Figure 4F at 32hpf, the presence of 77% ectopic OMP+ cells medially should result in an increase in dimensions along the M-L? This is not the case in the article. The authors should clarify this point. 

      As we explained in the Materials and Methods, ectopic fluorescent cells (cells that are physically separated from the main cluster) were not taken into account for the measurement of the OP dimensions. This is now also mentioned in the legends of the Figures (4 and S3) showing the quantifications of OP dimensions.

      Cell distribution also seems to be affected within the OMP+ cluster at 36 hpf, with fewer cells laterally and more medially. The authors should analyse the distribution of OMP+ cells in the clusters in sly mutants and controls to understand whether the modification corresponds to the absence of BM function. 

      On the pictures shown in Figure 4F,G, we agree that omp:meYFP+ cells appear to be more medially distributed in the mutant, however this is not the case in other sections or samples, and is rather specific to the z-section chosen for the Figure. We found that the ML dimension is unchanged in mutants as compared with controls, except for the 28 hpf stage where it is smaller, but this appears to be a transient phenomenon, since no change is detected at earlier or later stages (Figure 4A-D and Figure S3A-L). The difference we observe at 28 hpf is now mentioned in the revised manuscript.

      The conclusions of Figures 4 and S3 would rather be that laminin allows OMP+ cells to be oriented along the medio-lateral axis whereas it would control their position along the dorsoventral axis. The authors should modify the text. It would be useful to map the distribution of OMP+ cells along the dorsoventral and mediolateral axes. The same applies to Neurog1+ cells. An analysis of skin cell movements, for example, would be useful to determine whether the effects are specific.  

      We are confident that the measurements of OP dimensions in AP, DV and ML are sufficient to describe the OP shape defects observed in the sly mutants. Analysing cell distribution along the 3 axes as well as skin cell movements will be interesting to perform in the future but we consider these quantifications as being out of the scope of the present work.

      (6) Laminin γ1-dependent BMs are required to define a robust boundary between the OP and the brain 

      The authors must weigh this conclusion "Laminin γ1-dependent BMs serve to establish a straight boundary between the brain and OP, preventing local mixing and late convergence of the two OPs towards each other during flexion movement." Indeed, they don't really show any local mixing between the brain and OP cells. They would need to quantify in their images (Figure 5A-A' and Figure S4 A-A') the percentage of cells co-labelled by HuC and Tg(cldnb:GFP). 

      We agree with the reviewer and thus replaced « reveal » by « suggest » in the conclusion of this section. 

      (7) Role of Laminin γ1-dependent BMs in olfactory axon development 

      An analysis of the retrograde extension movement in the axons of OMP+ ectopic neurons in the sly1 mutant condition would be useful to validate that the loss of laminin function does not play a role in this event. 

      Indeed, even though we can visualise instances of retrograde extension occurring normally in sly mutants, we cannot rule out that this process is affected in a subset of OP neurons, for instance in ectopic cells, which often show no axon or a misoriented axon. We added a sentence to mention this in the revised manuscript.

      Minor comments and typos: 

      Please check and mention the D-V/L-M or A-P/L-M orientation of the images in all figures. 

      This has been checked.

      Legend Figure 1: "distalmost" is missing a space "distal most". 

      We checked and this word can be written without a space.

      Figure 1 panel C: check the orientation (I am not sure that Dorsal is up). 

      We double-checked and confirm that dorsal is up in this panel.

      Movie 1 Legend: "aroung "the OP should be around the OP. 

      Thanks to the reviewer for noticing the typo, we corrected it.

      Reviewer #2 (Recommendations For The Authors):

      The comments below are relatively minor and mostly raise questions regarding images and their presentation in the manuscript. 

      • Figure 1, visualization of exit and entry points: It is a bit difficult to visualize the axon exit and entry points in these images, and in particular, to understand how the exit and entry points in C and D correspond to what is seen in F, F', H, and H'. There appears to be one resolvable break in the staining in C and D, whereas there are two distinct breaks in F-H'. Are these single optical sections? Is it possible to visualize these via 3-dimensional rendering? 

      All the images presented in Figure 1 are single z-sections, which is now indicated in the Figure legend. As noticed by the reviewer, Laminin immunostainings on fixed embryos at 28 and 36 hpf suggested that the exit and entry points are facing each other, as shown in Figure 1C-D’. However, in our live imaging experiments we always observed that the exit point is slightly more ventral than the entry point (by about 10 to 20 µm). This discrepancy could be due to the fixation that precedes the immunostaining procedure, which could slightly modify the size and shape of cells/tissues. We added a sentence on this point in the text. In addition, we added new movies of the LamC1-sfGFP reporter with sparse red axonal labelling (Movie 3, see response to reviewer 1), as well as z-stacks presenting the organisation of exit and entry points in 3D (Movie 4), which should help to better illustrate the mechanisms of exit and entry point formation.

      • Movie 2, p. 6, "small interruptions of the BM were already present near the axon tips, along the ventro-medial wall of the OP." This is a bit difficult to assess since the movie seems to show at least one other small interruption in the BM in addition to the exit point, in particular, one slightly dorsal to the exit point. Was this seen in other samples, or in different optical sections? 

      Indeed the exit and entry points often appear as regions with several, small BM interruptions, rather than single holes in the BM. We now show in revised Movie 4 the two z-stacks (the merge and the single channel for green fluorescence) corresponding to the last time points of the movies showing exit and entry point formation in Movie 2, where several BM interruptions can be seen for both the exit and entry points. We had already mentioned this observation in the legend of Movie 2, and we added a sentence on this point in the main text of the revised manuscript. This is also represented for both exit and entry points in the new schematics in revised Fig. 1K and its legend. 

      • Movie 2, p. 6, "The opening of the entry point through the brain BM was concomitant with the arrival of the RFP+ axons, suggesting that the axons degrade or displace BM components to enter the brain." Similar to the questions regarding the exit point, it was a bit difficult to evaluate this statement. There appears to be a broader region of BM discontinuity more dorsal to the arrowhead in Movie 2. A single-channel movie of just the laminin fluorescence might help to convey the extent of the discontinuity. As with above, was this seen in other samples, or in different optical sections?  

      See our response to the previous comment.

      • Figure 1H, I, "the distal tip of the RFP+ axons migrated in close proximity with the brain's BM." This is again a bit difficult to see, and quite different than what is seen in Figure 4A, in which the axons do not seem close to the BM in this section. Is it possible to visualize this via 3-dimensional rendering? 

      In fixed embryos or in live imaging experiments, we observed that, once entered in the brain, the distal tips (the growth cones) of the axons are located close to the BM of the brain. However, this is not the case of the axon shafts which, as development proceeds, are located further away from the BM. This can clearly be seen at 36 hpf in Figure 1D’ and Figure 4A, as spotted by the reviewer. We modified the text to clarify this point.

      • Figure 2J, J', p. 7, the gap between the OP and brain cells of sly mutants "was most often devoid of electron-dense material." It is difficult to see this loss of electron-dense material in 2J'. The thickness of the space is quantified well and is clearly smaller, but the change in electron-dense material is more difficult to see.  

      We looked at Figure 2 again and it seems clear to us that there is electron-dense material between the plasma membranes in controls, which is practically not seen (rare spots) in the mutants. We added a sentence mentioning that we rarely see electron-dense spots in sly mutants.

      • Figure 5E-F': There are concerns about evaluating the shape of a tissue based on nuclear position. Is there a way to co-stain for cell boundaries (maybe actin?), and then quantify distortion of the dlx+ cell population using the cell boundaries, rather than nuclear staining? 

      We agree with the reviewer that it is not ideal to evaluate the shape of the OP/brain boundary based on a nuclear staining. As explained in the text, we could not use the Tg(eltC:GFP) or Tg(cldnb:Gal4; UAS:RFP) reporter lines for this analysis, due to ectopic or mosaic expression. However we are confident that the segmentation of the Dlx3b immunostaining reflects the organisation of the cells at the OP/brain tissue boundary: in other data sets in which we performed Dlx3b staining with membrane labelling independently of the present study and in the wild type context, we clearly see that cell membranes are juxtaposed to the Dlx3b nuclear staining (in other words, the cytoplasm volume of OP cells is very small). 

      • Figure S5E: It would be helpful to see representative images for each of the categories (Proper axon bundle; Ventral projections; Medial projections) or a schematic to understand how the phenotypes were assessed. 

      To address this point we added a schematic view to illustrate the phenotypes assessed in each column of the table in revised Figure S5G.

      • Figure 6, p. 12, "Laminin gamma 1-dependent BMs are essential for growth and navigation of the axons...": What fraction of the tracked axons managed to exit the OP? Given the quantitative analyses in Figure 6, one might interpret this to mean that laminin gamma 1 is not essential for axon growth (speed and persistence are largely unchanged), but rather, primarily for navigation. 

      As noticed by the reviewer, the speed and persistence of axonal growth cones are largely unchanged in the sly mutants (except for the reduced persistence in the 200-400 min window, and an increased speed in the 800-1000 min window), showing that the growth cones are still motile. However, as shown by the tracks, they tend to wander around within the OP, close to the cell bodies, which results in the end in a perturbed growth of the axons. The navigation issues are rather revealed by the analysis of fixed Tg(omp:meYFP) embryos presented in the table of Figure S5G. We modified the text to separate more clearly the conclusions of the two types of experiments (fixed, transgenic embryos versus live, mosaically labelled embryos).

      Reviewer #3 (Recommendations For The Authors):

      Testing the hypotheses mentioned in the public review will be interesting experiments for a follow-up study, but are not essential revisions for this manuscript. 

      I have only a few minor suggestions for revisions: 

      P8 subheading 'Role of Laminin γ1-dependent BMs in OP coalescence' - since no major role was demonstrated here, this heading should be reworded.  

      We agree with the reviewer and replaced the previous title by « OP coalescence still occurs in the sly mutant ».

      P11, line 3 - the authors conclude that the forebrain is smaller 'due to' the inward convergence of the OPs. I do not think it is possible to assign causation to this when the mutant disrupts Laminin γ1 systemically - it is equally possible that the OPs move inward due to a failure of the brain to form in the normal shape. Thus, the wording should be changed here. (In the Discussion on p15, the authors mention the 'apparent distortion' of the brain, and say that it is 'possibly due' to the inward migration of the placodes', but again this could be toned down.) 

      We agree with the reviewer’s comment and changed the wording of our conclusions in the Results section.

      P11 and Fig. S5 - The table and text seem to be saying opposite things here. The text on p11 (3rd paragraph) indicates that the normal exit point is ventral and that this is disrupted in the mutant, with axons exiting dorsally. However, in the table, at each time point there is a higher % of axons exiting ventrally in the mutant. Please clarify. The table does not provide a % value for axons exiting dorsally - it might help to add a column to show this value. 

      We are grateful to the reviewer for pointing this out, and we apologize for the lack of clarity in the first version of the manuscript. We have modified the text and Figure S5 in order to clarify the different points raised by the reviewer in this comment. The Table in Fig. S5G does not represent the % of axons showing defects, but the % of embryos showing the phenotypes. In addition, an embryo is counted in the ventral or medial projection category if it shows at least one ventral or medial projection (even if it also shows a proper bundle). This is now clearly indicated in the column titles of the table itself and in the legend. The embryos in which the axons exit dorsally in sly mutants are actually those counted in the left column of the Table (they exit dorsally and form a bundle), as shown by the new schematics added below the table. We also added this information in the title of the left column, and mention in the legend the pictures in which this dorsal exit can be observed in the article (Figures 4B and S3E’). Having more sly mutant embryos with axons exiting dorsally is thus compatible with more embryos showing at least one ventral projection.

      Fig. S6, shows the lack of neural crest cells between the olfactory placode and the brain in both laminin γ1 mutants (without a basement membrane) and foxd3 mutants (which retain the membrane). Comparison of the two mutants here is a neat experiment and the result is striking, demonstrating that it is the basement membrane, and not the neural crest, that is required for correct morphology of the olfactory placode. I think this figure should be presented as a main figure, rather than supplementary.  

      Our new live imaging characterisation of NCC migration in sly mutants and control siblings (Movie 9) revealed that at 32 hpf, in the vicinity of the OP, NCC (or their derivatives) are much more numerous than the subset of NCC showing crestin expression by in situ hybridisation (compare, for instance, the end of our control movie at 32 hpf with the crestin ISH shown in Figure S6A). 

      Thus, the extent of the NCC migration defects should be analysed in more detail in the foxd3 mutant in the future (using live imaging or other NCC markers), and for this reason we chose to keep this dataset in the supplementary Figures.

      One of the first topics covered in the Discussion section is the potential role of Collagen. I was surprised to see the description on P15 'the dramatic disorganization of the Collagen IV pattern observed by immunofluorescence in the sly mutant', as I hadn't picked this up from the Results section of the paper. I went back to the relevant figure (Fig. 2) and description on p7, which does not give the same impression: 'in sly mutants, Collagen IV immunoreactivity was not totally abolished'. This suggested to me that there was only minor (not dramatic) disorganisation of the Collagen IV. This needs clarification.  

      The linear, BM-like Collagen IV staining was lost in sly mutants, but not the fibrous staining which remained in the form of discrete patches surrounding the OP. We modified the text in the Results section as well as in the Figure 2 legend to clarify our observations made on embryos immunostained for Collagen IV.

      Typos etc 

      P5 - '(ii) above of the neuronal rosette' - delete the word 'of'. 

      P5 two lines below this - ensheathed. 

      P10 - '3 distinct AP levels' (delete s from distincts). 

      P10 - distortion (not distorsion) . 

      P12 - 'From 14 hpf, they' should read 'From 14 hpf, neural crest cells'. 

      P15, line 1 - 'is a consequence of' rather than 'is consecutive of'? 

      P22 'When the data were not normal,' should read 'When the data were not normally distributed,'. 

      We thank the reviewer for noticing these typos and have corrected them.

      General 

      Please number lines in future manuscripts for ease of reference. 

      This has been done.

    1. Author response:

      Thank you for the positive and constructive feedback on our manuscript. We appreciate you highlighting the importance of our work advancing our understanding of the molecular etiology of acquired immunodeficiency syndrome (AIDS). To extend and further substantiate the observation that the CARD8 inflammasome is activated in response to viral protease during HIV-1 cell-to-cell transmission, we are in the process of completing additional experiments that are responsive to reviewer feedback, including:

      • Primary CD4+ T cell to monocyte-derived macrophage (MDM) transmission:  We have now repeated the cell-to-cell experiments with HIV-1 transfer from primary CD4+ T cells to primary monocyte-derived macrophages, and our findings are consistent with CARD8-dependent IL-1β release from HIV-1-infected macrophages in this more physiologic context. We are in the process of repeating these experiments with additional donors and will add these results to the revised manuscript.

      • Heterogeneity amongst blood donors: We have now repeated the cell-to-cell transfer and CARD8 knockout in MDMs with additional donors. While we continue to observe heterogeneity amongst donors, the key observation that CARD8 is required for inflammasome responses to HIV-1 infection remains consistent. We note that some donors, including the one individual reported in the first submission, have markedly diminished CARD8 activity (to both HIV-1 and VbP).

      • Time course experiments: We did conduct a time course experiment when initially establishing these assays. We have now repeated these experiments with additional timepoints and in the presence or absence of the RT inhibitor nevirapine. The results of these experiments will be included in the revised manuscript.

      • The role of Gasdermin D: We are mostly interested in the release of IL-1β from the infected macrophages due to its potential contribution to myeloid-driven inflammation in PLWH. To date, there is no evidence that any pore-forming protein other than GSDMD can initiate IL-1β release (and pyroptosis) downstream of CARD8. Nonetheless, we will attempt this experiment with the Gasdermin D inhibitor, disulfiram. 

      We believe these and other experiments will further support the importance of the CARD8 inflammasome in myeloid-driven inflammation in PLWH and look forward to submitting the revision.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors test whether the archerfish can modulate the fast response to a falling target.

      We have not tested whether archerfish can 'modulate the fast response'. We quantitatively test specific hypotheses on the rules used by the fish. For this, the accuracy of the decisions is analyzed with respect to specific points that can be calculated precisely in each experiment. The ill-defined term 'modulate' in no way captures what is done here. This assessment might explain the question, raised by the reviewer, of 'what is the difference of this study and Reinel, 2016' (i.e. Reinel and Schuster, 2016). In that study, all objects were strictly falling ballistically, and latency and accuracy of the turn decisions were determined when the initial motion was not only horizontal but had an additional vertical component of speed. The question of that study was whether the need to account for an additional variable (vertical speed) in the decision would affect its latency or accuracy. The study showed that also then archerfish rapidly turn to the later impact point. It also showed that accuracy and latency (defined in this study exactly as in the present study) were not changed by the added degree of freedom. This is a completely different question and by its very nature does not leave the realm of ballistics.

      By manipulating the trajectory of the target, they claim that the fish can modulate the fast response.

      While it is clear from the result that the fish can modulate the fast response, the experimental support for the argument that the fish can do it for a reflex-like behavior is inadequate. 

      This is disturbing: The manuscript is full of data that directly report response latency (a parameter that's critical in all experiments) and there are even graphical displays of the distribution of latency (Figs. 2, 5). How fast the responses are can also already be seen in the first video. Most importantly, the nature of the 40 ms limit was discovered and reported by our group in 2008 (Schlegel and Schuster, 2008, Fig. 4). For easy reference, we attach Schlegel and Schuster, 2008 with the relevant passages marked in yellow. But later studies also using high-speed video (i.e. typically 500 fps) and simultaneously evaluating accuracy and kinematics (in the same ways as used here!) to address various questions repeatedly report and even graphically represent minimum latencies of 40 ms, e.g. Krupczynski and Schuster, 2013 (e.g. Fig. 2); Reinel and Schuster, 2014; Reinel and Schuster, 2016; Reinel and Schuster, 2018a, b (e.g. see Fig. 7 in the first part), and report how latency is increased as urgency is decreased (if the fish are too close or time of falling is increased), as temperature is decreased, or as viewing conditions and their homogeneity across the tank change. Moreover, even a field study is available (Rischawy, Blum and Schuster, 2015) that shows why the speed is needed. This is because of massive competition, with at least some of the competitor fish also being able to turn to the later impact point. So, speed is an absolute necessity if competitors are around. Interestingly, when the fish are isolated, latency goes up and eventually the fish will no longer respond with C-starts (Schlegel and Schuster, 2008).

      Another aspect: considering the introduction, it would not even have mattered if the time needed for an accurate start were 150 ms instead of 40 ms (which is not the case). That would still be faster than an Olympic sprinter responds to a gun shot. Moreover, please also note that we carefully talk of reflex-speed, not of a reflex-behavior (which is as easy to verify as any other of the false statements made).

      Strengths: 

      Overall, the question that the authors raised in the manuscript is interesting. 

      Given the statement of no difference between the present study and Reinel and Schuster, 2016, it is not clear what this assessment refers to.

      Weaknesses: 

      (1) The argument that the fish can modulate reflex-like behavior relies on the claim that the archerfish makes the decision in 40 ms. There is little support for the 40 ms reaction time.

      The 'little support' is a paper in Science in which this important aspect is directly analyzed (Fig. 4 of that paper) and that has been praised by folks like Yadin Dudai (e.g. in Faculty 1000). Further support comes from the latency data presented in the present paper. Furthermore, additional publications are available on the reaction time (see above).

      The reaction time for the same behavior in Schlegel 2008, is 60-70 ms, and in Tsvilling 2012 about 75 ms, if we take the half height of the maximum as the estimated reaction time in both cases. If we take the peak (or average) of the distribution as an estimation of reaction time, the reaction time is even longer. This number is critical for the analysis the authors perform since if the reaction time is longer, maybe this is not a reflex as claimed.

      See above.

      In addition, mentioning the 40 ms in the abstract is overselling the result.

      See above.

      Just for completeness: Considering a very interesting point raised by reviewer 2 we add an additional panel to further emphasize the exciting point that accuracy and latency are unrelated in the start decisions. That point was already made in Fig.4 of the paper in Science but can be directly addressed.  

      The title is also not supported by the results. 

      No: the title is clearly supported by the results that are reported in the paper.

      (2) A critical technical issue of the stimulus delivery is not clear.

      The stimulus delivery is described in detail. Most importantly we emphasize (even mentioning frame rate) that all VR setups require experimental confirmation that they work for the species and for the behavior at hand. Ideally, they should elicit the same behavior (in all aspects) as a real stimulus does that the VR approach intends to mimic. Whether VR works in a given animal and for the behavior at hand in that animal cannot be known or postulated a priori. It must be shown in direct critical experiments. Such experiments and the need to perform them are described in detail in Figure 2 and in the text that is associated with that figure.

      The frame rate is 120 FPS and the target horizontal speed can be up to 1.775 m/s. This produces a target jumping on the screen 15 mm in each frame. This is not a continuous motion. Thus, the similarity between the natural system where the target experiences ballistic trajectory and the experiment here is not clear. Ideally, another type of stimulus delivery system is needed for a project of this kind that requires fast-moving targets (e.g. Reiser, J. Neurosci.Meth. 2008).
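      The reviewer's figure of roughly 15 mm per frame is the maximum horizontal target speed divided by the display frame rate. The worked arithmetic below uses only the two values quoted above and assumes, as the reviewer does, that the quoted speed is the speed of the target image on the screen:

      $$\Delta x = \frac{v}{f} = \frac{1.775\ \mathrm{m\,s^{-1}}}{120\ \mathrm{s^{-1}}} \approx 0.0148\ \mathrm{m} \approx 14.8\ \mathrm{mm}$$

      This is a displacement per screen refresh. How it translates into the angular step seen by the fish depends on the viewing distance, which is not stated here.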

      See above. It is quite funny that one of the authors of the present study had been involved in developing a setup with a complete panorama of 6000 LEDs (Strauss, Schuster and Götz, 1997; and appropriately cited in Reiser) that has been the basis for Reiser. This panorama was also used to successfully implement VR in freely walking Drosophila (Schuster et al., Curr. Biol., 2002). However, an LED based approach was abandoned because of insufficient spatial resolution (that, in archerfish, is very different from that of Drosophila).

      But the crucial point really is this: Just looking at Figure 2 shows that our approach could not have worked better in any way - it provided the input needed to cause turn decisions that are in all aspects just as those with real objects. Achieving this was not at all trivial and required enormous effort and many failed attempts. But it allows addressing our questions for the first time after 20 years of studying these interesting decisions.

      In addition, the screen is rectangular and not circular, so in some directions, the target vanishes earlier than others. It must produce a bias in the fish response but there is no analysis of this type. 

      Why 'must' it produce a bias? Is it not conceivable that you can only use a circular part of the screen? Briefly, and as could have been checked by quickly looking into the methods section, this is what we did. But still, why would it have mattered in our strictly randomized design? It could have mattered only in a completely silly way of running the experiments in which exclusively long trajectories are shown in one condition and exclusively short ones in another.

      (3) The results here rely on the ability to measure the error of response in the case of a virtual experiment. It is not clear how this is done since the virtual target does not fall.

      Well, of course it does not fall!!! That is the whole point that enables the study, and this is explained in connection with the glass plate experiment of Fig. 1 and quite some text is devoted to say that this is the starting point for the present analysis. The ballistic impact point is calculated (just as explained in our very first paper on the start decisions, Rossel, Corlija and Schuster, 2002) from the initial speed and height of the target, using simple high-school physics and the justification for that is also in that paper. This has been done already more than 20 years ago. How else could that paper have arrived at the conclusion that the fish turned to the virtual impact point even though nothing is falling? We also describe this for the readers of the present study, illustrate how accuracy is determined in Figures, in all videos and in an additional Supplementary Figure. Consulting the paper reveals that orientation of the fish is determined immediately at the end of stage 2 of its C-start and the error directly reports how close continuing in that direction would lead the fish to the (real or virtual) impact point. This measure has also been used since the first paper in 2002 in our lab and it is very useful because it provides an invariant measure that allows pooling all the different conditions (orientation and position of responding fish as well as direction, speed and height of target).
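      The two quantities described above can be illustrated with a short Python sketch. The first function computes a ballistic impact point from the target's initial position, speed and height, ignoring air drag. The second computes an aiming error, taken here as the closest distance between the impact point and the line along which the fish is oriented at the end of C-start stage 2. The function names, the coordinate convention (water-surface plane, metres, radians) and the exact form of the error are assumptions made for illustration. This is not the authors' analysis code.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant; air drag ignored)


def ballistic_impact_point(x0, y0, vx, vy, height, vz=0.0):
    """Hypothetical helper: where a target starting `height` metres above the
    water surface at horizontal position (x0, y0), with horizontal velocity
    (vx, vy) and optional upward vertical velocity vz, hits the surface."""
    # Solve height + vz*t - G*t^2/2 = 0 for the positive root t (time of fall).
    t_fall = (vz + math.sqrt(vz * vz + 2.0 * G * height)) / G
    return x0 + vx * t_fall, y0 + vy * t_fall


def aiming_error(fish_x, fish_y, heading_rad, impact_x, impact_y):
    """Hypothetical error measure: how close the fish would get to the impact
    point if it kept moving along `heading_rad` (its orientation at the end
    of C-start stage 2)."""
    ux, uy = math.cos(heading_rad), math.sin(heading_rad)  # unit heading vector
    dx, dy = impact_x - fish_x, impact_y - fish_y          # fish -> impact point
    along = ux * dx + uy * dy                              # projection onto heading
    if along <= 0.0:
        # Impact point lies behind the fish: the closest point of the ray is the fish itself.
        return math.hypot(dx, dy)
    # Otherwise the closest approach is the perpendicular distance to the heading line.
    return abs(ux * dy - uy * dx)


if __name__ == "__main__":
    # Target 0.1962 m above the water, moving at 1 m/s along x: falls for about 0.2 s.
    impact = ballistic_impact_point(0.0, 0.0, 1.0, 0.0, 0.1962)
    # Fish 1 m away along -y, oriented straight along +y.
    print(impact, aiming_error(0.0, -1.0, math.pi / 2, *impact))
```

      Using a distance rather than an angle keeps the measure in the same units for every fish position, target direction, speed and height. This is in the spirit of the invariant, poolable measure described above, but the precise definition here is an assumption.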

      How do the authors validate that the fish indeed perceives the virtual target as the falling target?

      See above. The fish produce C-starts (whose kinematics are analyzed and reported in the figures), whose latency is measured (from onset of target motion to onset of the C-start), and whose accuracy in aligning to the calculated virtual impact point is measured (see above). Additionally, the errors are analyzed relative to other points of interest, for instance landmarks, the ballistic landing point in the re-trained fish, or points calculated on the basis of specific hypotheses in the generalization experiments.

      Since the deflection is at a later stage of the virtual trajectory, it is not clear what is the actual physics that governs the world of the experiment.

      As explained in the text, what we need is to replace the ballistic connection with another fixed relation between initial target motion and the landing point. This other relation needs to produce a large error in the aims if they remain based on the ballistic virtual landing point. The key experiments directly show that after training the fish need not see the deflection but can respond appropriately to the initial motion (Figs. 3 and 5, the corresponding paragraphs in the text, and additional movies). Please also note that after training the decision is based on the initial movement. This is shown in the interspersed experiments in which nothing but the initial (pre-deflection) movement was shown.

      Overall, the experimental setup is not well designed. 

      It is obviously designed well enough to mimic the natural situation in every aspect needed (see Fig. 2) and well enough to answer the questions we have asked.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript studies prey capture by archer fish, which observe the initial values of motion of aerial prey they made fall by spitting on them, and then rapidly turn to reach the ballistic landing point on the water surface. The question raised by the article is whether this incredibly fast decision-making process is hardwired and thus unmodifiable, or whether it can be adjusted by experience to follow a new rule, namely that the landing point is deflected by a certain amount from the expected ballistic landing point. The results show that the fish learn the new rule and use it afterward in a variety of novel situations that include height, side, and speed of the prey, while preserving the speed of the fish's decision. Moreover, a remarkable finding presented in this work is that fish that have learned to use the new rule can relearn to use the ballistic landing point for an object based on its shape (a triangle) while simultaneously keeping the 'deflected rule' for an object differing in shape (a disc). In other words, fish can master two decision-making rules simultaneously, based on the different shapes of objects. 

      Strengths: 

      The manuscript relies on a sophisticated and clever experimental design that allows changing the apparent landing point of a virtual prey using a virtual reality system. Several robust controls are provided to demonstrate the reliability and usefulness of the experimental setup. 

      Overall, I very much like the idea conveyed by the authors that even stimuli triggering apparently hardwired responses can be relearned in order to be associated with a different response, thus showing the impressive flexibility of circuits that are sometimes considered mediating pure reflexive responses.

      Thank you so much for this precise assessment of what we have shown!

      This is the case - as an additional example - of the main component of the Nasanov pheromone of bees (geraniol), which triggers immediate reflexive attraction and appetitive responses, and which can, nevertheless, be learned by bees in association with an electric shock so that bees end up exhibiting avoidance and the aversive response of sting extension to this odorant (1), which is a fully unnatural situation, and which shows that associative aversive learning is strong enough to override preprogrammed responding, thus reflecting an impressive behavioral flexibility. 

      That's very interesting, thanks.

      Weaknesses: 

      As a general remark, there is some information that I missed and that is mandatory in the analysis of behavioral changes. 

      Firstly, the variability in the performances displayed. The authors mentioned that the results reported come from 6 fish (which is a low sample size). How were the individual performances in terms of consistency? Were all fish equally good in adjusting/learning the new rule? How did errors vary according to individual identity? It seems to me that this kind of information should be available as the authors reported that individual fish could be recognized and tracked (see lines 620-635) and is essential for appreciating the flexibility of the system under study. 

      Secondly, the speed of the learning process is not properly explained. Admittedly, fish learn in an impressive way the new rule and even two rules simultaneously; yet, how long did they need to achieve this? In the article, Figure 2 mentions that at least 6 training stages (each defined as a block of 60 evaluated turn decisions, which actually shows that the standard term 'Training Block' would be more appropriate) were required for the fish to learn the 'deflected rule'. While this means 360 trials (turning starts), I was left with the question of how long this process lasted. How many hours, days, and weeks were needed for the fish to learn? And as mentioned above, were all fish equally fast in learning? I would appreciate explaining this very important point because learning dynamics is relevant to understanding the flexibility of the system. 

      First, it is very important to keep in mind the question that we wanted to clarify: Does the system have the potential to re-tune the decisions to other, non-ballistic relations between the input variables and the output? This would have been established if a single fish had been found capable of doing that. However, we do have sufficient evidence to say that all six fish learned the new law and that at least one individual (actually four) was capable of simultaneously handling the two laws. We will (hopefully) explain this much better in our revised version. We also have to stress that not all archerfish might actually be able to do this and that not all archerfish might learn in the same way, at the same speed, or using the same strategies. These questions are extremely interesting, and we will therefore definitely include all the evidence that we have. If some individuals are better than others at quickly adjusting, then even observational learning could become a part of the story. However, we needed to make and document the first steps. Understanding these is essential and apparently difficult enough.

      Reference: 

      (1) Roussel, E., Padie, S. & Giurfa, M. Aversive learning overcomes appetitive innate responding in honeybees. Anim Cogn 15, 135-141, doi:10.1007/s10071-011-0426-1 (2012). 

      Thanks for this reference!

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful criticisms.

      (1.1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      The Methods currently explain that groups are compared by testing differences in the distributions of residuals of the treatment and control groups around the Deming regression of the control groups: “To test if treatments altered the relationship between initial performance vs learning or daily vs overnight learning, we compared the distribution of signed distance to the control Deming regression line between groups.” This shall indeed be explained in more detail.

      The performance on a given day depends on a cumulative process, so the average measure of performance is not fully informative about what is learned or what is changed by a treatment (this is further explained in the text, p9-10). The challenge is to deal with the multivariate relationships in which initial performance, daily learning, and consolidated learning are interdependent. While these quantities show linear relationships in control groups, this is far less the case in treatment groups. This may be due to the variability of the effect of the treatment (efficacy of viral injections), which adds to the intrinsic variability present in the absence of treatment.

      To test whether these relationships shift following treatment, we examine to what extent treatment points in bivariate comparisons (initial performance x daily learning, daily learning x consolidated learning) are evenly distributed around the control-group regression line. We take a significant difference in the distribution of residuals between the control and treatment groups as an indication that the process captured by that comparison is disrupted by the treatment. For example, if the residuals of the treatment group are lower than those of the control group in the initial performance x daily learning comparison, learning is slower (i.e. smaller for a given initial performance). If the residuals of the treatment group are lower than those of the control group in the daily learning x consolidated learning comparison, consolidation is lower. This shall be clarified in a revised version.
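
      The residual-based comparison can be sketched as follows, assuming an error-variance ratio of 1 (orthogonal Deming regression) and using a two-sided permutation test on mean signed distances. The response does not say which test the authors used, so the permutation test is a swapped-in illustration. All function names are hypothetical.

```python
import math
import random
import statistics

def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x. `delta` is the assumed ratio of y-error
    variance to x-error variance (delta=1 gives orthogonal regression).
    Assumes the sample covariance of x and y is non-zero."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    return my - slope * mx, slope  # (intercept, slope)

def signed_distances(x, y, intercept, slope):
    """Signed perpendicular distance of each point to the line (positive above)."""
    norm = math.sqrt(1 + slope ** 2)
    return [(yi - intercept - slope * xi) / norm for xi, yi in zip(x, y)]

def permutation_pvalue(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of mean signed distances."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)
```

      In use, the line is fitted on the control group only, for example with initial performance as x and daily learning as y. Signed distances of both groups to that control line are then compared. Because the Deming line passes through the control mean point, the control distances average zero. A treatment group with a significantly negative mean lies below the control relationship, which corresponds to less learning than expected for its initial performance.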

      (1.2a) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex is functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused about the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018). ” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014 we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). We do not claim that the two pathways are fully segregated; there is indeed some known degree of collateralization (see below).

      (1.2b) The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei?; how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      The study does not in fact assume that CL-projecting and VAL-projecting neurons are entirely separate populations (some overlap is known to exist); rather, it states that inhibiting neurons after retrograde infections from the CL and VAL does not produce identical results.

      There is indeed a paragraph devoted to the discussion of this point (middle paragraph, p20): “Interestingly, both Dentate and Interposed nuclei contain some neurons with collaterals in both VAL and CL thalamic structures (Aumann and Horne 1996, Sakayori, Kato et al. 2019), suggesting that the effect on learning could be mediated by a combined action on the learning process in the striatum (via the CL thalamus) and in the cortex (via the VAL thalamus). However, consistent with (Sakayori, Kato et al. 2019), we found that the manipulations of cerebellar neurons retrogradely targeted either from the CL or from the VAL produced different effects in the task. This indicates that either the distinct functional roles of VAL-projecting or CL-projecting neurons reported in our study are carried by a subset of pathway-specific neurons without collaterals, or that our retrograde infections in VAL and CL preferentially targeted different cerebello-thalamic populations even if these populations had axon terminals in both thalamic regions.” In other words, we know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL; see the references cited above). However, as the reviewer says, it does not seem logically possible that the exact same population would have different effects, particularly effects that are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL retrograde infections recruit somewhat different populations of neurons. This could be due to differences in the density of collaterals in CL and VAL among neurons with collaterals in both regions. Alternatively, it could be due to the presence of CL-projecting neurons without collaterals in VAL and VAL-projecting neurons without collaterals in CL, in addition to the (established) population of neurons with collaterals in both regions. The lesional approach targeting CN-thalamus neurons in Sakayori et al. 2019 also revealed separate effects of CL and VL injections, consistent with differential recruitment of CN populations by retrograde infections.

      This should be improved in a revised version of the manuscript.

      (1.3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      We do not have the wash data on the same day, but there is no significant change in the baseline firing rate across recording days.

      (1.4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range " Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems like it is unlikely uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      This shall indeed be corrected in a revised version.

      (1.5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task". I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This shall indeed be corrected in a revised version.

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and the insightful criticisms below.

      Weaknesses:

      (2.1) While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed-speed rotarod, which provides the conditions closest to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation [+0.12 rpm per s] in the accelerating version).

      In the CN experiments, we found clear deficits in learning and consolidation, while there was no effect on the fixed-speed rotarod (the performance of DREADD-CNO mice is even slightly better than that of some control groups). This is consistent with a separation of the effects on learning/consolidation from those on locomotion on a rotarod. However, small but measurable deficits are found at the highest speed of the fixed-speed rotarod in the CN-VAL group. There was no significant effect in the CN-CL group, whereas the CN-CL group actually shows lower performance from the second day of learning. We believe this supports our claim that CN-CL inhibition affected the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4 of the accelerating rotarod, consistent with intact learning abilities. Of note, under CNO, CN-VAL mice could stay on for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s).
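
      The conversion between latency to fall and rod speed used in this argument is simple arithmetic. The sketch below uses the stated acceleration of +0.12 rpm per s. The 4 rpm starting speed is an assumption, since the start speed is not given in this text, and the helper names are hypothetical.

```python
ACCEL_RPM_PER_S = 0.12  # acceleration stated in the response
START_RPM = 4.0         # ASSUMED starting speed; not stated in this text

def rod_speed_at(latency_s, start_rpm=START_RPM, accel=ACCEL_RPM_PER_S):
    """Rod speed (rpm) reached `latency_s` seconds into an accelerating trial."""
    return start_rpm + accel * latency_s

def latency_for_speed(target_rpm, start_rpm=START_RPM, accel=ACCEL_RPM_PER_S):
    """Time (s) needed for the accelerating rod to reach `target_rpm`."""
    return (target_rpm - start_rpm) / accel
```

      Under this assumed start speed, a fall at 130 s corresponds to about 19.6 rpm and a fall at 100 s to about 16 rpm. These values are close to the approximate ~19 rpm and ~15 rpm figures quoted in these responses.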

      The text currently states: “The inhibition of CN-VAL neurons during the task also yielded lower levels of performance in the Maintenance stage, [NB: days 5-7] suggesting that these neurons contribute also to learning and retrieval of motor skills, although the mild defect in fixed speed rotarod could indicate the presence of a locomotor deficit, only visible at high speed.” Following the reviewer's comment, however, we shall revise this sentence in the revised version of the MS to state that we cannot fully disambiguate execution effects from learning/retrieval effects at high speed for these mice.

      (2.2a) Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel.

      As explained above (point 1.2a), it is already known that these pathways overlap to some degree (Discussion, p20), yet targeting them affects the behavior differently, consistent with separate contributions. A similar finding was obtained with a lesional (irreversible) approach in Sakayori et al. 2019.

      (2.2b) The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      While we agree that after 3-4 days of learning the difference in performance between the groups becomes elusive, we respectfully disagree that these differences are negligible in the early stages. The impact of inhibition on "learning rate" (i.e. the amount of learning for a given daily initial performance) and on consolidation (i.e. the overnight retention of the daily gain in performance) exhibits different profiles for the two groups (Fig. 3h vs 3k).

      Reviewer #3 (Public review)

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) cerebellothalamic connections are important for learning motor skills

      (2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) that once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful criticisms below.

      Weaknesses:

      (3.1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is also discussed in point 2.1 above. In our view, the fixed-speed rotarod is a control very close to the accelerating rotarod condition, with very similar requirements in the two tasks (yet it is unfortunately rarely tested in accelerating rotarod studies). We do not exclude the presence of motor deficits, but our main argument is that these do not suffice to explain the differences observed in the accelerating rotarod. No detectable deficit was found in the CN group, while very clear deficits in learning/consolidation were observed. A mild deficit is significant only in the CN-VAL group. In the fixed-speed rotarod, the deficit is not significant for the CN-CL group, even though this group shows the strongest deficit in the accelerating rotarod during the first days. For example, on day 2 the CN-CL group is already below the control group, with latencies to fall of ~100 s (corresponding to a fall as soon as the rod reaches ~15 rpm). Yet the fixed-speed rotarod performances at 15 rpm of the control and CNO-treated groups show an ability to stay on for more than 1 min at this speed. The text shall be improved to clarify this point.

      (3.2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      There is indeed published evidence for some degree of anatomical overlap, but also for differential contributions of CN-VAL and CN-CL to the task. This is developed in our answers to points 1.2a and 2.2a above. Although this point was addressed in the Discussion (p20), the text shall be improved in a revised version of the MS to clarify our statement.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors successfully detected distinct mechanisms signalling prediction violations in the auditory cortex of mice. For this purpose, an auditory pure-tone local-global paradigm was presented to awake and anaesthetised mice. In awake rodents, the authors also evaluated interneuron cell types involved in responses to the interruption of the regularity imposed by local-global sequences. By performing two-photon calcium imaging and single-unit electrophysiology, the authors disentangled three phenomena underlying responses to violations of the distinct local-global regularity levels: stimulus-specific adaptation, surprise and surprise adaptation. Both stimulus-specific adaptation and surprise- or deviant-evoked responses are observable under anaesthesia. Altogether, this work advances our understanding of distinct predictive processes that compute prediction violations depending on the complexity of the regularity imposed by the auditory sequence.

      Strengths:

      It is an elegant study, beautifully executed.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Oddball responses are increases in sensory responses when a stimulus is encountered in an unexpected location in a sequence of predictable stimuli. There are two computational interpretations for these responses: stimulus-specific adaptation and prediction errors. In recent years, evidence has accumulated that a significant part of these sequence violation responses cannot be explained simply by stimulus-specific adaptation. The current work elegantly adds to this evidence by using a sequence paradigm based on two levels of sequence violations: "Local" sequence violations of repetitions of identical stimuli, and "global" sequence violations of stimulus sequence patterns. The authors demonstrate that both local and global sequence violation responses are found in L2/3 neurons of the mouse auditory cortex. Using sequences with different inter-stimulus intervals, they further demonstrate that these sequence violation responses cannot be explained by stimulus-specific adaption.
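
      To make the two levels of violation concrete, the sketch below generates one block of a local-global design and labels each trial. It assumes the classic arrangement of five-tone sequences in which a frequent sequence defines the global standard and a rarer one the global deviant. The tone labels, the 20% rare proportion and the function names are hypothetical illustrations rather than the study's exact parameters, and omission trials are not modelled.

```python
import random

def make_block(frequent, rare, n_trials=100, p_rare=0.2, seed=0):
    """Shuffled list of sequences (tuples of tone labels) for one block.
    `frequent` acts as the global standard, `rare` as the global deviant."""
    n_rare = round(n_trials * p_rare)
    trials = [tuple(frequent)] * (n_trials - n_rare) + [tuple(rare)] * n_rare
    random.Random(seed).shuffle(trials)
    return trials

def label_trial(seq, frequent):
    """Classify one sequence at both levels of the paradigm."""
    local_deviant = seq[-1] != seq[0]               # last tone breaks the tone repetition
    global_deviant = tuple(seq) != tuple(frequent)  # sequence breaks the block's pattern
    return local_deviant, global_deviant
```

      In a block where "AAAAB" is frequent, each "AAAAB" trial is a local deviant but a global standard. A rare "AAAAA" trial is a local standard but a global deviant. This crossing is what lets local and global violation responses be separated.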

      Strengths:

      The work is based on a very clever use of a sequence violation paradigm (local-global paradigm) and provides convincing evidence for the interpretation that there are at least two types of sequence violation responses and that these cannot be explained by stimulus-specific adaption. Most of the conclusions are based on a large dataset, and are compelling.

      Weaknesses:

      The final part of the paper focuses on the responses of VIP and PV-positive interneurons. The responses of VIP interneurons appear somewhat variable and difficult to interpret (e.g. VIP neurons exhibit omission responses in the A block, but not the B block). The conclusions based on these data appear less solid.

      We agree with the referee that the response modulations observed in VIP and PV-positive interneurons are weak and variable. This is indicated in the manuscript. Larger-scale recordings are probably necessary to fully ascertain the presence and distribution of omission responses.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled "Parallel mechanisms signal a hierarchy of sequence structure violations in the auditory cortex", Jamali et al. provide evidence for cellular-level mechanisms in the auditory cortex of mice for the encoding of predictive information on different temporal and contextual scales. The study design distinguishes more clearly than previous studies between local and global deviants, and it separates their respective effects on the neuronal responses through the use of various contextual conditions and short and long time scales. Further, it identifies a contribution from a small set of VIP interneurons to the detection of omitted sounds, and shows the influence of isoflurane anesthesia on the neural responses.

      Strengths:

      (1) The study provides a rather encompassing set of experimental techniques to study the cellular level responses, using two complementary recording techniques in the same animal and similar cortical location.

      (2) Comparisons between awake and anesthetized states are conducted in the same animals, which allows a rather direct comparison of populations under different conditions, thus reducing sampling variability.

      (3) The set of paradigms is well developed and specifically chosen to provide appropriate and meaningful controls/comparisons, which were missing from previous studies.

      (4) The addition of cell-type specific recordings is valuable and in particular in combination with the contrast of awake and anesthetized animals provides novel insights into the cellular level representation of deviant signals, such as surprise, prediction error, and general adaptation.

      (5) The analysis and presentation of the data are clear and quite complete, yet remain succinct and perform insightful contrasts.

      (6) The study will have an impact on multiple levels, as it introduces important variations in the paradigm and analytical contrasts that both human and animal researchers can pick up and improve their studies. The cell-type-specific results are particularly intriguing, although these would likely require replication before being completely reliable. Further, the study provides a substantial and diverse dataset that others can explore.

      Weaknesses:

      (1) The responses from cells recorded via Neuropixel and 2p differ qualitatively, as noted by the authors, with NP-recorded cells showing much more inhibited/reduced responses between acoustic stimulations. The authors briefly qualify these differences as potentially indicating a sampling issue, however, this matter deserves more detailed consideration in my opinion. Specifically, the authors could try to compare the different depths at which these neurons were sampled or relate the locations in the cortex to each other (as the Neuropixel recordings were collected in the same animals, a subset of the 2p recordings could be compared to the Neuropixel recordings.).

      We agree with the referee that the sampling issue, which we propose as a possible explanation for the large difference between our Neuropixel and 2P imaging recordings, must be investigated more thoroughly. This is, however, largely outside the scope of this study. We have reported the depth and location of the Neuropixel recordings in Figure S2. The depth is similar for both techniques, covering mostly layers 2, 3 and 4. For both two-photon imaging and Neuropixel recordings, the recording sites lie mostly in the primary auditory cortex; one Neuropixel recording is located in the ventral secondary auditory cortex. We could not find any evidence that the response to global violations in the Neuropixel data stems specifically from this particular recording.

      (2) The current study did not monitor the attentional state of the mouse in relation to the stimulus by either including a behavioral component or pupil monitoring, which could influence the neural responses to deviant stimuli and omissions.

      As reported by Bekinschtein et al. 2009, the attentional state influences responses to global violations in human subjects. It is extremely difficult to compare attentional states precisely between mice and human subjects. We have performed recordings in mice that had to attend to sounds, detecting a white noise presented between the sequences, to obtain a reward. This did not increase the global violation response. However, because the sequences themselves did not predict reward in this context, attention may have been diverted from them. This result is therefore inconclusive and not worth including in our manuscript. If the sequences did predict reward, there would be a potential confound between violation responses and reward expectation or motor preparation signals. Pupil monitoring could be an alternative, which we did not investigate.

      (3) Given the complexity and variety of the paradigms, conditions, and analyzed cell types, the manuscript would benefit from a more visual summary figure that provides an easy-to-access overview of the findings.

      This is an excellent suggestion, although given the complexity and diversity of our observations it may be hard to fit everything in one understandable figure.

    1. Author response:

      We appreciate the insightful comments and suggestions, which will significantly improve our work. We will revise the manuscript to address the reviewer’s concerns. Here, we list some of the key aspects of those concerns and our preliminary plans to address them.

      Both reviewers pointed out that we did not sufficiently justify the chosen optogenetic stimulation frequencies. We acknowledge and concur with their assessment, and will discuss it more extensively from a biological perspective (e.g., the neural firing rates in the olfactory bulb (OB), anterior olfactory nucleus (AON), and piriform cortex (Pir) under natural odor stimulation and respiration rhythm). Reviewer #1 suggested using beta values (β) rather than the area under the BOLD signal profile (AUC) to quantify the fMRI activations, as they are more conventional for general linear model (GLM) analysis. We are aware of β values and have used them to quantify the amplitude of fMRI activations in our previous rodent fMRI studies (refs. 1-3). However, in this study, we chose to use AUC because it offers a more comprehensive measure of BOLD signal change over time, including shape, duration, and magnitude, thereby capturing the bulk of neural activity and its dynamics throughout the stimulation period. β primarily represents the peak amplitude of BOLD responses (i.e., the % BOLD signal change) (ref. 4) and can be constrained by the assumptions and limitations of the GLM analysis, such as the shape of the hemodynamic response function (HRF). AUC provides greater flexibility in capturing different aspects of neural responses across various brain regions, such as transient peaks and sustained responses.
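      To illustrate the difference between the two measures, here is a minimal pure-Python sketch on synthetic data; it is not the analysis pipeline used in this study. It assumes a hypothetical single-gamma HRF, a 1 s sampling interval and a 20 s stimulation block, builds the GLM regressor by convolving the block with that HRF, and compares AUC with the ordinary-least-squares β for a sustained and a transient response that share the same 1% peak. All names, parameters and response shapes are illustrative assumptions.

```python
import math

DT = 1.0              # assumed sampling interval (TR), seconds
N = 60                # assumed number of time points
ON, OFF = 10.0, 30.0  # assumed 20 s stimulation block, seconds


def gamma_hrf(t, shape=6.0, scale=1.0):
    """Hypothetical single-gamma HRF (a gamma pdf peaking at (shape - 1) * scale s)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale**shape)


def block_regressor(n, dt, on, off):
    """GLM regressor: boxcar stimulus convolved with the assumed HRF."""
    box = [1.0 if on <= i * dt < off else 0.0 for i in range(n)]
    kernel = [gamma_hrf(i * dt) * dt for i in range(n)]
    return [sum(box[j] * kernel[i - j] for j in range(i + 1)) for i in range(n)]


def auc(signal, dt, start, stop):
    """Trapezoidal area under the % signal-change curve from start to stop (s)."""
    idx = [i for i in range(len(signal)) if start <= i * dt <= stop]
    return sum((signal[a] + signal[b]) * dt / 2 for a, b in zip(idx, idx[1:]))


def glm_beta(signal, regressor):
    """OLS slope for signal = beta * regressor + intercept."""
    n = len(signal)
    mx, my = sum(regressor) / n, sum(signal) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(regressor, signal))
    sxx = sum((x - mx) ** 2 for x in regressor)
    return sxy / sxx


def scale_to_peak(x, peak=1.0):
    """Rescale a time course so that its maximum equals `peak`."""
    m = max(x)
    return [peak * v / m for v in x]


reg = block_regressor(N, DT, ON, OFF)
# Sustained response: follows the whole block (hypothetical, 1 % peak).
sustained = scale_to_peak(reg)
# Transient response: a single HRF at stimulus onset (hypothetical, 1 % peak).
transient = scale_to_peak([gamma_hrf(i * DT - ON) for i in range(N)])

for name, sig in (("sustained", sustained), ("transient", transient)):
    print(f"{name}: peak={max(sig):.2f}  AUC={auc(sig, DT, ON, N * DT):.1f}  "
          f"beta={glm_beta(sig, reg):.2f}")
```

      In this toy case the two responses have equal peaks but clearly different AUCs, reflecting their different durations, while their β values depend on how well each response matches the assumed regressor shape.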

      As mentioned by Reviewer #1, correlating the adaptation of BOLD and electrophysiology signals at the brain-region level would strengthen our findings. We will pursue additional analyses to address this in our forthcoming responses. Reviewer #2 asked us to clarify the image and signal quality of our echo planar imaging (EPI)-based fMRI data, especially in regions close to the air-tissue interface such as the OB, Pir, entorhinal cortex and amygdala, as well as the methodology of some of the experimental protocols implemented in our study. We will show the raw EPI fMRI images from a representative animal and revise the results, discussion, and methods sections of the manuscript to address Reviewer #2's concerns.

      In our forthcoming detailed responses to the reviewers' comments and recommendations, we will revise the text, figures, and captions accordingly to address and clarify the questions brought up by both reviewers.

      References

      (1) Gao, P.P., Zhang, J.W., Chan, R.W., Leong, A.T.L. & Wu, E.X. BOLD fMRI study of ultrahigh frequency encoding in the inferior colliculus. Neuroimage 114, 427-437 (2015).

      (2) Leong, A.T.L., Wong, E.C., Wang, X. & Wu, E.X. Hippocampus Modulates Vocalizations Responses at Early Auditory Centers. Neuroimage 270, 119943 (2023).

      (3) Gao, P.P., Zhang, J.W., Fan, S.J., Sanes, D.H. & Wu, E.X. Auditory midbrain processing is differentially modulated by auditory and visual cortices: An auditory fMRI study. Neuroimage 123, 22-32 (2015).

      (4) Goddard, E. & Mullen, K.T. fMRI representational similarity analysis reveals graded preferences for chromatic and achromatic stimulus contrast across human visual cortex. Neuroimage 215, 116780 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of Drosophila EM connectomes has revealed numerous neurons within the associative learning circuit. However, these neurons are inaccessible for functional assessment or genetic manipulation in the absence of cell-type-specific drivers. Addressing this knowledge gap, Shuai et al. have screened over 4000 split-GAL4 drivers and correlated them with identified neuron types from the "Hemibrain" EM connectome by matching light microscopy images to neuronal shapes defined by EM. They successfully generated over 800 split-GAL4 drivers and 22 split-LexA drivers covering a substantial number of neuron types across layers of the mushroom body associative learning circuit. They provide new labeling tools for olfactory and non-olfactory sensory inputs to the mushroom body; interneurons connected with dopaminergic neurons and/or mushroom body output neurons; potential reinforcement sensory neurons; and expanded coverage of intrinsic mushroom body neurons. Furthermore, the authors have optimized the GR64f-GAL4 driver into a sugar sensory neuron-specific split-GAL4 driver and functionally validated it as a robust optogenetic substitute for sugar reward. Additionally, a driver for putative nociceptive ascending neurons, potentially serving as optogenetic negative reinforcement, is characterized using optogenetic avoidance behavior. The authors also use their very large dataset of neuronal anatomies, covering many example neurons from many brains, to identify neuron instances with atypical morphology. They find many examples of mushroom body neurons with altered neuronal numbers or mistargeting of dendrites or axons and estimate that 1-3% of neurons in each brain may have anatomic peculiarities or malformations. Notably, the study is the first to systematically assess the variable presence of MBON08. This neuron is a variant shape that sometimes occurs instead of one of the two copies of MBON09, and this variation is more common than in other neuronal classes: 75% of hemispheres have two MBON09s, and 25% have one MBON09 and one MBON08. These newly developed drivers not only expand the repertoire for genetic manipulation of mushroom body-related neurons but also empower researchers to investigate the functions of circuit motifs identified from the connectomes. The authors generously make these flies available to the public. In the foreseeable future, the tools generated in this study will allow important advances in the understanding of learning and memory in Drosophila.

      Strengths:

      (1) After decades of dedicated research on the mushroom body, a consensus has been established that the release of dopamine from DANs modulates the weights of connections between KCs and MBONs. This process updates the association between sensory information and behavioral responses. However, how the unconditioned stimulus is conveyed from sensory neurons to DANs, and how MBON outputs interact with innate responses to sensory context, remain less clear due to the developmental and anatomic diversity of MBONs and DANs. Additionally, the recurrent connections between MBONs and DANs are reported to be critical for learning. The characterization of split-GAL4 drivers for 30 major interneurons connected with DANs and/or MBONs in this study will significantly contribute to our understanding of recurrent connections in mushroom body function.

      (2) Optogenetic substitutes for real unconditioned stimuli (such as sugar taste or electric shock) are sometimes easier to implement in behavioral assays due to the spatial and temporal specificity with which optogenetic activation can be induced. GR64f-GAL4 has been widely used in the field to activate sugar sensory neurons and mimic sugar reward. However, the authors demonstrate that GR64f-GAL4 drives expression in other neurons not necessary for sugar reward, and the potential activation of these neurons could introduce confounds into training, impairing training efficiency. To address this issue, the authors have generated a series of intersectional drivers with GR64f-GAL4 to dissect subsets of labeled neurons. This approach successfully identified a more specific sugar sensory neuron driver, SS87269, which consistently exhibited optimal training performance and triggered ethologically relevant local searching behaviors. This newly characterized line could serve as an optimized optogenetic tool for sugar reward in future studies.

      (3) MBON08 was first reported by Aso et al. 2014, with dendrites arborizing in both the ipsilateral and contralateral γ3 compartments. However, this neuron could not be identified in previously published Drosophila brain connectomes. In the present study, the existence of MBON08 is confirmed, occurring in one hemisphere in 35% of imaged flies. In brains where MBON08 is present, its dendrites share the contralateral γ3 compartment with MBON09 without overlapping. This remarkable phenotype is a potentially valuable resource for understanding the stochasticity of neurodevelopment and the molecular mechanisms underlying the formation of mushroom body lobe compartments.

      Weaknesses:

      There are some minor weaknesses in the paper that can be clarified:

      (1) In Figure 8, the authors first trained flies with 20 s of weak optogenetic conditioning, followed by 60 s of strong optogenetic conditioning. The rationale for this training paradigm is not explicitly provided.

      These experiments were designed to test whether flies could maintain consistent performance under repeated, intense LED activation, which is essential for experiments involving long training protocols or coactivation of other neurons in the brain.

      In Figure 8E, if data for training with GR64f-GAL4 using the same paradigm are available, it would help readers to compare the learning performance of the newly generated split-GAL4 lines with that of the original GR64f-GAL4, which has been used in many previous studies. Notably, in previously published work, repeated training-test sessions typically increase learning performance in discrimination assays. However, this increase is not observed in any of the split-GAL4 lines presented in Figure 8E. The authors may need to discuss possible reasons for this.

      As the reviewer pointed out, many previous studies, including ours, used the original Gr64f-GAL4 in olfactory conditioning. Figure 1H of Yamada et al., 2023 (https://doi.org/10.7554/eLife.79042) showed such a result, in which first- and second-order olfactory conditioning were assayed. Indeed, first-order conditioning scores increased gradually over repeated training. In that experiment, we used low red LED intensity for optogenetic activation. In Figure 8E of the present paper, the first memory test followed three pairings of a 20 s odor with five 1 s red LED pulses, without intermediate tests. Therefore, flies were already sufficiently trained to show a plateau memory level in “Test1”. In the revision of another recent report (Figure 1C-F of Aso et al., 2023; https://doi.org/10.7554/eLife.85756), we included learning curve data for our best Gr64f-split-GAL4, SS87269. Under less saturating training conditions, SS87269 did show increasing performance over repeated training.

      (2) In line 327, the authors state that in all samples, the β'1 compartment is arborized by MBON09. However, in Figure 11J, the probability of having at least one β'1 compartment not arborized is inferred to be 2%. The authors should address and clarify this conflict in the text to avoid misunderstanding.

      The chance of visualizing MBON08 in MCFO images was 21/209 in total (Figure 11I). If we assume that each of the four cells adopts the MBON08 developmental fate with this probability, we can calculate the probability of each MBON08/09 cell-type composition. From this calculation, we inferred that approximately 2% of flies would lack innervation of the β'1 compartment in at least one hemisphere. However, we did not observe a lack of β'1 arborization in any of the 169 sample flies. If these MBONs independently develop into MBON08 with a probability of 21/209, the chance of never observing two MBON08s in the same hemisphere across all 169 samples is 3.29%. Therefore, some developmental mechanism may prevent the emergence of two MBON08s in the same hemisphere.

      In the revised manuscript, we display the estimated probability for each case separately and annotate the actual observations on the right side.
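      The probability estimate above can be reproduced with a short Python sketch. It assumes, as described, that each of the four cells independently adopts the MBON08 fate with probability 21/209, and that a hemisphere lacks β'1 innervation when both of its cells are MBON08, which is how the response frames the event; the function names are hypothetical. Under these assumptions the per-fly probability comes out at about 2%. Using this unrounded value, the chance of observing no such fly among 169 is roughly 3.2%; rounding the per-fly probability to 2% before exponentiating gives the 3.29% quoted above.

```python
from math import comb

P_MBON08 = 21 / 209  # fraction of MCFO-visualized cells with MBON08 morphology (Figure 11I)
N_FLIES = 169        # flies examined, none lacking beta'1 arborization


def composition_probs(p):
    """P(k of a hemisphere's two cells are MBON08), k = 0, 1, 2.

    Assumes each cell adopts the MBON08 fate independently with probability p.
    """
    return {k: comb(2, k) * p**k * (1 - p) ** (2 - k) for k in range(3)}


def p_fly_missing_beta1(p):
    """P(at least one hemisphere has two MBON08s, i.e. lacks beta'1 innervation)."""
    both_mbon08 = composition_probs(p)[2]
    return 1 - (1 - both_mbon08) ** 2


def p_never_observed(per_fly, n_flies):
    """P(no fly with a missing beta'1 arborization among n_flies independent flies)."""
    return (1 - per_fly) ** n_flies


per_fly = p_fly_missing_beta1(P_MBON08)
print(f"per-fly probability:            {per_fly:.4f}")
print(f"P(none in {N_FLIES}), unrounded:      {p_never_observed(per_fly, N_FLIES):.4f}")
print(f"P(none in {N_FLIES}), per-fly = 0.02: {p_never_observed(0.02, N_FLIES):.4f}")
```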

      (3) In general, are the samples presented male or female? This sample metadata will be shown when the images are deposited in FlyLight, but it would be useful in the context of this manuscript to describe in the methods whether animals are all one sex or mixed sex, and in some example images (e.g. mAL3A) to note whether the sample is male or female.

      The samples presented in this study are of mixed sex, except for Figure 11I, where sex is specified. We provide metadata for the presented images in Supplemental File 7, and we added a paragraph to the Methods section:

      “Most samples were collected from females, though typically at least one male fly was examined for each driver line. While we noticed that certain lines, such as SS48900, exhibited distinct expression patterns in females and males, we did not specifically focus on sexual dimorphism, which is analyzed elsewhere (Meissner et al. 2024). Therefore, unless stated otherwise, the presented samples are of mixed sex.

      Detailed metadata, including gender information and the reporter used, can be found in Supplementary File 7.”

      Reviewer #2 (Public Review):

      Summary:

      The article by Shuai et al. describes a comprehensive collection of over 800 split-GAL4 and split-LexA drivers, covering approximately 300 cell types in Drosophila, aimed at advancing the understanding of associative learning. The mushroom body (MB) in the insect brain is central to associative learning, with Kenyon cells (KCs) as primary intrinsic neurons and dopaminergic neurons (DANs) and MB output neurons (MBONs) forming compartmental zones for memory storage and behavior modulation. This study focuses on characterizing sensory input as well as direct upstream connections to the MB both anatomically and, to some extent, behaviorally. Genetic access to specific, sparsely expressed cell types is crucial for investigating the impact of single cells on computational and functional aspects within the circuitry. As such, this new and extensive collection significantly extends the range of targeted cell types related to the MB and will be an outstanding resource to elucidate MB-related processes in the future.

      Strengths:

      The work by Shuai et al. provides novel and essential resources to study MB-related processes and beyond. The resulting tools are publicly available and, together with the linked information, will be foundational for many future studies. The importance and impact of this tool development approach, along with previous ones, for the field cannot be overstated. One of many interesting aspects arises from the anatomical analysis of cell types that are less stereotypical across flies. These discoveries might open new avenues for future investigations into how such asymmetry and individuality arise from development and other factors, and how it impacts the computations performed by the circuitry that contains these elements.

      Weaknesses:

      Providing such an array of tools leaves little to complain about. However, despite the comprehensive genetic access to diverse sensory pathways and MB-connected cell types, the manuscript could be improved by discussing its limitations. For example, projection neurons from the visual system seem to be underrepresented (or almost absent) among the tools produced. A discussion of these omissions could help prevent misunderstandings.

      Efforts to produce split-GAL4 lines were divided among groups at Janelia Research Campus. A recent preprint (Nern et al., 2024; doi: https://doi.org/10.1101/2024.04.16.589741) described the full collection of split-GAL4 driver lines in the optic lobe, including the visual projection neurons to the mushroom body. We cited this preprint in the revised manuscript by adding a short paragraph to the discussion.

      “Although less abundant than the olfactory input, the MB also receives visual information from the visual projection neurons (VPNs) that originate in the medulla and lobula and are targeted to the accessory calyx (Vogt et al. 2016; Li et al. 2020). A recent preprint described the full collection of split-GAL4 driver lines in the optic lobe, which includes the VPNs to the MB (Nern et al. 2024).”

      Additionally, more details on the screening process, particularly the selection of candidate split halves and stable split-GAL4 lines, would provide valuable insights into the methodology and the collection's completeness.

      The details of our split-GAL4 design and screening procedures were described in previous studies (Aso et al., 2014; Dolan et al., 2019). The data and tools available for designing split-GAL4 lines changed over time, and we adapted our approach accordingly. Many of the split-GAL4 lines presented in this study were designed and screened in parallel with the lines for MBONs and DANs in 2010-2014, when MCFO images of GAL4 drivers and the EM connectome were not yet available. With knowledge of where MBONs and DANs project, I (Y.A.) manually examined and annotated thousands of confocal stacks (Jenett et al., 2012; https://doi.org/10.1016/j.celrep.2012.09.011) to find candidate cell types that may contact them.

      Later, I used more advanced computational tools (Otsuna et al., 2018; doi: https://doi.org/10.1101/318006) and MCFO images aligned to the standard brain volume (Meissner et al., 2023; DOI: 10.7554/eLife.80660). Now, anyone who needs to generate further split-GAL4 lines for cell types identified in EM connectome data can use the NeuronBridge website (https://neuronbridge.janelia.org/), which provides a list of GAL4 drivers that may label the neuron of interest.

      Reviewer #3 (Public Review):

      Summary:

      Previous research on the Drosophila mushroom body (MB) has made this structure the best-understood example of an associative memory center in the animal kingdom. This is in no small part due to the generation of cell-type-specific driver lines that have allowed consistent and reproducible genetic access to many of the MB's component neurons. The manuscript by Shuai et al. now vastly extends the number of driver lines available to researchers interested in studying learning and memory circuits in the fly. This 800-plus collection of new cell-type-specific drivers targets neurons that either provide input (direct or indirect) to MB neurons or receive output from them. Many of the new drivers target neurons in sensory pathways that convey conditioned and unconditioned stimuli to the MB. Most drivers are exquisitely selective, and researchers will benefit from the fact that, whenever possible, the authors have identified the targeted cell types within the Drosophila connectome. Driver expression patterns are beautifully documented and are publicly available through the Janelia Research Campus's FlyLight database, where full imaging results can be accessed. Overall, the manuscript significantly augments the number of cell-type-specific driver lines available to the Drosophila research community for investigating the cellular mechanisms underlying learning and memory in the fly. Many of the lines will also be useful in dissecting the function of sensorimotor circuits.

      Strengths:

      The manuscript represents a huge amount of careful work and leverages numerous important developments from the last several years. These include the thousands of recently generated split-Gal4 lines at Janelia and the computational tools for pairing them to make exquisitely specific targeting reagents. In addition, the manuscript takes full advantage of the recently released Drosophila connectomes. Driver expression patterns are beautifully illustrated side-by-side with corresponding skeletonized neurons reconstructed by EM. A comprehensive table of the new lines, their split-Gal4 components, their neuronal targets, and other valuable information will make this collection eminently useful to end-users. In addition to the anatomical characterization, the manuscript also illustrates the functional utility of the new lines in optogenetic experiments. In one example, the authors identify a specific subset of sugar reward neurons that robustly promotes associative learning.

      Weaknesses:

      While the manuscript succeeds in making a mass of descriptive detail quite accessible to the reader, the way the collection is initially described - and the new lines categorized - in the text is sometimes confusing. Most of the details can be found elsewhere, but it would be useful to know how many of the lines are being presented for the first time and have not been previously introduced in other publications/contexts.

      We revised the text as below.

      “Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibit highly specific and non-redundant expression patterns and are likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and the recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection has been used in previous studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2.”

      And where can the lines be found at Flylight? Are they listed as one collection or as many?

      They are listed as one collection, the “Aso 2021” release. It is named “2021” because we released the images and started sharing lines in December 2021 without a descriptive paper. We added a sentence to the Methods section.

      “All split-GAL4 lines can be found in the FlyLight database under the “Aso 2021” release, and fly strains can be requested from Janelia or the Bloomington stock center.”

      Also, the authors say that some of the lines were included in the collection despite not necessarily targeting the intended type of neuron (presumably one that is involved in learning and memory). What percentage of the collection falls into this category?

      We do not have a good record of split-GAL4 screening from which to calculate how often unintended cell types were intersected, but it was rather rare. Those unintended cell types can still be part of circuits for associative learning (e.g. olfactory projection neurons) or can be totally unrelated cell types. For instance, among a new collection of split-LexA lines using the Gr43a-LexADBD hemidriver (Figure 7-figure supplement 2), one line specifically intersected T1 neurons in the optic lobe even though the AD line had been selected to intersect sugar sensory neurons. We suspect that this is due to ectopic expression of Gr43a-LexADBD. Nonetheless, we included it in the paper because a cell-type-specific split-LexA driver for T1 will be useful irrespective of whether the Gr43a gene is expressed in T1.

      And what about the lines that the authors say they included in the collection despite a lack of specificity? How many lines does this represent?

      For a short answer, there are about 100 lines in the collection that lack the specificity for behavioral experiments.

      We ranked the specificity of the split-GAL4 drivers in Supplementary File 1. Rank 2 lines are ideal, Rank 1 lines are less ideal but acceptable, and Rank 0 lines are not suitable for activation screening in behavioral experiments. Of the 828 split-GAL4 lines reported here, 413, 305 and 103 lines fall into the Rank 2, Rank 1 and Rank 0 categories, respectively. Seven lines are not ranked for specificity because only flip-out expression data are available.
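      As a quick consistency check, the rank counts above can be tallied in a few lines of Python; the dictionary simply transcribes the numbers given in this response, and the script confirms that they account for all 828 lines and prints each category's share.

```python
# Specificity ranks of the split-GAL4 lines, transcribed from this response.
RANK_COUNTS = {"Rank 2": 413, "Rank 1": 305, "Rank 0": 103, "not ranked": 7}
TOTAL_LINES = 828

# The categories should account for every line in the collection.
assert sum(RANK_COUNTS.values()) == TOTAL_LINES

for rank, count in RANK_COUNTS.items():
    print(f"{rank}: {count} lines ({100 * count / TOTAL_LINES:.1f}%)")
```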

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      As mentioned elsewhere and in addition to the minor points below, it is advisable for the authors to elaborate on the details of the screening process. Furthermore, a discussion about the circuits not targeted by their research, such as the visual projection neurons, would be beneficial.

      See the response above to Reviewer #2’s public review.

      Line 32-33: The citations are very fly-centric. The authors might want to consider reviews on the MB of other insect species regarding learning and memory.

      We additionally cited Rybak and Menzel's 2017 book chapter on the honey bee mushroom body.

      Line 43-44: Citations should be added, e.g. Séjourné et al. (2011), Pai et al. (2013), Plaçais et al. (2013).

      Citation added

      Line 50-52: Citation Hulse et al. (2021) should be added.

      Citation added

      Line 162: In this part, it might be valuable for the reader to understand which of these PNs are actually connecting with KCs.

      A full list of cell types within the MB is provided in Supplementary File 4 of the revised manuscript. See also the response to Reviewer 3, Lines 150-1.

      Line 179: Citation Burke et al. (2012) should be mentioned.

      Citation added

      Line 181: Thermogenic might be thermogenetic.

      Corrected

      Line 189: Citations add Otto et al. (2020) and Felsenberg et al. (2018).

      Citations added

      Line 208ff: The authors should consider discussing why they did not use other GR and IR promoters. For example, Gr5a is prominent in sugar-sensing, while Ir76b could be a reinforcement signal related to yeast food (Steck et al., 2018; Ganguly et al., 2017; see also Corfas et al., 2019 for local search).

      We focused on the Gr64f promoter because of its relatively broad expression and the successful use of Gr64f-GAL4 in fictive reward experiments. We added split-LexA lines with the Gr43a and Gr66a promoters (Figure 7-figure supplement 2). Other gustatory sensory neurons also have the potential to provide reinforcement signals, but we did not have the bandwidth to cover them all.

      Line 319: Consider citing Linneweber et al. (2020) for a neurodevelopmental account of such individuality.

      We added a sentence and cited this reference.

      “On the other hand, the neurodevelopmental origin of neuronal morphology appeared to have functional significance for behavioral individuality (Linneweber et al. 2020).”

      Line 352: Citation add Hulse et al. (2021).

      Citations added

      Line 356ff: The utility and value of Split-LexA may not be apparent to non-expert readers. Moreover, how were LexADBDs chosen for creating these lines?

      We have added an introductory sentence at the beginning of the paragraph and explained that these split-LexA lines were converted from split-GAL4 lines that were published in 2014 and are frequently used to study the mushroom body circuit.

      “Split-GAL4 lines enable cell-type-specific manipulation, but some experiments require independent manipulation of two cell types. Split-GAL4 lines can be converted into split-LexA lines by replacing the GAL4 DNA binding domain with that of LexA (Ting et al., 2011). To broaden the utility of the split-GAL4 lines that have been frequently used since their publication in 2014 (Aso et al., 2014a), we have generated over 20 LexADBD lines to test the conversion of split-GAL4 to split-LexA lines. The majority (22 out of 34) of the resulting split-LexA lines exhibited expression patterns very similar to those of their corresponding original split-GAL4 lines (Figure 12).”

      Line 374: Italicize Drosophila melanogaster.

      Revised as suggested.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      As mentioned in the Public Review, the drivers are nicely classified in the various subsections of the manuscript, but the statements in the text summarizing how many lines there are in specific categories are often confusing. For example, line 129 refers to "drivers encompassing 111 cell types that connect with the DANs and MBONs", but Figure 1E indicates that 46 new cell types downstream of MBONs and upstream of DANs have been generated. This seems like a discrepancy.

      The 46 cell types in Figure 1E cover only the CRE/SMP/SIP/SLP area, where neurons downstream of MBONs and upstream of DANs are highly enriched, whereas the 111 cell types include all areas. To avoid confusion, we removed the “MBON downstream and DAN upstream” count from Figure 1E in the revised manuscript.

      Also, at line 75 the MBON lines previously generated by Rubin and Aso (2023) are referred to as though they are separate from the 828 described "In this report." Supplementary file 1 suggests, however, that they are included as part of this report.

      Twenty-five lines generated in Rubin and Aso (2023) were initially included in Supplementary File 1 for the convenience of users, but they were not counted towards the 828 new lines described in this report. To avoid confusion, we removed these 25 lines in the revised manuscript. All lines now listed in Supplementary File 1 were generated in this study (“Aso 2021” release), and if a line has been used in earlier studies or introduced in other contexts, for example in the accompanying omnibus preprint (Meissner 2024, doi: 10.1101/2024.01.09.574419), the citations are listed in the reference column.

      More generally, in lines 94-102 "828 useful lines based on their specificity, intensity and non-redundancy" are referred to, but they are subsequently subdivided into categories of lines with lower specificity (i.e. with off-target expression) and lines that did not target intended cell types (presumably ones unlikely to be involved in learning and memory). It would be useful to know how many lines (at least roughly) fall into these subcategories.

      See the response above to Reviewer #3’s public review.

      Finally, Figures 3B & C indicate cell types connected to DANs and MBONs and the number for which Split-Gal4 lines are available. The text (lines 136-7) states that "the new collection covers 30 of these major cell types (Figure 3C)," but Figure 3C clearly has more than 30 dots showing the drivers available. Presumably existing and new driver lines are being pooled, but this should either be explained or the two should be distinguished.

      “(Figure 3C)” was replaced with “(Supplementary File 3)” in the revised manuscript to correct the reference. Figures 3B and 3C plot all MB interneurons, not just the major cell types.

      Minor Comments:

      Although the paper is generally well written, there are minor grammatical errors throughout (e.g. dropped articles, odd constructions, etc.) that somewhat detract from an otherwise smooth and enjoyable reading experience. A quick editing pass by a native speaker (i.e. any of several of the authors) could clean up these and numerous other small mistakes. A few examples: line 138 "presented" should be "present"; line 204: "contain off-targeted expressions" should be "have off-target expression"; line 219: "usage to substitute reward" is awkward at best and could be something like "use in generating fictive rewards"; line 326 "arborize[s]"; line 331 "Based on the likelihood" should be something like "based on these observations"; line 349 "[is] likely to appear"; line 352 "extensive connection[s]"; line 353 "has [a] strong influence"; line 963 "Projections" should be singular; etc.

      All the mentioned examples have been corrected, and we have asked a native speaker to edit the revised manuscript.

      Lines 81-3: Does the lookup table refer to Suppl. File 1? A reference is desirable.

      Yes, the lookup table refers to Supplementary File 1, and a reference has been added.

      Lines 111-2: what is a "non-redundant set of...cell types?" Cell types that are represented by a single cell (or bilateral pair)? Or does this sentence mean that of the 828 lines, 355 are specific to a single cell type, and in total 319 cell types are targeted? The statement is confusing.

      We revised the text as below.

      “Figure 1E provides an overview of the categories of covered cell types. Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibits highly specific and non-redundant expression patterns and is likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection has been used in previous studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2.”

      Line 148: "MB major interneurons" is a confusing descriptor for postsynaptic partners of MBONs.

      We added a sentence to clarify the definition of the “MB major interneurons”.

      “In the hemibrain EM connectome, there are about 400 interneuron cell types that have over 100 total synaptic inputs from MBONs and/or synaptic outputs to DANs. Our newly developed collection of split-GAL4 drivers covers 30 types of these ‘major interneurons’ of the MB (Supplementary File 3).”

      Lines 150-1: Not sure what is meant by "have innervations within the MB." This sounds like the cells are presynaptic to KCs, DANs, and MBONs, but Figure 3 Figure Supplement 1 indicates they include neurons that both provide and receive innervation to/from MB neurons. Please clarify.

      For clarification, in the revised manuscript we have included a full list of cell types within the MB in Supplementary File 4. This list includes all neurons with at least 50 presynaptic connections or at least 250 postsynaptic connections in the MB ROI in the hemibrain (excluding the accessory calyx). The cell types include KCs, MBONs, DANs, PNs, and a few other cell types. The coverage ratio was updated based on this list.

      Also, in line 152, what does it mean that they "may have been overlooked previously?" this seems unnecessarily ambiguous. Were they overlooked or weren't they?

      We changed the text to “These lines offer valuable tools to study cell types that were previously not genetically accessible. Notably, SS85572 enables the functional study of LHMB1, which forms a rare direct pathway from the calyx and the lateral horn (LH) to the MB lobes (Bates et al., 2020).”

      Line 158 refers to PN cells within the MB, which are not mentioned anywhere else as MB components. What are these PNs and how do they differ from MBONs?

      See the response to lines 150-1 for clarification of cell types within the MB.

      Line 188: not clear what is meant by "more continual learning tasks".

      We rephrased it as “more complex learning tasks” to avoid jargon.

      Line 235: Not clear why "extended training with high LED intensity" wouldn't promote the formation of robust memories. Is this for some reason unexpected based on previous experiments? Please explain.

      See the response to weakness #1 from the same reviewer.

      Lines 317-9: It would be useful to state here that MBON08 and MBON09 are the two neurons labeled by MB083C.

      Revised as suggested.

      Line 368: Presumably the "lookup table" referred to is Supplementary File 1, but a reference here would be useful.

      Yes, it refers to Supplementary File 1, and a reference has been added.

      Comments on Figures:

      Figure 1C The "Dopamine Neurons" label position doesn't align with the Punishment and Reward labels, which is a bit confusing.

      They are intentionally not aligned, because dopamine neurons do not represent reward or punishment per se. We intend the schematic to show that punishment and reward are conveyed to the MB through the dopamine neuron layer, just as the output of the MB output neuron layer is used to guide further integration and actions. To keep the labels “Dopamine neurons” and “MB Output Neurons” in symmetrical positions, we decided to keep the original figure unchanged. We nevertheless thank the reviewer for the kind suggestion.

      Figure 1F and Figure 1 - Figure Supplement 1: the light gray labels presumably indicate the (EM-identified) neuron labeled by each line, but this should be explicitly stated in the figure legends. It would also be useful in the legends to direct the reader to the key (Supplementary File 1) for decoding neuronal identities.

      Revised as suggested.

      Figure 2: For clarity, I'd recommend titling this figure "LM-EM Match of the CRE011-specific driver SS45245". This reduces the confusion of mixing and matching the driver and cell-type names. Also, it would be helpful to indicate (e.g. with labels above the figure parts) that A & B represent the MCFO characterization step and C & D represent the LM-EM matching step of the pipeline.

      Revised as suggested.

      Figure 6: For clarity, it would be useful to separately label the PN and sensory neuron groups. Also, for the sensory neurons at the bottom, what is the distinction between the cell names in gray and black font?

      Figure 6 was updated to separate the non-olfactory PN and sensory neuron groups. Gray font indicated olfactory receptor neuron cell types that are additionally labeled in the driver lines. To avoid confusion, the gray cell types were removed from the revised figure, and a clarifying sentence was added to the legend.

      “Other than thermo-/hygro-sensory receptor neurons (TRNs and HRNs), SS00560 and MB408B also label olfactory receptor neurons (ORNs): ORN_VL2p and ORN_VC5 for SS00560, ORN_VL1 and ORN_VC5 for MB408B.”

      Figure 7A: It's unclear why the creation of 6 Gr64f-LexADBD lines is reported. Aren't all these lines the same? If not, an explanation would be useful.

      These six Gr64f-LexADBD lines differ in insertion site and in the presence or absence of the p10 translational enhancer; an explanation was added to the legend. The higher expression level conferred by p10 can help compensate for the general tendency of split-LexA to be weaker than split-GAL4. Different insertion sites will be useful to avoid transvection with split-GAL4s, which are mostly in attP40 and attP2.

      Figure 8F: It would help to include in the legend a brief description of each parameter being measured, essentially defining the y-axis label on the graphs as in Figure Supplement 2. Also, how is the probability of return calculated and what behavioral parameter does the change of curvature refer to?

      We added a brief description of the behavioral parameters to the legend of Figure 8F.

      “Return behavior was assessed within a 15-second time window. The probability of return (P return) is the percentage of flies that made an excursion (>10 mm) and then returned to within 3 mm of their initial position. Curvature is the ratio of angular velocity to walking speed.”
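      The two legend definitions can be made concrete with a minimal sketch. It is not the authors' analysis code: it assumes each fly's track is a list of (time in s, x in mm, y in mm) samples starting at the beginning of the 15-second window, and the function names and track format are hypothetical. Curvature is computed here as summed absolute heading change divided by path length, which equals mean angular speed divided by mean walking speed over the same interval.

```python
import math


def returned(track, window_s=15.0, excursion_mm=10.0, return_mm=3.0):
    """True if the fly moved > excursion_mm from its start, then came back within return_mm.

    track: hypothetical list of (t_s, x_mm, y_mm) samples beginning at window onset.
    """
    t0, x0, y0 = track[0]
    went_out = False
    for t, x, y in track:
        if t - t0 > window_s:  # only the 15-second window counts
            break
        dist = math.hypot(x - x0, y - y0)
        if dist > excursion_mm:
            went_out = True
        elif went_out and dist <= return_mm:
            return True
    return False


def p_return(tracks, **criteria):
    """Percentage of flies whose track meets the return criterion."""
    if not tracks:
        raise ValueError("need at least one track")
    hits = sum(returned(tr, **criteria) for tr in tracks)
    return 100.0 * hits / len(tracks)


def curvature(track):
    """Total absolute heading change (rad) divided by path length (mm)."""
    headings, path_len = [], 0.0
    for (_, xa, ya), (_, xb, yb) in zip(track, track[1:]):
        step = math.hypot(xb - xa, yb - ya)
        if step > 0:  # skip stationary samples, which have no heading
            headings.append(math.atan2(yb - ya, xb - xa))
            path_len += step
    turn = 0.0
    for h1, h2 in zip(headings, headings[1:]):
        d = (h2 - h1 + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
        turn += abs(d)
    return turn / path_len if path_len > 0 else 0.0
```

      Summing turns and distances before dividing avoids dividing by near-zero speeds on individual samples when a fly pauses.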

      Figure 9E: What are the parenthetical labels for lines SS49267, SS49300, and SS35008?

      They are EM bodyIDs. The figure legend was revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study compiles a wide range of results on the connectivity, stimulus selectivity, and potential role of the claustrum in sensory behavior. While most of the connectivity results confirm earlier studies, this valuable work provides incomplete evidence that the claustrum responds to multimodal stimuli and that local connectivity is reduced across cells that have similar long-range connectivity. The conclusions drawn from the behavioral results are weakened by the animals' poor performance on the designed task. This study has the potential to be of interest to neuroscientists.

      We thank the editor and the reviewers for their feedback on our work, which we have incorporated to help improve interpretation of our findings as outlined in the response below. While we agree with the editor that further work is necessary to provide a comprehensive understanding of claustrum circuitry and activity, this is true of most scientific endeavors and therefore we feel that describing this work as “incomplete” unfairly mischaracterizes the intent of the experiments performed which provide fundamental insights into this poorly understood brain region. Additionally, as identified in the main text, methods section, and our responses to the comments below, we disagree that the behavioral results are “weakened” by the performance of the animals. Our goal was to assess what information animals learned and used in an ambiguous sensory/reward environment, not to shape them toward a particular behavior and interpret the results solely based on their accuracy in performing the task.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper by Shelton et al. investigates some of the anatomical and physiological properties of the mouse claustrum. First, they characterize the intrinsic properties of claustrum excitatory and inhibitory neurons and determine how these different claustrum neurons receive input from different cortical regions. Next, they perform in vitro patch clamp recordings to determine the extent of intraclaustrum connectivity between excitatory neurons. Following these experiments, in vivo axon imaging was performed to determine how claustrum-retrosplenial cortex neurons are modulated by different combinations of auditory, visual, and somatosensory input. Finally, the authors perform claustrum lesions to determine if claustrum neurons are required for performance on a multisensory discrimination task.

      Strengths:

      An important potential contribution the authors provide is the demonstration of intra-claustrum excitation. In addition, this paper provides the first experimental data where two cortical inputs are independently stimulated in the same experiment (using 2 different opsins). Overall, the in vitro patch clamp experiments and anatomical data provide confirmation that claustrum neurons receive convergent inputs from areas of the frontal cortex. These experiments were conducted with rigor and are of high quality.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      The title of the paper states that claustrum neurons integrate information from different cortical sources. However, the authors did not actually test or measure integration in the manuscript. They do show physiological convergence of inputs on claustrum neurons in the slice work, but integration through simultaneous activation of inputs was not tested. The convergence of cortical input has been recently shown by several other papers (Chia et al.), and the current paper largely supports these previous conclusions. The in vivo work did test for integration because simultaneous sensory stimulations were performed. However, integration was not measured at the single cell (axon) level because it was unclear how activity in a single claustrum ROI changes in response to (for example) visual, tactile, and visual-tactile stimulations. Reading the discussion, I also see the authors speculate that the sensory responses in the claustrum could arise from attentional or salience-related inputs from an upstream source such as the PFC. In this case, claustrum cells would not integrate anything (but instead respond to PFC inputs).

      We thank the reviewer for raising this point. In response, we have provided a definition of “integration” in the manuscript text (lines 112-114, 353-354):

      “...single-cell responsiveness to more than one input pathway, e.g. being capable of combining and therefore integrating these inputs.”

      The reviewer’s point about testing simultaneous input to the claustrum is well made but not possible with the dual-color optogenetic stimulation paradigm used in our study as noted in the Results and Discussion sections (see also Klapoetke et al., 2014, Hooks et al., 2015). The novelty of our paper comes from testing these connections in single CLA neurons, something not shown in other studies to-date (Chia et al., 2020; Qadir et al., 2022), which average connectivity over many neurons.

      Finally, we disagree with the reviewer regarding whether integration was tested at the single-axon level and provide data and supplementary figures to this effect (Fig. 6, Supp. Fig. S14, lines 468-511). Although the possibility remains that sensory-related information may arise in the prefrontal cortex, as we note, there is still a large collection of studies (including this one) that document and describe direct sensory inputs to the claustrum (Olson & Graybiel, 1980; Sherk & LeVay, 1981; Smith & Alloway, 2010; Goll et al., 2015; Atlan et al., 2017; etc.). We have updated the wording of these sections to note that both direct and indirect sensory input integration is possible.

      The different experiments in different figures often do not inform each other. For example, the authors show in Figure 3 that claustrum-RSP cells (CTB cells) do not receive input from the auditory cortex. But then, in Figure 6 auditory stimuli are used. Not surprisingly, claustrum ROIs respond very little to auditory stimuli (the weakest of all sensory modalities). Then, in Figure 7 the authors use auditory stimuli in the multisensory task. It seems that these experiments were done independently and were not used to inform each other.

      The intention behind the current manuscript was to provide a deep characterisation of claustrum to inform future research into this enigmatic structure. In this case, we sought to test pathways in vivo that were identified as being weak or absent in vitro to confirm and specifically rule out their influence on computations performed by claustrum. We agree with the reviewer’s assessment that it is not surprising that claustrum ROIs respond weakly to auditory stimuli. Not testing these connections in vivo because of their apparent sparsity in vitro would have represented a critical gap in our knowledge of claustrum responses during passive sensory stimulation.

      One novel aspect of the manuscript is the focus on intraclaustrum connectivity between excitatory cells (Figure 2). The authors used wide-field optogenetics to investigate connectivity. However, the use of paired patch-clamp recordings remains the ground truth technique for determining the rate of connectivity between cell types, and paired recordings were not performed here. It is difficult to understand and gain appreciation for intraclaustrum connectivity when only wide-field optogenetics is used.

      We thank the reviewer for acknowledging the novelty of these experiments. We further acknowledge that paired patch-clamp recordings are the gold standard for assessing synaptic connectivity. Typically such experiments are performed in vitro, a necessity given that the ventral location of the claustrum precludes in vivo patching. In vitro slice preparations by their very nature sever connections and lead to an underestimate of connectivity, as noted in our Discussion. Kim et al. (2016) have done this experiment in coronal slices with the understanding that excitatory-excitatory connectivity would be local (<200 μm) and therefore preserved. We used a variety of approaches that enabled us to explore connectivity along the longitudinal axis of the brain (the rostro-caudal, i.e. “long” axis of the claustrum), providing fresh insight into the circuitry embedded within this structure that would be challenging to examine using dual recordings. Further, our optogenetic method (CRACM, Petreanu et al., 2007) has been used successfully across a variety of brain structures to examine excitatory connectivity while circumventing artifacts arising from the slice axis.

      In Figure 2, CLA-rsp cells express Chrimson, and the authors removed cells from the analysis with short latency responses (which reflect opsin expression). But wouldn't this also remove cells that express opsin and receive monosynaptic inputs from other opsin-expressing cells, therefore underestimating the connectivity between these CLA-rsp neurons? I think this needs to be addressed.

      The total number of opsin-expressing CLA neurons in our dataset is 4/46 tested neurons. Assuming all of these neurons project to RSP, they would have accounted for 4/32 CLARSP neurons. Given the rate of monosynaptic connectivity observed in this study, these neurons would only contribute 2-3 additional connected neurons. Therefore, the exclusion of these neurons does not significantly impact the overall statistical accuracy of our connectivity findings.

      In Figure 5J the lack of difference in the EPSC-IPSC timing in the RSP is likely due to 1 outlier EPSC at 30 ms which is most likely reflecting polysynaptic communication. Therefore, I do not feel the argument being made here with differences in physiology is particularly striking.

      We thank the reviewer for their attention to detail about this analysis. We have performed additional statistics and found that leaving this neuron out does not affect the significance of the results (new p-value = 0.158, original p-value = 0.314, Mann-Whitney U test). We have removed this datapoint from the figure and our analysis.

      In the text describing Figure 5, the authors state "These experiments point to a complex interaction ....likely influenced by cell type of CLA projection and intraclaustral modules in which they participate". How does this slice experiment stimulating axons from one input relate to different CLA cell types or intra-claustrum circuits? I don't follow this argument.

      We have removed this speculation from the Results section.

      In Figure 6G and H, the blank condition yields a result similar to many of the sensory stimulus conditions. This blank condition (when no stimulus was presented) serves as a nice reference to compare the rest of the conditions. However, the remainder of the stimulation conditions were not adjusted relative to what would be expected by chance. For example, the response of each cell could be compared to a distribution of shuffled data, where time-series data are shuffled in time by randomly assigned intervals and a surrogate distribution of responses generated. This procedure is repeated 200-1000x to generate a distribution of shuffled responses. Then the original stimulus-triggered response (1s post) could be compared to shuffled data. Currently, the authors just compare pre/post-mean data using a Mann-Whitney test from the mean overall response, which could be biased by a small number of trials. Therefore, I think a more conservative and statistically rigorous approach is warranted here, before making the claim of a 20% response probability or 50% overall response rate.

      We appreciate the reviewer's thorough analysis and suggestion for a more conservative statistical approach. We acknowledge that responses on blank trials occur about 10% of the time, indicating that response probabilities around this level may not represent "real" responses. To address this, we will include the responses to the blank condition in the manuscript (lines 505-509). This will allow readers to make informed decisions based on the presented data.
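      The shuffle procedure suggested by the reviewer can be sketched as follows. This is an illustration under stated assumptions, not the analysis used in the manuscript: `trace` is a hypothetical per-ROI ΔF/F series sampled at a fixed rate, `onsets` are stimulus-onset sample indices, and `win` is the number of samples in the 1 s post-stimulus window. Surrogates are built by circularly shifting the trace by a random lag, one common way to shuffle "by randomly assigned intervals" while preserving the trace's autocorrelation; the p-value is one-sided with a +1 correction.

```python
import random
import statistics


def triggered_mean(trace, onsets, win):
    """Mean of the `win` samples after each onset, averaged across trials."""
    return statistics.fmean(statistics.fmean(trace[o:o + win]) for o in onsets)


def shuffle_test(trace, onsets, win, n_shuffles=1000, seed=0):
    """One-sided circular-shift test of a stimulus-triggered response.

    Returns (observed response, p-value), where p is the fraction of
    surrogates whose triggered mean is at least the observed one.
    """
    trace = list(trace)
    n = len(trace)
    if any(o < 0 or o + win > n for o in onsets):
        raise ValueError("every post-stimulus window must fit inside the trace")
    rng = random.Random(seed)
    observed = triggered_mean(trace, onsets, win)
    exceed = 0
    for _ in range(n_shuffles):
        lag = rng.randrange(1, n)  # random nonzero shift
        surrogate = trace[lag:] + trace[:lag]  # circular shift keeps autocorrelation
        if triggered_mean(surrogate, onsets, win) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_shuffles + 1)
```

      Under this scheme an ROI would count as responsive only if its p-value falls below a chosen threshold, and the same test applied to blank trials gives a direct estimate of the false-positive rate.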

      Regarding Figure 6, a more conventional way to show sensory responses is to display a heatmap of the z-scored responses across all ROIs, sorted by their post-stimulus response. This enables the reader to better visualize and understand the claims being made here, rather than relying on the overall mean which could be influenced by a few highly responsive ROIs.

      We apologize to the reviewer that our data in this figure was challenging to interpret. We have included an additional supplemental figure (Supp. Fig. S15) that displays the requested information.

      For Figure 6, it would also help to display some raw data showing responses at the single ROI level and the population level. If these sensory stimulations are modulating claustrum neurons, then this will be observable on the mean population vector (averaged df/f across all ROIs as a function of time) within a given experiment and would add support to the conclusions being made.

      We appreciate the reviewer’s desire to see more raw data – we would have included this in the figure given more space. However, the average df/f across all ROIs is shown as a time series with 95% confidence intervals in Fig. 6D.

      As noted by the authors, there is substantial evidence in the literature showing that motor activity arises in mice during these types of sensory stimulation experiments. It is foreseeable that at least some of the responses measured here arise from motor activity. It would be important to identify to what extent this is the case.

      While we acknowledge that some responses may arise from motor-related activity, addressing this comprehensively is beyond the scope of this paper. Given the extensive number of trials and recorded axonal segments, we believe that motor-related activity is unlikely to significantly impact the average response across all trials. Future studies focusing specifically on motor activity during sensory stimulation experiments would be needed to elucidate this aspect in detail.

      All claims in the results for Figure 6 such as "the proportion of responsive axons tended to be highest when stimuli were combined" should be supported by statistics.

      We have provided additional statistics in this section (lines 490-511) to address the reviewer’s comment.

      In Figure 7, the authors state that mice learned the structure of the task. How is this the case, when the number of misses is 5-6x greater than the number of hits on audiovisual trials (S Figure 19). I don't get the impression that mice perform this task correctly. As shown in Figure 7I, the hit rate is exceptionally low on the audiovisual port in controls. I just can't see how control and lesion mice can have the same hit rate and false alarm rate yet have different d'. Indeed, I might be missing something in the analysis. However, given that both groups of mice are not performing the task as designed, I fail to see how the authors' claim regarding multisensory integration by the claustrum is supported. Even if there is some difference in the d' measure, what does that matter when the hits are the least likely trial outcome here for both groups.

      We thank the reviewer for their comments and hope the following addresses their confusion about the performance of animals during our multimodal conditioning task.

      Firstly, as pointed out by the reviewer, the hit rate (HR) is lower than the false-alarm rate (FR), but crucially only when assessed explicitly within condition (e.g. just auditory or just visual stimulation). Given the multimodal nature of the assay, HR and FR could also be evaluated across different trials, unimodal and multimodal, for both auditory and visual stimuli. Doing so resulted in a net positive d', as observed by the reviewer. From this perspective, and as documented in the Methods (Multimodal Conditioning and Reversal Learning) and Supplemental Figures, mice do indeed learn the conditioning task and perform at above-chance levels.

      Secondly, as raised in the Discussion, an important caveat of this assay was that it was unnecessary for mice to learn the task structure explicitly but, rather, that they respond to environmental cues in a reward-seeking manner that indicated perception of a stimulus. "Performance" as it is quantified here demonstrates a perceptual difference between conditions that is observed through behavioral choice and timing, not necessarily the degree to which the mice have an understanding of the task per se.
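      For readers following the d' discussion, a minimal sketch of the standard signal-detection index d' = z(HR) - z(FR), where z is the inverse of the standard normal CDF, may help. The log-linear correction (0.5 added to each count) is an assumption made here so that rates of 0 or 1 stay finite; the text does not state how extreme rates were handled, and the counts in the usage line are made up for illustration. Because d' depends only on the two rates, identical HR and FR computed over the same trials always give identical d'; a difference can only arise when the rates are pooled over different trial sets, as described in the response above.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumes a log-linear correction: 0.5 is added to the hit and false-alarm
    counts and 1 to each trial total, so neither rate is exactly 0 or 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)


# Illustrative counts only, not data from the study.
print(round(d_prime(80, 20, 20, 80), 2))
```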

      In the discussion, it is stated that "While axons responded inconsistently to individual stimulus presentations, their responsivity remained consistent between stimuli and through time on average...". I do not understand this part of the sentence. Does this mean axons are consistently inconsistent?

      The reviewer’s interpretation is correct – although recorded axons tended to have a preferred stimulus or combination of stimuli, they displayed variability in their responses (response probability), though little or no variability in their likelihood to respond over time (on average).

      In the discussion, the authors state their axon imaging results contrast with recent studies in mice. Why not actually do the same analysis that Ollerenshaw did, so this statement is supported by fact? As pointed out above, the criteria used to classify an axon as responsive to stimuli were very liberal in this current manuscript.

      While we appreciate this comment from the reviewer, we feel that it was not necessary to perform similar analyses to those of Ollerenshaw et al in order to appreciate that methodological differences between these studies would have confounded any comparisons made, as we note in the Discussion.

      I find the discussion wildly speculative and broad. For example, "the integrative properties of the CLA could act as a substrate for transforming the information content of its inputs (e.g. reducing trial-to-trial variability of responses to conjunctive stimuli...)". How would a claustrum neuron responding with 10% reliability to a stimulus (or set of stimuli) provide any role in reducing trial-to-trial variability of sensory activity in the cortex?

      We thank the reviewer for their feedback. We acknowledge the reviewer's concern regarding the speculative nature of our discussion. To address the specific point raised, while a neuron with a 10% reliability might appear limited in reducing trial-to-trial variability in sensory activity, it's possible that such neurons are responsive to a combination of stimuli or conditions not fully controlled or recorded in our current setup. For instance, variables like the animal’s attentional or motivational states could influence the responsiveness of claustrum neurons, thus integrating these inputs could theoretically modulate cortical processing. We have refined this section to clarify these points (now lines 810-813).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shelton et al. explore the organization of the Claustrum. To do so, they focus on a specific claustrum population, the one projecting to the retrosplenial cortex (CLA-RSP neurons). Using an elegant technical approach, they first described electrophysiological properties of claustrum neurons, including the CLA-RSP ones. Further, they showed that CLA-RSP neurons (1) directly excite other CLA neurons, in a 'projection-specific' pattern, i.e. CLA-RSP neurons mainly excite claustrum neurons not projecting to the RSP and (2) receive excitatory inputs from multiple cortical territories (mainly frontal ones). To confirm the 'integrative' property of claustrum networks, they then imaged claustrum axons in the cortex during single- or multi-sensory stimulations. Finally, they investigated the effect of CLA-RSP lesion on performance in a sensory detection task.

      Strengths:

      Overall, this is a really good study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. The in-vitro part is impressive, and the results are compelling.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      One noteworthy concern arises from the terminology used throughout the study. The authors claimed that the claustrum is an integrative structure. Yet, integration has a specific meaning, i.e. the production of a specific response by a single neuron (or network) in response to a specific combination of several input signals. In this study, the authors showed compelling results in favor of convergence rather than integration. On a lighter note, the in-vivo data are less convincing, and do not entirely support the claim of "integration" made by the authors.

      We thank the reviewer for their clarity on this issue. We absolutely agree that without clear definition in the study, interpretation of our data could be misconstrued for one of several possible meanings. We have updated our Introduction, Results, and Discussion text to reflect the definition of ‘integration’ we used in the interpretation of our work and hope this clarifies our intent to the reader.

      Reviewer #3 (Public Review):

      The claustrum is one of the most enigmatic regions of the cerebral cortex, with a potential role in consciousness and integrating multisensory information. Despite extensive connections with almost all cortical areas, its functions and mechanisms are not well understood. In an attempt to unravel these complexities, Shelton et al. employed advanced circuit mapping technologies to examine specific neurons within the claustrum. They focused on how these neurons integrate incoming information and manage the output. Their findings suggest that claustrum neurons selectively communicate based on cortical projection targets and that their responsiveness to cortical inputs varies by cell type.

      Imaging studies demonstrated that claustrum axons respond to both single and multiple sensory stimuli. Extended inhibition of the claustrum significantly reduced animals' responsiveness to multisensory stimuli, highlighting its critical role as an integrative hub in the cortex.

      However, the study's conclusions at times rely on assumptions that may undermine their validity. For instance, the comparison between RSC-projecting and non-RSC-projecting neurons is problematic due to potential false negatives in the cell labeling process, which might not capture the entire neuron population projecting to a brain area. This issue casts doubt on the findings related to neuron interconnectivity and projections, suggesting that the results should be interpreted with caution. The study's approach to defining neuron types based on projection could benefit from a more critical evaluation or a broader methodological perspective.

      We thank the reviewer for their attention to the methods used in our study. We acknowledge that false negatives resulting from incomplete labeling introduce an inherent bias, but contend that this is true of most modern tracing experiments in neuroscience, irrespective of the method used. Moreover, if false-negative biases affect our results, they likely do so in the direction of supporting our findings: perfect knowledge of claustrum connectivity would likely enhance the observed effects by increasing the pool of neurons in which we find an effect. For example, the cortico-claustral connectivity effects in Figure 3 would likely have been even larger had false-negative CLA-RSP neurons been positively identified.

      Where appropriate we have provided estimates of variability and certainty in our experimental findings and do not claim any definitive knowledge of the true rate and scope of claustrum connectivity.

      Nevertheless, the study sets the stage for many promising future research directions. Future work could particularly focus on exploring the functional and molecular differences between E1 and E2 neurons and further assess the implications of the distinct responses of excitatory and inhibitory claustrum neurons for internal computations. Additionally, adopting a different behavioral paradigm that more directly tests the integration of sensory information for purposeful behavior could also prove valuable.

      We thank the reviewer for their outlook on the future directions of our work. These avenues for study, we believe, would be very fruitful in uncovering the cell-type-specific computations performed by claustrum neurons.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors):

      The editor recommends addressing the issues raised by the reviewers about the statistical significance of sensory responses relative to blank stimuli, and resolving the issue created by the exclusion of monosynaptically connected neurons in the connectivity study, to raise the assessment of the strength of evidence from incomplete to solid. Moreover, as the results currently stand, the animals do not seem to learn the behavioral task: they perform above chance on visual and auditory trials but largely below chance on multisensory trials. It therefore seems that the animals do not perform a multisensory task. The authors should clarify this.

      Reviewer #1 (Recommendations For The Authors):

      Several references in which mouse CLA-retrosplenial or CLA-frontal neurons were investigated are missing from the manuscript; they would be highly relevant both to the discussion of claustrum function and to the context of the methodologies used here (Wang et al., 2023, Nat Comm; Nair et al., 2023, PNAS; Marriott et al., 2024, Cell Reports; Faig et al., 2024, Current Biology).

      Reviewer #2 (Recommendations For The Authors):

      Let me be clear: this is an excellent study, using state-of-the-art technical approaches to probe the local/global organization of the claustrum. However, the study is somewhat disconnected, with a fantastic in vitro part and, in my opinion, a less convincing in vivo one.

      As stated in the public review, I'm concerned about the use of the term "integration", as, in my opinion, the data presented in this study (which I repeat are of excellent level) do not support that claim.

      Below are my main points regarding the article:

      (1) My main comment relates to the use of the term 'integration'. It might be a semantic debate, but I think that this is an important one. In my opinion, neural integration is the "summing of several neural input signals by a single neuron to produce an output signal that is some function of those inputs". As the authors state in the discussion, they were not able to "assess the EPSP response magnitude to the conjunction of stimuli due to photosensitivity of ChrimsonR opsins to blue light". Therefore, the authors did not specifically prove integration, but rather input convergence. This does not mean that the results presented are not important or of excellent quality, but I encourage the authors to either tone down the part on integration or to give a clear definition of what they call integration.

      (2) The in vivo imaging data are somewhat confusing. First, the authors image two claustral populations simultaneously (the CLA-RSP and the CLA-ACA axons). I may be missing the information, but there is no evidence that these cells overlap in the CLA (there are no data in the supplement, and the existing literature only supports partial overlap). Second, in the Results, the authors claim that 96% of the sensory-responsive axons displayed a multisensory response. This, combined with the 47% of axons responsive to at least one stimulus, should lead to a global response of around 45% of the axons in multisensory trials. Yet, in Figures 6F-G, the response probability is actually low (closer to 20%). I cannot really make sense of these results. At first, I thought that most of the multisensory-responsive axons showed no response to the multisensory stimulus (but did respond to a unimodal stimulus). This hypothesis is, however, unlikely, as the response AUC is biased toward positive values in Figure 6H. Overall, I am not fully convinced by the imaging data, and I think that the authors should be more cautious in interpreting their results (as they are in the Discussion, but less so in the Results).
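      The expectation in point (2) can be written out as a simple product, assuming (as the comment does) that the 96% refers to the subset of sensory-responsive axons and that a multisensory-responsive axon would respond on multisensory trials:

```latex
P(\text{response on multisensory trials})
  \approx P(\text{sensory-responsive}) \times P(\text{multisensory} \mid \text{sensory-responsive})
  = 0.47 \times 0.96 \approx 0.45
```

      This is the value the comment contrasts with the response probability of roughly 20% shown in Figures 6F-G.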

      (3) The TetTox approach used in the study ablates all Cre-expressing neurons in the CLA. If the hypothesis proposed by the authors is true, then ablating one subpopulation should not impact the functioning of the whole CLA that much, as other neurons will likely "integrate" information coming from multiple cortices (Figures 3 and 4), and the local divergence (Figure 1) will then allow this information to be broadcast back to multiple cortices. Do the authors think that such an approach profoundly modifies intra-claustral network connectivity? If this is not the case, shouldn't we expect a smaller effect after lesioning a specific subpopulation of CLA neurons?

      (4) The behavioral protocol is also confusing. If I understand correctly, the aim of the task was to probe the D-Prime factor, as all trials are rewarded whatever the animal's response. From Figure 7I, one can see that the mice cannot properly respond to the audiovisual cues, clearly indicating that both groups show an impaired response to this type of trial. The authors' whole conclusion is therefore drawn from the D-Prime calculation. However, although D-Prime is meant to be a measure of sensitivity (i.e. unaffected by response bias), two assumptions need to be met: (1) the signal and noise distributions should both be normal, and (2) the signal and noise distributions should have the same standard deviation. These assumptions cannot be tested in the task used by the authors (this would require rating tasks). The authors might want to use nonparametric measures of sensitivity such as A' (see Pollack and Norman 1964).
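      To make the comparison in point (4) concrete, here is a minimal Python sketch of both sensitivity measures computed from a hit rate and a false-alarm rate. d' assumes equal-variance Gaussian signal and noise distributions; A' (Pollack and Norman, 1964) is computed with the widely used closed-form expression for a single hit/false-alarm pair. The function names and example rates are illustrative assumptions, not values from the study.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Parametric sensitivity d' = z(H) - z(F).

    Assumes equal-variance Gaussian signal and noise distributions.
    Rates must lie strictly between 0 and 1 (inv_cdf raises otherwise).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def a_prime(hit_rate: float, fa_rate: float) -> float:
    """Nonparametric sensitivity A' for one hit/false-alarm pair.

    0.5 corresponds to chance and 1.0 to perfect discrimination;
    values below 0.5 mean false alarms exceed hits.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Example with made-up rates (not data from the study):
print(round(d_prime(0.8, 0.2), 3))  # 1.683
print(round(a_prime(0.8, 0.2), 3))  # 0.875
```

      Because A' does not assume Gaussian, equal-variance distributions, reporting it alongside d' would show whether the conclusions depend on those assumptions.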

      Reviewer #3 (Recommendations For The Authors):

      While the study is comprehensive, some of its conclusions are based on assumptions that potentially weaken their validity. A significant issue arises in the comparison between neurons that project to the retrosplenial cortex (RSC) and those that do not. This distinction is based on retrograde labeling from a single part of the RSC. However, CTB labeling, the technique used, does not capture 100% of the neurons projecting to a brain area. The study itself demonstrates this by showing that injecting the dye into three sections of the RSC results in three overlapping populations of neurons in the claustrum. Therefore, limiting the injection to just one of these areas inevitably leads to many false negatives: neurons that project to the RSC but are not labeled by the CTB. This issue recurs in the analysis of neurons projecting to both the RSC and the prelimbic cortex (PL), where assumptions about interconnectivity are made without a thorough examination of the overlap between these populations. The incomplete labeling complicates the interpretation of the data and makes it difficult to draw firm conclusions from it.

      Minor.

      There is a reference to Figure 1D where claustrum->cortical connections are described. This should be 5D.

      This is a correct reference pointing back to our single-cell characterizations of CLA morphoelectric types.

      End of Page 22. Implies should be imply.

      This has been resolved in the manuscript text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting and valuable study that uses multiple approaches to understand the role of bursting involving voltage-gated calcium channels within the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. Given its unique functional roles and connectivity pattern, the idea that the mediodorsal thalamus may have a fundamental role in regulating alcohol-induced transitions in consciousness state would be both important for researchers investigating thalamocortical dynamics and more broadly interesting for understanding brain function. In addition, the authors' examination of the role of the voltage-gated calcium channel Cav3.1 provides some evidence that burst firing mediated by this channel in the thalamus is functionally important for behavioral-state transitions. While many previous studies have suggested an analogous role in sleep-state regulation, the evidence for an analogous role of this type of bursting in sedative-induced transitions is more limited. Despite the importance of these results, however, there is some concern that the manipulations and recording approaches employed by the authors may affect other thalamic nuclei adjacent to the MD, such as the central lateral nucleus, which has also been implicated in controlling state transitions. The evidence for a specific role of the mediodorsal thalamus is therefore somewhat incomplete, and additional validation is needed.

      Strengths:

      This study employs multiple, complementary research approaches, including behavioral assays, shRNA-based localized knockdown, single-unit recordings, and patterned optogenetic interventions, to examine the role of activity in the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. The experiments and analyses included in the manuscript generally appear well conceived and well executed. Sample sizes are sufficiently large and the statistical analysis appears generally appropriate, though in some cases additional quantification would be helpful. The findings presented are novel and provide some interesting insight into the role of the thalamus, and of voltage-gated calcium channels within this region, in controlling behavioral state transitions induced by alcohol. In particular, the observed effects of selective knockdown, along with recordings in total knockout of the voltage-gated calcium channel Cav3.1, which has previously been implicated in bursting dynamics as well as in state transitions, particularly in sleep, together suggest that the shift of thalamic neurons from a more constant firing pattern to a bursting pattern is important for the transition to the sedated state produced by ethanol intoxication. While previous studies have similarly implicated Cav3.1 bursting in behavioral state transitions, the direct optogenetic interventions and single-unit recordings provide valuable new insight. These findings may also have interesting implications for understanding the sleep disruption associated with ethanol dependence, although the authors do not appear to examine this directly or to discuss these implications extensively.

      Weaknesses:

      A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit, replacing a more constant tonic firing pattern, facilitates these effects. Despite the generally clear effects observed across the included experiments, however, the evidence presented does not fully support the claim that the mediodorsal thalamus, in particular, is involved. This distinction is important because previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity, so distinguishing which region is affected is important for understanding the findings in the manuscript. While shRNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown (Figure 2), this is rather minimal evidence; it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript), and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach. Similarly, while an example of ChR2 expression is shown (Fig. 5), there seems to be some spread of expression outside of the mediodorsal thalamus even in this example, raising a concern about how regionally specific this effect is.

      The recordings targeting the mediodorsal thalamus could provide evidence of a direct association between changes in activity specifically in this part of the thalamus and the behavioral measures, but there are currently some issues with making this link. One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location than the reference diagram. A larger image and improved visualization of the overall set of lesion locations, including multiple anterior/posterior coronal sections, would be helpful. Moreover, even for these example images, it is difficult to evaluate whether they are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus. The lack of these direct links, in combination with the histological issues, reduces the insight that can be gained from this study.

      In addition to the key experimental issues mentioned above, the text of the manuscript often has problems with reasoning, or at least with explanation, as well as numerous minor editing issues. The most substantial such issue is the lack of clarity in distinguishing the mediodorsal thalamus from adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalmann, Front. Sys. Neuro. 2014) has directly claimed that the central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as medial. As part of this discussion, it would be important to consider additional relevant literature, including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020, which are quite critical but currently do not appear to be cited. Considering additional literature on the function of the mediodorsal thalamus would also be beneficial. While the methods employed generally seem sound, their description in the methods section lacks detail and is often difficult to follow. Analysis methods such as the burst index appear to be only briefly explained in the text and not mentioned in the methods section. Similarly, the staining method used in Figure 2 does not appear to be described in the methods section. The most substantial case is the UMAP approach used in Figure 4-E, which does not appear to be described in the methods or even in the main text. The lack of detailed descriptions makes it difficult to evaluate the applicability and quality of the experimental and analytical approaches. Citations justifying methods such as the approach used to separate regular-spiking and narrow-spiking neuron subtypes are also needed.

      Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper. For example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice", but it is difficult to parse what they mean by this statement. Similarly, the next sentence, "These results support that the maintenance...", while clearer, is not well phrased. Though individually minor, issues like this recur throughout the manuscript and sometimes make it difficult to follow, so the text should be revised to correct them. There are also some problems with labels, such as the A1/A2 labels in Figure 4, which appear to be incorrect. Also, the B panels of S7 have no label. Finally, some references are not included (only a placeholder label of [ref]).

      Reviewer #2 (Public Review):

      In the current study, Latchoumane and collaborators focus on the Cav3.1 calcium channels in the mediodorsal thalamic nucleus as critical players in the regulation of brain-states and ethanol resistance in mice. By combining behavioural, electrophysiological, and genetic techniques, they report three main findings. First, KO Cav3.1 mice exhibit resistance to ethanol-induced sedation and sustained tonic firing in thalamocortical units. Second, knocked-down Cav3.1 mice reproduce the same behaviour when the mediodorsal, but not the ventrobasal, thalamic nucleus is targeted. Third, either optogenetic or electric stimulation of the mediodorsal thalamus reduces ethanol-induced sedation in control animals.

      Overall, the study is well designed and performed, correctly controlled for confounds, and properly analysed. Nonetheless, it is important to address some aspects of the report. The results support the conclusions of the study. These results are likely to be relevant in the field of systems neuroscience, as they increase the molecular evidence showing how the thalamus regulates brain states.

      Reviewer #1 (Recommendations For The Authors):

      Aside from the additional quantification and clarification of the analysis discussed in the weaknesses section, the experiments included in the manuscript generally seem reasonable. However, I would suggest one additional experiment and one control, both relatively straightforward optogenetic experiments, that I feel would further improve the study. First, as the authors note, the optogenetic interventions used do not directly link the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, to the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO. Localizing this inhibition to the mediodorsal thalamus would also lend further credence to the claim that this nucleus is the relevant circuit for the observed effects. As a control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would help rule out the possibility that the observed effect would occur with stimulation of any thalamic nucleus. Finally, I did not see a strategy for sharing the data obtained in this study, so this should be added.

      R1-1: A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit, replacing a more constant tonic firing pattern, facilitates these effects. Despite the generally clear effects observed across the included experiments, however, the evidence presented does not fully support the claim that the mediodorsal thalamus, in particular, is involved. This distinction is important because previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity, so distinguishing which region is affected is important for understanding the findings in the manuscript.

      R1-A1: The reviewer is right that CL has been pointed to as another candidate structure with a causal influence on arousal and consciousness. We included only single units recorded from tetrodes located in the MD, identified using the lesion code explained in the Methods section and in our response to R1-3. We also quantified the Cav3.1 knock-down, which demonstrates that the KD was specific to the MD, bilaterally, and that CL and CM were minimally affected by the knock-down (Fig. 2C and D). Moreover, the optogenetic (fiber incidence of 30 degrees, ensuring central rather than lateral coverage; fiber optic NA = 0.22) and electrical stimulation (bipolar twisted electrodes, 50 µA) experiments were also highly selective for the MD (Fig. S5). It remains possible that the MD is not the sole structure involved in controlling the brain state toward sedation and “anesthetic states”, and CL might be a significant contributor as well; however, our data indicate that CL was only minimally affected by the manipulations in our experiments (Fig. 2, S5, S9 and S11).

      R1-2: While shRNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown (Figure 2), this is rather minimal evidence; it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript), and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach.

      R1-A2: To address this important question, we have added a quantification panel to Fig. 2D. We quantified the intensity per area of Cav3.1 expression in subzones of 4 regions of interest: MD (left, right; 2 subzones each), centromedial nucleus (CM; 1 subzone in total), centrolateral/paracentral nucleus (CL/PCN; left, right; 2 subzones each) and the submedial nucleus (SMT; left, right; 1 subzone in total; used as a control for intensity normalization). This panel clearly shows that the MD was knocked down bilaterally (p<0.001). CM (p<0.05) and CL (p<0.01) were also partially and unilaterally knocked down. This analysis confirms that our KD was highly specific to the MD.

      We added the relevant figure caption and text:

      [Result section, Cav3.1 silencing in the MD, but not VB, increased ethanol resistance in mice, paragraph 3]

      “We then characterized the change in Cav3.1 expression following the shControl and shCav3.1 knockdown injections in three test regions, MD (left and right), CM (centromedial nucleus) and CL (centrolateral nuclei, left and right), and in a negative control region, SMT (submedial thalamic nuclei, left and right). The average intensity was obtained from two coronal brain slices for each mouse used in the experiment (see Methods section, Cav3.1 Intensity quantification). Our results show that the knockdown was very specifically targeted to the bilateral MD (p<0.001; Fig. 2D). We also observed a knock-down in the CM (p<0.05) and a marginal unilateral knock-down of the CL (p<0.01). Notably, we tested the correlation between the level of knock-down in the MD and the total time in LOM and observed a significant association (Fig. 2D inset; R = 0.599, p = 0.018). This result highlights that the Cav3.1 knock-down was specific to the MD and that its magnitude was associated with ethanol-induced loss of motion.”
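      A minimal Python sketch of this kind of quantification, assuming that each region's Cav3.1 intensity per area is divided by the SMT intensity per area from the same slice and then averaged over the two slices per mouse. The exact normalization, the function names, the knock-down fraction helper and all numbers are assumptions for illustration; the authors' procedure is the one in their Methods (Cav3.1 Intensity quantification).

```python
from statistics import mean


def intensity_per_area(total_intensity: float, area: float) -> float:
    return total_intensity / area


def normalized_to_smt(region, smt):
    """region and smt are (total_intensity, area) tuples from the same slice.

    Dividing by the SMT control region removes slice-to-slice differences
    in overall staining intensity (assumed normalization).
    """
    return intensity_per_area(*region) / intensity_per_area(*smt)


def mouse_value(slices):
    """slices: list of (region, smt) pairs, e.g. two coronal slices per mouse."""
    return mean(normalized_to_smt(region, smt) for region, smt in slices)


def knockdown_fraction(sh_cav_values, sh_ctrl_values):
    """Hypothetical summary: fractional reduction of the shCav3.1 group mean
    relative to the shControl group mean (0 = no knock-down)."""
    return 1 - mean(sh_cav_values) / mean(sh_ctrl_values)


# Hypothetical numbers for one mouse (two slices), not data from the study:
slices = [((1200.0, 2.0), (800.0, 2.0)), ((900.0, 1.5), (700.0, 1.75))]
print(mouse_value(slices))  # 1.5
print(round(knockdown_fraction([0.5, 0.7], [1.0, 1.4]), 3))  # 0.5
```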

      R1-3: One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location compared to the reference diagram. A larger image and improved visualization of the overall set of lesion locations that includes multiple anterior/posterior coronal sections would be helpful. Moreover, even for these example images, it is difficult to evaluate whether these are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus.

      R1-A3: Related to Fig. S5, we re-plotted the recording positions, derived from the tetrode lesion sites, onto 3 representative coronal planes that best represent the implant positions. We also provided additional snapshots of tetrode locations. To identify the positions of the four tetrodes in each animal, we encoded them with different electrical lesion patterns as follows: 1 lesion (tetrode 1); 2 lesions made while withdrawing the tetrode, at a 100 µm interval (tetrode 2); 3 lesions at a 200 µm interval (tetrode 3); and 4 lesions at a 50 µm interval (tetrode 4). Tetrodes found outside the delimited MD region were excluded from the analysis. Unfortunately, directly relating the position of each electrode to the behavioral measures is not possible with tetrode recordings; a linear silicon probe, which maintains a fixed spacing between recording sites, would have been a better approach in that case, but it was not used in our study.
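      The lesion code described in R1-A3 can be sketched as a small lookup. This is a minimal illustration, assuming lesion depths along each track are available in µm; the function name `decode_tetrode` and the 20 µm tolerance are hypothetical and not part of the authors' protocol. The lesion count alone identifies the tetrode, and the spacing serves as a consistency check.

```python
# Lesion code from R1-A3: number of lesions -> (tetrode, spacing in µm).
LESION_CODE = {1: (1, None), 2: (2, 100), 3: (3, 200), 4: (4, 50)}


def decode_tetrode(lesion_depths_um, tolerance_um=20.0):
    """Return the tetrode number encoded by a set of lesion depths, or None.

    The lesion count identifies the tetrode; the spacing between consecutive
    lesions is checked against the expected interval as a consistency test.
    tolerance_um is an illustrative value, not taken from the study.
    """
    code = LESION_CODE.get(len(lesion_depths_um))
    if code is None:
        return None
    tetrode, spacing = code
    if spacing is None:  # a single lesion has no spacing to check
        return tetrode
    depths = sorted(lesion_depths_um)
    gaps = [deeper - shallower for shallower, deeper in zip(depths, depths[1:])]
    if all(abs(gap - spacing) <= tolerance_um for gap in gaps):
        return tetrode
    return None


print(decode_tetrode([2400, 2300]))              # 2
print(decode_tetrode([2400, 2350, 2300, 2250]))  # 4
print(decode_tetrode([2400, 2300, 2200]))        # None (3 lesions, but 100 µm gaps)
```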

      R1-4: In addition to the key experimental issues mentioned above, the text of the manuscript often has problems with reasoning, or at least with explanation, as well as numerous minor editing issues. The most substantial such issue is the lack of clarity in distinguishing the mediodorsal thalamus from adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalmann, Front. Sys. Neuro. 2014) has directly claimed that the central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as medial. As part of this discussion, it would be important to consider additional relevant literature, including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020, which are quite critical but currently do not appear to be cited. Considering additional literature on the function of the mediodorsal thalamus would also be beneficial.

      R1-A4: We thank the reviewer for these comments and suggestions. We agree that the references mentioned by the reviewer are highly relevant and should be integrated into the manuscript. We have integrated them and further developed the discussion of the role of the MD relative to other thalamic nuclei (ILN and CL in particular). We believe that this better-referenced and clarified text greatly improves the manuscript.

      [introduction section, paragraph 3]

      “The centrolateral (CL) thalamic nucleus has been implicated in the modulation of arousal, behavioral arrest 31, and improvement of the level of consciousness during seizures 32. Notably, direct electrical stimulation of the intralaminar nuclei (ILN), and of CL in particular, promoted hallmarks of arousal and awakening in primates under propofol and ketamine-propofol anesthesia.”

      [Discussion section, paragraph 1]

      “In this work, we identified that neural activity in the MD plays a causal role in the maintenance of consciousness. Whole-body Cav3.1 KO and MD-specific Cav3.1 KD mice showed resistance to loss of consciousness induced by a hypnotic dose of ethanol. In WT mice, MD neurons showed a reduced firing rate in natural (sleep) and ethanol-induced unconscious states compared to awake states. This reduction in neural activity was impaired in KO mice. In particular, the transition to an unconscious state was accompanied by a switch of firing mode from tonic to burst firing in WT mice, whereas this mode shift disappeared in KO mice. Finally, optogenetic or electrical stimulation of the MD after ethanol injection was sufficient to induce resistance to loss of motion, supporting the notion that the level of neural firing in the MD is critical to maintain the conscious state and delay the unconscious state. We showed that the expression of Cav3.1 T-type calcium channels in the MD is a cellular modulator associated with this effect.”

      [Discussion section, MD is a modulator of consciousness, paragraph 2 and 3]

      “The MD is known to innervate limbic regions, the basal ganglia and the medial prefrontal cortex 50, and increased activity in the MD might modulate the stability of cortical UP states (e.g. awake, aroused and attentive states) and synchronization 9,26. Thus, the MD might be a major hub involved in cortical state control and brain state stabilization.

      Supporting the brain state stabilization theory and the ethanol resistance of Cav3.1 mutants, Choi et al.34 demonstrated that the loss of Cav3.1 T-type calcium channels reduced the bilateral coherence between the PFC and MD under ketamine anesthesia and ethanol hypnosis, especially in the delta frequency band. More importantly, under propofol anesthesia, Bastos et al.35 showed that intralaminar nucleus and MD stimulation led to an increased wake-up subscore and arousal, together with an increase in cortico-cortical and thalamocortical slow (delta) frequency power.

      In the present study, we observed that Cav3.1 KD in the MD (Fig. 2A), but not in the VB (Fig. S3), increased ethanol resistance in mice, and that the degree of MD KD was associated with this resistance (Fig. 2D). We found that MD neurons in Cav3.1 mutant mice exhibited tonic firing within the range observed during wakefulness (Fig. 3 and 4), indicative of resistance to ethanol and a wake-like brain state. In addition, we found a strong association between normalized tonic firing in the MD and arousal across brain states (i.e. walk to wake to sleep states), supporting the idea that MD tonic firing could be interpreted both as a thalamic readout and as a modulator of brain state 11 (Fig. 3). Finally, direct optogenetic and electrical MD stimulation increased resistance to loss of consciousness in WT mice (Fig. 5 and Fig. S10). To our knowledge, this is the first report demonstrating the causal involvement of the mediodorsal thalamic nucleus in the modulation of wakefulness and in resistance to ethanol-induced loss of consciousness in mice.”

      R1-5: While the methods employed generally seem sound, the description in the methods section is lacking in detail and is often difficult to follow. Analysis methods such as the burst index appear to only be given a brief explanation in the text and appear not to be mentioned in the methods section.

      R1-A5: We have added a clear definition to the supplementary methods, following the original work in which the index was described:

      [Supplementary Method section, Single Unit recording, sorting and analysis, last paragraph]

      “The bursting index was derived as described in (Royer et al. 2012). Namely, the burst index was estimated from the spike auto-correlogram (1-ms bin size) by subtracting the mean value between 40 and 50 ms (baseline) from the peak measured between 0 and 10 ms. Positive burst amplitudes were normalized to the peak and negative amplitudes were normalized to the baseline to obtain indexes ranging from −1 to 1.” We also edited its mention in the text for clarity:

      [Result section, Lack of Cav3.1 in MD neurons removes thalamic burst in NREM sleep, paragraph 2]

      “[…] and a clear reduction in total bursting, represented as the bursting index (Fig. 3-B; ratio of spike counts at <10 ms and >50 ms lags in the autocorrelogram).”
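      For readers who want to reproduce the burst index defined in the supplementary methods (Royer et al. 2012), a minimal Python sketch follows. It assumes spike times in milliseconds for one unit, counts only positive lags (the autocorrelogram is symmetric), and reads "between 0 and 10 ms" and "between 40 and 50 ms" as the 1-ms bins [0, 10) and [40, 50). The function names are hypothetical; this illustrates the published definition and is not the Matlab code deposited in the repository.

```python
import numpy as np


def autocorrelogram(spike_times_ms, max_lag_ms=50.0, bin_ms=1.0):
    """Count spike pairs per positive-lag bin (0 to max_lag_ms, 1-ms bins).

    Hypothetical helper. The autocorrelogram is symmetric, so only lags from
    each spike to later spikes are counted.
    """
    t = np.sort(np.asarray(spike_times_ms, dtype=float))
    edges = np.arange(0.0, max_lag_ms + bin_ms, bin_ms)
    counts = np.zeros(len(edges) - 1)
    for i, ti in enumerate(t):
        stop = np.searchsorted(t, ti + max_lag_ms, side="right")
        lags = t[i + 1:stop] - ti
        counts += np.histogram(lags, bins=edges)[0]
    return edges[:-1], counts


def burst_index(spike_times_ms):
    """Burst index in [-1, 1]: peak (0-10 ms) minus mean baseline (40-50 ms)."""
    lags, acg = autocorrelogram(spike_times_ms)
    peak = acg[lags < 10].max()
    baseline = acg[(lags >= 40) & (lags < 50)].mean()
    amplitude = peak - baseline
    if amplitude > 0:
        return amplitude / peak       # positive amplitudes normalized to the peak
    if baseline > 0:
        return amplitude / baseline   # negative amplitudes normalized to the baseline
    return 0.0                        # no spike pairs in either window
```

      A train of 4-spike bursts (3-ms intervals) repeated every 200 ms gives an index of 1, whereas a regular 50-Hz train (20-ms intervals) gives -1.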

      R1-6: Similarly, the staining method used in Figure 2 does not appear to be described in the methods section.

      R1-A6: The staining method is described in the supplementary methods of the paper [Supplementary methods, Immunohistochemistry].

      R1-7: The most substantial case is for the UMAP approach used in Figure 4-E which does not appear to be described in the methods or even described in the main text.

      R1-A7: Regarding the method, the UMAP approach is described in the supplementary methods document [Uniform Manifold Approximation and Projection (UMAP)]. Given the scope of the analysis, we believe that a succinct description is sufficient there. Regarding the main text, we agree that it lacked a description of these results, and we have amended it to describe this result clearly and what it entails. The following paragraph was added:

      [Result section, Under ethanol, MD neurons lacking Cav3.1 show no burst and a wake state-like neural activity, second to last paragraph]

      “Finally, we asked whether the firing modes and properties (tonic firing rate, burst firing rate; see supplementary methods) of single MD neurons would form distinct qualitative representations of “brain states” using a lower-dimensional UMAP representation (Uniform Manifold Approximation and Projection42). We observed that for the awake and active (i.e. walk) states, the brain state representation formed two adjacent clusters that mixed both wild-type and mutant neurons (Fig. 4E, left panel). For the REM and NREM states, wild-type neurons formed 2 additional interconnected clusters, whereas mutant neurons tended to overlap with the clusters attributed to the “awake” brain state (Fig. 4E, second panel from left). Ethanol-induced fLOM, similarly to the REM and NREM clusters, was distinct from the awake clusters in wild-type mice and overlapped with the NREM clusters (Fig. 4E, third panel from left). Here too, mutant MD neurons overlapped with the awake clusters rather than with the “low consciousness” brain states. These results indicate that firing mode and properties could define a brain state representation that distinguishes levels of consciousness. Moreover, in mutants the representation of “low consciousness” states overlapped with wild-type “awake” states, consistent with the hypothesis of resistance to loss of consciousness.”

      R1-8: Citations justifying the use of methods such as the approach to separate regular spiking and narrow spiking neuron subtypes are also needed.

      R1-A8: We have added two references related to the observation of the two subpopulations of spiking neurons [Schiff and Reyes, 2012; Destexhe, 2008].

      R1-9: Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper (for example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice", but it is difficult to parse what they mean by this statement).

      R1-A9: We addressed this issue by editing and revising the manuscript for clarity and flow.

      R1-10: Similarly, the next sentence "These results support the maintenance..." while clearer, is not well phrased. Though individually minor, issues like this re-occur throughout the manuscript and sometimes make it difficult to follow so the text should be revised to correct them.

      R1-A10: We thank the reviewer for highlighting this point. We have edited the overall text to improve clarity and flow.

      [abstract] 

      These results suggest that maintaining MD neural firing at a wakeful level is sufficient to induce resistance to ethanol-induced hypnosis in WT mice.

      R1-11: There are also some problems with labels such as the labels of A1/A2 in Figure 4, which appear to be incorrect.

      R1-A11: We noted this issue and have rectified the figure for clarity.

      R1-12: Also, S7 has no label on the B panels.

      R1-A12: We thank the reviewer for pointing out this omission. We have added the y-axis label to the panel for clarity.

      R1-13: Finally, some references are not included (only a label of [ref]).

      R1-A13: We have added the missing reference and thank the reviewer for pointing this out.

      Additional comments

      R1-14: Aside from the additional quantification and clarification of the analysis discussed in the weakness section, in general, the experiments included in the manuscript seem reasonable. However, I would suggest one additional experiment as well as one control, both of which are relatively straightforward optogenetic experiments, that I feel would be helpful to further improve the study. First, as the authors note, the optogenetic interventions used do not directly address the relevance of the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, with the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO.

      R1-A14: Here the reviewer proposes an interesting experiment, which we attempted to perform; however, it poses several technical challenges. First, KO mice lack burst firing because they are depleted of Cav3.1 low-threshold calcium channels. Therefore, under ethanol, even if a rhythmic inhibition that would activate Cav3.1 channels and cause rebound bursts were present, KO neurons would be unable to produce these bursts. Consequently, optogenetic inhibition would only accentuate the total inhibition and could induce an overall decrease in MD firing, resulting in an increase in LOM features. Alternatively, we showed that in WT mice given a low ethanol dose (at which LOM induction is harder), increased rhythmic inhibition significantly increased LOM duration and marginally decreased latency to LOM (Fig. S12), indicating that increased inhibition is consistent with the hypothesis: “the stronger the decrease in MD firing, the faster and longer the LOM.” The caveat of using WT mice here is that optogenetic inhibition may also produce rebound bursts after inhibition. Injecting bursts alone did not alter the response to ethanol (Fig. S10). These results point to the loss of firing in the MD as a main factor for LOM, and suggest that any contribution of bursts may require concurrent inhibition/loss of firing.

      We agree that inhibition in KO mice would further validate this hypothesis by controlling for the role of bursts. We regret that we are not in a position to perform additional experiments involving the KO mice.

      R1-15: For the control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would be beneficial to rule out the possibility that the observed effect would occur with any thalamic nucleus.

      R1-A15: We agree with the reviewer that we could have added an additional control region to the gain/loss-of-function experiments. We would go further and suggest that a better control nucleus would be a higher-order nucleus such as PO or an unrelated sensory relay nucleus such as LGN. VB, being a somatosensory relay nucleus, could also influence movement initiation, which could be hard to interpret. Since a complete control study of Cav3.1 KD in all thalamic nuclei is outside the scope of this study, we opted not to redo these experiments and to keep the focus of the manuscript on the manipulation of MD activity rather than on other thalamic nuclei. We also do not claim that the MD is the sole center able to initiate a switch into loss of consciousness; a more in-depth study on that matter would clearly be needed.

      R1-16: In addition to these experiments, I did not note the strategy for sharing data obtained through this study so this should be added.

      R1-A16: We have uploaded data and code for most figures at the following repository and provided a clearer statement regarding data sharing. We thank the reviewer for pointing out this missing element.

      The link for the repository is the following:

      It contains:

      - Excel spreadsheet file of all behavior values, including the newly quantified Cav3.1 expression in MD/CL/SMT

      - Excel spreadsheet tracking all analyzed MD cells (single units; tetrode)

      - Folders for all groups studied with representative figures showing EEG power over time and normalized activity (WT vs KO for 2, 3 and 4 g/kg; MDshKD vs shCTR, VBshKD vs shCTR; CHR2 NOSTIM vs STIM; ESTIM Groups and ARCH NOSTIM vs STIM)

      - A1G LORRvsLOM and OPEN FIELD Matlab data

      - Matlab and ImageJ Codes: single unit analysis, characterization, brain state characterization, sleep stages, LOM, open field analysis and statistical analysis.

      We have added a data-sharing subsection to the acknowledgements:

      “Part of the analyzed data and code is available on the open-access platform Mendeley:

      Latchoumane, Charles-francois (2024), “Mediodorsal thalamic nucleus mediates resistance to ethanol through Cav3.1 T-type Ca2+ regulation of neural activity”, Mendeley Data, V1, doi: 10.17632/7fr427426m.1

      Additional data (large size recording and images) can be provided upon reasonable requests.”

      Reviewer #2 (Recommendations For The Authors):

      R2-1. Consciousness is a contentious subject. Even in humans, there is still intense research on the topic, not to mention animals, about which we still know very little. Moreover, consciousness is not quantified in this study, as there is no standard metric to do so. Accordingly, talking about 'modulation', 'transition', 'level', or 'reduction' of consciousness can be misleading. Hence, it is probably safer to strictly refer to brain-states and/or stages of the sleep-wake cycle in this study and reframe it entirely around these concepts.

      R2-A1. The reviewer raises an important point, and we appreciate this highlight. We agree that the definition of consciousness is rather loose and arguably difficult to pinpoint. Here, we settled on a definition that relies on loss of motion and loss of righting reflex. This definition is widely accepted as the “verified” state in which the absence of responsiveness (to continuous stimuli, inducing reflex or discomfort) is observed and is uninterrupted by jerks and spurious movements. Additional metrics would be EMG recordings to quantify atonia and EEG recordings to verify the settling of a dominant slow-wave frequency (~4 Hz; ethanol-induced sedation at theta rhythm), as shown in Fig. S1A. The driver of this 4 Hz rhythm and its correlates have been investigated previously (e.g. Choi et al., PNAS, 2012), leading to the accepted link between LOM/LORR and loss of consciousness. Our data have the advantage of including single-neuron recordings showing that LOM is the state with the lowest firing activity (Fig. S7AB), comparable to deep sleep activity (Fig. 3D). The first LOM is the most important, as it reflects the deepest loss of consciousness before ethanol starts to be metabolized and cleared, which should be consistent across animals.

      As a result, we have edited the manuscript to clarify all mentions related to brain states and states of unconsciousness.
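      As an illustration of how the settling of a dominant ~4 Hz rhythm can be checked on an EEG segment, a minimal Python sketch that returns the frequency of the largest spectral peak within a search band. The 1-20 Hz band, the Hann-windowed periodogram and the function name are assumptions made here for illustration; this is not the spectral pipeline used for Fig. S1A.

```python
import numpy as np


def dominant_frequency(eeg, fs, fmin=1.0, fmax=20.0):
    """Frequency (Hz) of the largest power peak of `eeg` between fmin and fmax.

    Hypothetical helper: Hann-windowed periodogram, band limits are assumptions.
    """
    x = np.asarray(eeg, dtype=float)
    x = x - x.mean()                                      # remove DC offset
    power = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    return freqs[band][np.argmax(power[band])]
```

      For a 10-s synthetic trace sampled at 250 Hz that mixes a 4 Hz sine with a weaker 12 Hz sine, the function returns a value within one frequency bin (0.1 Hz) of 4 Hz.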

      R2-2. It is not clear why the authors focus on the mediodorsal nucleus. This should be better explained in the introduction and developed in the discussion.

      R2-A2. This comment converges with Reviewer 1's comments, and we have addressed this gap in the discussion as suggested, in our response to Reviewer 1's corresponding comment. We believe it is now clearer.

      R2-3. The discussion mentions that 'increased activity in MD might modulate the stability of cortical UP state and synchronization' (pg 21). This point should be either further developed and put into context, or removed. In its current state, it does not seem to contribute much to the discussion of results.

      R2-A3. We understand that the wording “UP state” might not be clear enough. We have modified this sentence as follows to clarify that the UP state could refer to any state in which the animal is awake, aroused or attentive:

      [Discussion section, MD is a modulator of consciousness, first paragraph]

      “The MD is known to innervate limbic regions, the basal ganglia and the medial prefrontal cortex 50, and increased activity in the MD might modulate the stability of cortical UP states (e.g. awake, aroused and attentive states) and synchronization 9,26. Thus, the MD might be a major hub involved in cortical state control and brain state stabilization.”

      R2-4. The discussion states that 'mutant mice did not exhibit a decreased arousal level (i.e. increased locomotor activity)' (pg 23). This is confusing as decreased arousal should be reflected in decreased locomotor activity.

      R2-A4. We understand that the formulation of this sentence may be confusing, and we have edited this portion of the text in the revised manuscript. To clarify, mutant mice do not exhibit reduced or increased arousal (not quantified, only observed), but they do show a hyperlocomotion phenotype. This contrasts with a lower basal firing rate in the MD, which, in our interpretation, is not synonymous with lower arousal. We believe that the relative change in MD firing determines the change in arousal, and that absolute firing is not indicative of arousal in itself, only in comparison.

      [Discussion section, The lower variability in MD Firing reflects Ethanol Resistance in Cav3.1 mutant mice, paragraph 2]

      “Mutant RS neurons in the MD showed overall lower excitability and variability of firing in various natural conscious and unconscious states compared to wild-type mice. Remarkably, Cav3.1 mutant mice exhibited a clear increase in locomotor activity and increased resistance to ethanol. The generally lower firing rate and the high “arousal” observed in mutant mice suggest that the relative change in MD tonic firing from state to state, and not the absolute firing value, might be a better correlate of change in brain state in mice.”

      R2-5. The methods (pg 27) state that two genetic backgrounds (129/svjae and C57BL/6J ) were used in the study. Authors should show whether there were significant differences between those backgrounds in the key parameters assessed in the study (particularly resistance to ethanol sedation).

      R2-A5. As mentioned in the methods section, we only used F1-background mice, which are the first-generation offspring produced by crossing the 129/svjae and C57BL/6J strains. To produce F1 KO mice, we maintained heterozygous mice on both strains. Unfortunately, we did not study the specific differences between KO mice on these two backgrounds; however, pure C57BL/6J KO mice have been used in other studies by our group (Kim et al 2001; Na et al, 2008; Park et al., 2010). The F1 background allows us to work with mice that are less aggressive and can be handled with less inherent stress.

      R2-6. It would be convenient to produce a supplementary figure associated with Figure 1C to show the same data with averages per mouse. That is, 9 points for control and 9 points for KO mice. This also applies to all cases where data is not presented per mouse but pooled between animals.

      R2-A6. We have added a panel C to Figure S1 showing the individual values for all mice corresponding to Figure 1C. We have also generalized this presentation to all behavior graphs, showing all animals in a scatter plot next to the boxplot. We believe that this presentation further increases the transparency of the manuscript. We have thus added scatter plots for all mice in Fig. 1, Fig. 2, Fig. 5, Fig. S2, Fig. S3, Fig. S10 and Fig. S12.

      R2-7. It would be informative to make a supplementary figure associated with Figure 1D to compare baseline raw activity levels (i.e., baseline walking recording) between control and KO mice. That is, do KO and control mice cover comparable distances and at similar speeds during baseline conditions? Figure 1D and Figure 4A suggest that the variability of locomotor activity is larger in KO mice. Hence, this parameter should be quantified and reported.

      R2-A7. We thank the reviewer for this comment. We addressed this question in the manuscript in two ways:

      - We first measured overall locomotion using the total distance traveled in the open field by our mouse cohorts (Fig. S4C). We observed that the KO mutants showed hyperlocomotion, but MD or VB knockdown mice did not, which indicates that the hyperlocomotion component is not specific to the two thalamic nuclei studied.

      - Using the forced walking task, we require the animal to keep a steady pace of roughly 6 cm/s. This assay normalizes general walking behavior to a relatively fixed pace, making it comparable across animals.

      The reviewer suggested reporting the mean and variance of walking in WT and KO mice during baseline (prior to the ethanol I.P. injection). We believe that the two points mentioned above are sufficient to describe the WT vs KO locomotion differences more quantitatively. Moreover, by construction, the normalized locomotion in the forced walking task returns similar baseline means; the standard deviation could potentially show differences, but these would remain inconclusive.

      R2-8. The legend in Figure 1 states that 'the loss of consciousness is evaluated using normalized moving index using either video analysis (differential pixel motion), on-head accelerometer-based motion, or neck electromyograms'. Authors should clarify whether these methods are equivalent and support it with data.

      R2-A8. We understand the reviewer's point, and we have modified the method description to better align with what was done. For most mice, video analysis was used to obtain the moving index. When video recording was not available (2 mice), an accelerometer attached to the animal's head stage was used to derive a moving index similar to the video moving index. The neck electromyogram was instead used in animals implanted with tetrodes, to identify sleep stages based on local field potential frequency and muscle tone. We have therefore clarified the methods and Figure 1 to avoid this confusion. Since no concurrent recording of both video and accelerometer was performed, we do not have the data to compute the correlation between the two measures; however, no noticeable deviation in loss of motion was observed between the two methods. We realize that this may be a weak argument; however, in our observations, video and accelerometer returned very similar timings for loss of motion (only a few comparative instances, insufficient for a statistical comparison).
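      The differential-pixel moving index mentioned in the Figure 1 legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions (grayscale frames, mean absolute frame-to-frame difference, normalization to the mean of an initial baseline period such as the pre-injection recording); the function name is hypothetical, and neither the pipeline nor the normalization is taken from the deposited code.

```python
import numpy as np


def moving_index(frames, n_baseline):
    """Per-step motion from differential pixel change, normalized to a baseline.

    Hypothetical helper (assumptions: grayscale video, baseline normalization).
    frames: array of shape (T, H, W).
    n_baseline: number of initial frame-to-frame differences used as baseline.
    """
    f = np.asarray(frames, dtype=float)
    motion = np.abs(np.diff(f, axis=0)).mean(axis=(1, 2))  # mean |pixel change| per step
    baseline = motion[:n_baseline].mean()
    return motion / baseline if baseline > 0 else motion
```

      Loss of motion would then correspond to the index staying near zero; the threshold and minimum duration used to call LOM are not specified here.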

      R2-9. How were spike bursts defined? The authors should try different criteria and verify the consistency of results.

      R2-A9. For in vivo single-unit recording, we opted for a definition validated in our work and that of others: a silent period of at least 100 ms followed by a minimum of 3 spikes, with:

      - an interspike interval of less than 4 ms for the first spike pair

      - interspike intervals of less than 20 ms for the remaining spike pairs

      We also performed this analysis using a minimum of 2 spikes and silent periods varying between 50 and 100 ms, without observing significant deviation in the results. As shown in Figure S6B, with this approach most bursts contained <10 spikes. Figure S6C shows a clear distribution of first-spike ISIs within 2-4 ms, as observed in previous work (Desai and Varela, 2021; Alitto et al, 2019); importantly, this distribution is not sharply capped at 4 ms, showing that the first ISI of thalamic bursts may indeed lie below 4 ms. Within-burst spike waveforms can be highly variable, and we chose a minimum of 3 rather than 2 spikes per burst to reduce false-positive detection of ultra-short bursts, which remain controversial in single-unit recording (Gray et al. 1995).
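      The burst criteria above, and the variants used to check robustness (minimum of 2 spikes, different silent periods), can be expressed as a short Python sketch. It assumes spike times in milliseconds for one unit; the function and parameter names are hypothetical, and the first spike of a recording is skipped because the silence preceding it is unknown (an assumption, not part of the stated definition).

```python
import numpy as np


def detect_bursts(spike_times_ms, min_silence_ms=100.0, first_isi_ms=4.0,
                  max_isi_ms=20.0, min_spikes=3):
    """Return (start_index, n_spikes) for each burst in one unit's spike train.

    Hypothetical helper. A burst starts at a spike preceded by >= min_silence_ms
    of silence, its first ISI is < first_isi_ms, each following ISI is
    < max_isi_ms, and it contains at least min_spikes spikes.
    """
    t = np.sort(np.asarray(spike_times_ms, dtype=float))
    n = t.size
    bursts = []
    i = 1  # assumption: the first spike is skipped (preceding silence unknown)
    while i < n - 1:
        silent_before = (t[i] - t[i - 1]) >= min_silence_ms
        if silent_before and (t[i + 1] - t[i]) < first_isi_ms:
            j = i + 1
            # Extend the burst while successive ISIs stay below max_isi_ms.
            while j + 1 < n and (t[j + 1] - t[j]) < max_isi_ms:
                j += 1
            n_spikes = j - i + 1
            if n_spikes >= min_spikes:
                bursts.append((i, n_spikes))
                i = j + 1
                continue
        i += 1
    return bursts
```

      For the spike train [0, 500, 503, 510, 525, 700, 702, 1000, 1003, 1006] ms, the default criteria find two bursts (4 and 3 spikes); lowering min_spikes to 2 adds the doublet starting at 700 ms.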

      Minor:

      R2-10: Figure 4A2 'Cav3.1(+/+)' should presumably be Cav3.1(-/-).

      R2-A10: The reviewer is correct; the label should read Cav3.1(-/-), and we have corrected it in the figure.

      R2-11: Figure S2C legend states 'Post-hoc group comparison was performed using.' The sentence seems to be incomplete.

      R2-A11: We have completed the sentence for clarity.

      R2-12: In the methods (pg 29) virus concentration is reported as '107 TU/ul', which probably refers to 10^7.

      R2-A12: We have corrected it by superscripting the power 7.

      R2-13: Verify Fig 1C1 and correct Y-axis overlap between title and units.

      R2-A13: We edited the figure for clarity, thank you.

      R2-14: On page 24 there is a '[ref]' that probably stands for (a missing) reference.

      R2-A14: The missing reference has been added.

    1. Author response:

      We are glad that the reviewers found our work to be interesting and appreciate its contribution to enhancing the ecological validity of attention research. We also agree that much more work is needed to solidify this approach, and that some of the results should be considered “exploratory” at this point, but appreciate the recognition of the novelty and scientific potential of the approach introduced here.

      We will address the reviewers’ specific comments in a revised version of the paper, and highlight the main points here:

      · We agree that the use of multiple different neurophysiological measures is both an advantage and a disadvantage, and that the abundance of results can make it difficult to tell a “simple” story. In our revision, we will make an effort to clarify what (in our opinion) are the most important results and provide readers with a more cohesive narrative.

      · Important additional discussion points raised by the reviewers, which will be discussed in the revised version, are: a) the similarities and differences between virtual and real classrooms; b) the utility of the methods and data to the community; and c) the implications of these results for educational neuroscience and ADHD research.

      · In the revision, we will also clarify several methodological aspects of the data analysis, as per the reviewers’ requests.

      · After final publication, the data will be made available for other researchers to use.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The heatmaps (for example, Figure 3A, B) are challenging to read and interpret due to their size. Is there a way to alter the visualization to improve interpretability? Perhaps coloring the heatmap by general anatomical region could help? We feel that these heatmaps are critical to the utility of the registration strategy, and hence, clear visualization is necessary.

      We thank the reviewers for this point on aesthetic improvement, and we agree that clearer visualization of our correlation heatmaps is important. To address this point, we have added to the package function plot_correlation_heatmaps() the capability of grouping “child” subregions, in anatomical order, by their more general “parent” region. Parent regions will be represented visually as smaller sub-facets in the heatmaps, and we will submit our full revised manuscript with these visual changes.

      (2) Additional context in the Introduction on the use of immediate early genes to label ensembles of neurons that are specifically activated during the various behavioral manipulations would enable the manuscript and methodology to be better appreciated by a broad audience.

      We thank the reviewers for this suggestion and will be revising parts of our Introduction to reflect the broader use and appeal of immediate early genes (IEGs) for studying neural changes underlying behavior.

      (3) The authors mention that their segmentation strategies are optimized for the particular staining pattern exhibited by each reporter and demonstrate that the manually annotated cell counts match the automated analysis. They mention that alternative strategies are compatible, but don't show this data.

      We thank the reviewers for this comment. We also appreciate that integration with alternative strategies is a major point of interest to readers, given that others may be interested in compatibility with our analysis and software package, rather than completely revising their own pre-existing workflows.

      This specific point on segmentation refers to the import_segmentation_custom() function in the package. As there is currently no standard cell segmentation export format adopted by the field, this function still requires some data wrangling into an import format saved as a .txt file. However, we chose not to demonstrate this capability visually in the paper for a few reasons.

      i. A figure showing broad testing of many different segmentation algorithms (e.g., Cellpose, Vaa3d, Trainable Weka Segmentation) would mainly demonstrate the efficacy of these alternative segmentation approaches, which has already been well documented. By contrast, demonstrating import compatibility is more a demonstration of an API interface, which is better shown in website documentation and tutorial notebooks.

      ii. Additionally, showing import from one well-established segmentation approach would still be a demonstration of a single use case. There would be a major burden of proof in establishing import compatibility with all potential alternative platforms, with their specific export formats (which may differ slightly depending on post-processing choices), and with the needs of the experimenters (e.g., exporting one vs. many channels, different naming conventions, different export formats). For example, output from Cellpose can take the form of a NumPy file (_seg.npy file), a .png, or a native ImageJ ROI archive, and users may have chosen up to four channels. Until the field adopts a standardized file format flexible enough to account for all variables of experimental interest, we believe it is more efficient to advise external groups on how to transform their specific data to be compatible with our generic import function.

      Internally, in collaborative efforts, we have validated the ability to import datasets generated from completely different workflows for segmentation and registration. We intend to release this documentation in upcoming updates to our package website, which we believe will better demonstrate how to take advantage of our analysis package without adopting our entire workflow.

      (4) The authors provided highly detailed information for their segmentation strategy, but the same level of detail was not provided for the registration algorithms. Additional details would help users achieve optimal alignment.

      We apologize for this lack of detail. The registration strategy depends upon the WholeBrain package for registration to the Allen Mouse Common Coordinate Framework. While this strategy has been published and documented elsewhere, we will be revising our methods to better incorporate details of this approach.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) While I was able to install the SMARTR package, after trying for the better part of one hour, I could not install the "mjin1812/wholebrain" R package as instructed in OSF. I also could not find a function to load an example dataset to easily test SMARTR. So, unfortunately, I was unable to test out any of the packages for myself. Along with the currently broken "tractatus/wholebrain" package, this is a good example of why I would strongly encourage the authors to publish SMARTR on either Bioconductor or CRAN in the future. The high standards set by Bioc/CRAN will ensure that SMARTR is able to be easily installed and used across major operating systems for the long term.

      We thank the reviewers for pointing out this weakness; long-term maintenance of this package is certainly a mutual goal. An .RData file can be loaded either by double-clicking the file in a directory window or by using the load() function (e.g., load("directory/example.RData")). We will explicitly outline these directions in the online documentation and in our full revision.

      Moreover, we will submit our package to CRAN. Currently, SMARTR is not dependent on the WholeBrain package, which remains optional for the registration portion of our workflow. Ultimately, this independence will allow us to maintain the analysis and visualization portion of the package independently, and allow for submission to a more centralized software repository such as CRAN.

      (2) The package is quite large (several thousand lines including comments and whitespace). While impressive, this does inherently make the package more difficult to maintain, and the authors currently have not included any unit tests. The authors should add unit tests covering a large percentage of the package to ensure code stability.

      We appreciate this feedback and will add unit testing to improve the reliability of our package in the full revision.

      (3) Why do the authors choose to perform image segmentation outside of the SMARTR package using ImageJ macros? Leading segmentation algorithms such as CellPose and StarMap have well-documented APIs that would be easy to wrap in R. They would likely be faster as well. As noted in the discussion, making SMARTR a one-stop shop for multi-ensemble analyses would be more appealing to a user.

      We appreciate this feedback. We believe parts of our response to Reviewer 1, comment 3, are relevant to this point. The interfaces for CellPose and ClusterMap (which processes data from in situ transcriptomic approaches like STARmap) are both in Python, and there are currently ways to call Python from within R (https://rstudio.github.io/reticulate/index.html). We will certainly explore incorporating these APIs from R. However, we anticipate that this capability would amount to a “translation” between programming languages and would not currently spare users from needing some familiarity with the capabilities of these Python packages, and thus with Python syntax.

      (4) Given the small number of observations for correlation analyses (n=6 per group), Pearson correlations would be highly susceptible to outliers. The authors chose to deal with potential outliers by dropping any subject per region that was> 2 SDs from the group mean. Another way to get at this would be using Spearman correlation. How do these analyses change if you use Spearman correlation instead of Pearson? It would be a valuable addition for the author to include Spearman correlations as an option in SMARTR.

      We thank the reviewers for this suggestion and will provide a supplementary analysis of our results using Spearman correlations.

      (5) I see the authors have incorporated the ability to adjust p-values in many of the analysis functions (and recommend the BH procedure) but did not use adjusted p-values for any of the analyses in the manuscript. Why is this? This is particularly relevant for the differential correlation analyses between groups (Figures 3P and 4P). Based on the un-adjusted p-values, I assume few if any data points will still be significant after adjusting. While it's logical to highlight the regional correlations that strongly change between groups, the authors should caution which correlations are "significant" without adjusting for multiple comparisons. As this package now makes this analysis easily usable for all researchers, the authors should also provide better explanations for when and why to use adjusted p-values in the online documentation for new users.

      We appreciate the feedback and will outline more explicitly in our paper that our dataset is presented as a demonstrative and exploratory resource for readers and, as such, we accept a high tolerance for false positives in exchange for a lower risk of missing potentially interesting findings. As noted by Reviewer #2, it is still “logical to highlight the regional correlations that strongly change between groups.” We will further clarify in our methods that we present uncorrected p-values when speaking of significance. We will also include more statistical detail on FDR correction in our online documentation. Ultimately, the decision whether to correct for multiple comparisons, and the choice of FDR threshold, should still be informed by standard statistical theory and by the user's tolerance for false positives and false negatives. This will be influenced by factors such as the nature and purpose of the study and the quality of the dataset.
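
      For readers unfamiliar with FDR correction, the Benjamini-Hochberg (BH) adjustment recommended in the package can be sketched in a few lines of Python. This is an illustrative re-implementation with a hypothetical function name, not SMARTR's code; it should give the same values as R's `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order.

    With sorted p-values p_(1) <= ... <= p_(m), the adjusted value is
    q_(i) = min over j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also enforces the cap at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A change in regional correlation would then be called significant at an FDR of 5% only if its adjusted value is below 0.05.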

      (6) The package was developed in R3.6.3. This is several years and one major version behind the current R version (4.4.3). Have the authors tested if this package runs on modern R versions? If not, this could be a significant hurdle for potential users.

      We thank the reviewers for pointing out concerns regarding versioning. Analysis and visualization capabilities are currently supported in R version 4.1+. The recommendation for R 3.6.3 is primarily for users interested in using the full workflow, which requires installation of the WholeBrain package. We anticipate supporting the visualization and network analysis capabilities with updated packages and R versions, while maintaining a legacy version for the full workflow presented in this paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a new application of the high-content image-based morphological profiling assay Cell Painting (CP) to single cell type classification in mixed heterogeneous induced pluripotent stem cell-derived neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations holds great promise to characterize mixed-cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including an in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) Explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) Generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) Application to multiple classification tasks.

      I especially liked the generalization of classification from mono- to co-cultures (Figure 4C), and quantitatively following the gradual transition from NPC to Neurons (Figure 5H).

      The manuscript is well-written and easy to follow.

      Thank you for the positive appreciation of our work and constructive comments. 

      Weaknesses:

      I am not certain how useful/important the specific application demonstrated in this study (quality control of iPSC cultures) is; this could be better explained in the manuscript.

      To clarify the importance we have added an additional explanation to the introduction (page 3) and also come back to it in the discussion (page 17).

      Text from the introduction:

      “However, genetic drift, clonal and patient heterogeneity cause variability in reprogramming and differentiation efficiency10,11. The differentiation outcome is further strongly influenced by variations in protocol12. This can significantly impact experimental outcomes, leading to inconsistent and potentially misleading results and consequently, it hinders the use of iPSC-derived cell systems in systematic drug screening or cell therapy pipelines. This is particularly true for iPSC-derived neural cultures, as their composition, purity and maturity directly affect gene expression and functional activity, which is essential for modelling neurological conditions13,14. Thus, from a preclinical perspective, there is the need for a fast and cost-effective QC approach to increase experimental reproducibility and cell type specificity15. From a clinical perspective in turn, robust QC is required for safety and regulatory compliance (e.g., for cell therapeutic solutions). This need for improved standardization and QC is underscored by large-scale collaborative efforts such as the International Stem Cell Banking Initiative16, which focusses on clinical quality attributes and provides recommendations for iPSC validation testing for use as cellular therapeutics, or the CorEuStem network, aiming to harmonize iPSC practices across core facilities in Europe.”

      Text from the discussion: 

      “Many groups highlight the difficulty of reproducible neural differentiation and attribute this to culture conditions, cultivation time and variation in developmental signalling pathways in the source iPSC material43,44. Spontaneous neural differentiation has previously been shown to require approximately 80 days before mature neurons arise that can fire action potentials and show neural circuit formation. Although these differentiation processes display a stereotypical temporal sequence34, the exact timing and duration might vary. This variation negatively affects the statistical power when testing drug interventions and thus prohibits the application of iPSC-culture derivatives in routine drug screening. Current solutions (e.g., immunocytochemistry, flow cytometry, …) are often cost-ineffective, tedious, and incompatible with longitudinal/multimodal interrogation. CP is a much more cost-effective solution and ideally suited for this purpose. Routine CP-based QC could add confidence to and save costs for the drug discovery pipeline. We have shown that CP can be leveraged to capture the morphological changes associated with neural differentiation.”

      Another issue that I feel should be discussed more explicitly is how far can this application go - how sensitively can the combination of cell painting and machine learning discriminate between cell types that are more subtly morphologically different from one another?

      Thank you for this interesting question. The fact that an approach based on a subregion not encompassing the whole cell (the “nucleocentric” approach) can predict cell types equally well suggests that cell shape as such is not the defining factor for accurate cell type profiling. And while neural progenitors, neurons and glia clearly have vastly different cell shapes, we have shown that cells with closer phenotypes, such as 1321N1 vs. SH-SY5Y or astrocytes vs. microglia, can be distinguished with equal performance. However, triggered by the reviewer’s question, we have now tested additional conditions with more subtle phenotypes, including the classification of 1321N1 vs. two related retinal pigment epithelial cell lines with much more similar morphology (ARPE and RPE1 cells). We found that the CNN could discriminate these cells equally well and have added the results on page 8 and in Fig. 3D. To address this question from a different angle, we have also performed an experiment in which we changed cell states to assess whether discriminatory power remains high. Concretely, we exposed co-cultures of neurons and microglia to LPS to trigger microglial activation (visible more subtly as cytoskeletal changes and vacuole formation). This revealed that our approach still discriminates both cell types (neurons vs. microglia) with high accuracy, regardless of the microglial state. Furthermore, using a two-step approach, we could also distinguish LPS-treated (assumed to be activated) from unchallenged microglia (assumed to be more homeostatic), albeit with lower accuracy. This experiment has been added as an extra results section (Cell type identification can be applied to mixed iPSC-derived neuronal cultures regardless of activation state, p12) and Fig. 7c. Finally, we have also added our view on possible future applications in even more complex contexts such as tissue slices, 3D and live-cell applications (pages 17-18).

      Regarding evaluations, accuracy, a measure that can be biased by class imbalance, is not the most appropriate measurement in my opinion. The confusion matrices are a great help, but I would recommend using a measurement that is less sensitive to class imbalance for evaluating cell-type classification performance.

      Across all CNNs trained in this manuscript, the sample size of the input classes has always been equalized, ruling out any effects of class imbalance. Nevertheless, to follow the reviewer’s recommendation, we now use the F-score to document performance, as it is less sensitive to such imbalance. For clarity, we have now also mentioned the input number (ROIs/class) in every figure.
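
      Why accuracy can mislead under class imbalance, and why the F-score helps, is easiest to see with a toy example. The sketch below (plain Python, hypothetical helper names) computes per-class precision, recall and F1, and their unweighted (macro) average.

```python
def per_class_prf(y_true, y_pred, cls):
    """Precision, recall and F1 for one class treated as 'positive'."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of ROIs whose predicted label matches the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_prf(y_true, y_pred, c)[2] for c in classes) / len(classes)
```

      A classifier that labels all 100 ROIs of a 95:5 split as the majority class reaches 95% accuracy, but its macro F1 is only about 0.49 because the minority class has an F1 of 0. With equalized class sizes, as used for the CNNs here, accuracy equals the mean per-class recall, so the two views agree more closely.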

      Another issue is that the performance evaluation is calculated on a subset of the full cell population - after exclusion/filtering. Could there be a bias toward specific cell types in the exclusion criteria? How would it affect our ability to measure the cell type composition of the population?

      As explained in the M&M section, filtering was performed based on three criteria:

      (1) Nuclear size: objects with values below a threshold of 160 are considered to represent debris;

      (2) DAPI intensity: values below a threshold of 500 represent segmentation errors;

      (3) IF staining intensity: gates were set on the intensity of the fluorescent markers used for post hoc IF, to retain only cells that are unequivocally positive for either marker and to avoid including double-positive (or double-negative) cells in the ground truth for training.
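
      The three filtering criteria can be expressed as a single labelling rule. In this Python sketch the two numeric thresholds (160 and 500) come from the text, but their units, the field names and the IF gate values (`gate_a`, `gate_b`) are assumptions for illustration; objects returning `None` are the ones excluded from the ground truth.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text; units are assumed (pixels / raw intensity).
MIN_NUCLEAR_AREA = 160     # below this, the object is treated as debris
MIN_DAPI_INTENSITY = 500   # below this, the object is treated as a segmentation error

@dataclass
class Nucleus:
    area: float       # nuclear size
    dapi: float       # DAPI intensity
    marker_a: float   # post hoc IF intensity for cell type A (hypothetical)
    marker_b: float   # post hoc IF intensity for cell type B (hypothetical)

def ground_truth_label(n: Nucleus, gate_a: float, gate_b: float) -> Optional[str]:
    """Return 'A', 'B', or None if the object is excluded from the ground truth."""
    if n.area < MIN_NUCLEAR_AREA:       # criterion 1: debris
        return None
    if n.dapi < MIN_DAPI_INTENSITY:     # criterion 2: segmentation error
        return None
    pos_a = n.marker_a >= gate_a        # criterion 3: IF gating
    pos_b = n.marker_b >= gate_b
    if pos_a and not pos_b:
        return "A"
    if pos_b and not pos_a:
        return "B"
    return None  # double positive or double negative: ambiguous, excluded
```

      Counting the `None` results per imaging field would be one simple way to quantify how many cells the ground truth leaves out, which is the concern the reviewer raised.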

      One could argue that the last criterion introduces a certain bias in that it does not consider part of the cell population. However, this is also not the purpose of our pioneering study, which aims to identify unique cell types for which the ground truth is as pure and reliable as possible. Not filtering out cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs, so cells of a subsequent test set will only be classified according to these labels. For example, in the neuronal differentiation experiment (Fig. 6G-H), cells are characterized either as NPCs or as neurons, which forces transitioning (or undefined) cells into one of the two categories. Despite this simplification, the model adequately predicted the increase in the neuron/NPC ratio with culture age. In future iterations, one could envision defining more refined cell (sub-)types in a population based on richer post hoc information (e.g., through cyclic immunofluorescence or spatial single-cell transcriptomics) or longitudinal follow-up of cell-state transitions using live imaging. This notion has been added to page 17 of the manuscript.

      I am not entirely convinced by the arguments regarding the superiority of the nucleocentric vs. the nuclear representations. Could it be that this improvement is due to not being sensitive/ influenced by nucleus segmentation errors?

      The reviewer has a valid point that segmentation errors may occur. However, the algorithm we have used (the StarDist classifier) is very robust to nuclear segmentation errors. To verify its performance, we have now quantified segmentation errors in 20 images for 3 different densities and found a consistently low error rate (0.6-1.6%) without correlation to culture density. Moreover, these errors include partial imperfections (e.g., a missed protrusion or bleb) as well as over-segmentation (one nucleus detected as several) and under-segmentation (several nuclei detected as one). The latter two affect both the nuclear and the nucleocentric predictions and should thus not bias the comparison between them. Imperfect segmentations may have a specific impact on the nucleus-based predictions (which rely on blanking the non-nuclear part), but this alone cannot explain the significantly higher gain in accuracy for nucleocentric predictions (>5%). Therefore, we conclude that segmentation errors may contribute in part, but not exclusively, to the overall improved performance of nucleocentric input models. We have added this notion to the discussion (pages 14-15 and Suppl. Fig. 1E).

      GRADCAM shows cherry-picked examples and is not very convincing.

      To help convince the reviewer and illustrate the representativeness of the selected images, we have now randomly selected 10 images for each condition and density (using random seeds to avoid cherry-picking) and added these in Suppl. Fig. 3.
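
      Seed-based selection of example images can be sketched as follows (Python; the function name and the default seed are hypothetical, not taken from the authors' code). Re-creating the generator with the same seed for every condition makes the choice reproducible and independent of the order in which conditions are processed, so examples cannot be hand-picked.

```python
import random

def pick_examples(images_by_condition, n=10, seed=42):
    """Pick up to n images per condition, reproducibly and without hand selection."""
    picks = {}
    for condition in sorted(images_by_condition):
        ids = sorted(images_by_condition[condition])  # fixed order before sampling
        rng = random.Random(seed)                     # same seed for every condition
        picks[condition] = rng.sample(ids, k=min(n, len(ids)))
    return picks
```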

      There are many missing details in the figure panels, figure legend, and text that would help the reader to better appreciate some of the technical details, see details in the section on recommendations for the authors.

      Please see below for our specific adaptations.

      Reviewer #2 (Public Review):

      This study uses an AI-based image analysis approach to classify different cell types in cultures of different densities. The authors could demonstrate the superiority of the CNN strategy used with nucleocentric cell profiling approach for a variety of cell types classification. The paper is very clear and well-written. I just have a couple of minor suggestions and clarifications needed for the reader.

      The entire prediction model is based on image analysis. Could the authors discuss the minimal spatial resolution of images required to allow a good prediction? Along the same line, it would be interesting to the reader to know which metrics related to image quality (e.g. signal to noise ratio) allow a good accuracy of the prediction.

      Thank you for the positive and relevant feedback.

      The reviewer has a good point that it is important to portray the imaging conditions that are required for accurate predictions. To investigate this further, we have performed additional experiments that give a better view of the operating window in terms of resolution and SNR (manuscript pages 7-8 and new figure panels Fig. 3B-C). The initial image resolution was 0.325 µm/pixel. To understand the dependency on resolution, we performed training and classifications for image data sets that were progressively binned. We found that a two-fold reduction in resolution did not significantly affect the F-score, but further degradation decreased the performance. At a resolution of 6.0 µm/pixel (20-fold binning), the F-score dropped to 0.79±0.02, comparable to the performance when only the DAPI (nuclear) channel was used as input. The effect of reduced image quality was assessed in a similar manner, by iteratively adding more Gaussian noise to the image. We found that above an SNR of 10 the prediction performance remains consistent, but below it starts to degrade. While this exercise provides a first impression of the current confines of our method, we do believe it is plausible that its performance can be extended to even lower-quality images, for example by using image restoration algorithms. We have added this notion in the discussion (page 14).
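
      The two degradations can be sketched in plain Python on a 2D list of pixel intensities. Assumptions, not taken from the text: binning is implemented as average pooling, and SNR is defined as the mean image intensity divided by the standard deviation of the added Gaussian noise; the authors' exact definitions may differ. Each binning step multiplies the pixel size, e.g. 2-fold binning turns 0.325 µm/pixel into 0.65 µm/pixel.

```python
import random
from statistics import mean

def bin_image(img, factor):
    """Average-pool a 2D list by an integer factor; incomplete edge blocks are dropped."""
    rows, cols = len(img) // factor, len(img[0]) // factor
    return [
        [
            mean(img[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def add_noise_for_snr(img, snr, seed=0):
    """Add zero-mean Gaussian noise with sigma = mean(img) / snr (assumed SNR definition)."""
    rng = random.Random(seed)
    sigma = mean(v for row in img for v in row) / snr
    noisy = [[v + rng.gauss(0.0, sigma) for v in row] for row in img]
    return noisy, sigma
```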

      The authors show that nucleocentric-based cell feature extraction is superior for feeding the CNN-based model for cell type prediction. Could they discuss what the optimal size and shape of this ROI is to ensure good prediction? What if, for example, you increase or decrease the size of the ROI by a certain number of pixels?

      To identify the optimal input, we varied the size of the square region around the nuclear centroid from 0.6 to 150 µm for the whole dataset. Within the nuclear-to-cell window (12-30 µm), the average F-score varies only little, but an important observation is the increasing error and the growing difference between precision and recall with increasing nucleocentric patch size, which will become detrimental in cases of class imbalance. The F-score is maximal for a box of 12-18 µm surrounding the nuclear centroid. In this “sweet spot”, precision and recall are also in balance. Therefore, we have selected this region for the actual density comparison experiment. We have added our results to the manuscript (pages 9 and 15).
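
      A nucleocentric crop is a fixed-size square cut around the nuclear centroid, with no cell mask applied. The Python sketch below (hypothetical names; zero padding at image borders is an assumption) converts a patch size in µm to pixels using the 0.325 µm/pixel resolution mentioned above. A 15 µm patch, in the middle of the 12-18 µm optimum, becomes a 47 x 47 pixel crop.

```python
def nucleocentric_crop(img, centroid, patch_um, pixel_um=0.325, fill=0.0):
    """Square patch of roughly patch_um centred on the nuclear centroid (row, col).

    The side length is 2*half + 1 pixels, so the centroid pixel sits exactly in the
    middle; pixels falling outside the image are set to `fill`.
    """
    half = max(1, round(patch_um / pixel_um)) // 2
    r0, c0 = round(centroid[0]), round(centroid[1])
    h, w = len(img), len(img[0])
    return [
        [img[r][c] if 0 <= r < h and 0 <= c < w else fill
         for c in range(c0 - half, c0 + half + 1)]
        for r in range(r0 - half, r0 + half + 1)
    ]
```

      Sweeping `patch_um` over a range of values and retraining for each crop size is then all that a patch-size analysis like the one described requires.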

      It would be interesting for the reader to know the number of ROI used to feed each model and know the minimal amount of data necessary to reach a high level of accuracy in the predictions.

      The figures have now been adjusted so that the number of ROIs used as input to feed the model are listed. The minimal number of ROIs required to obtain high level accuracy is tested in Figure 2C. By systematically increasing the number of input ROIs for both RF and CNN, we found that a plateau is reached at 5000 input ROIs (per class) for optimal prediction performance. This is also documented in the results section page 6.
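
      Equalizing class sizes, and scanning the number of input ROIs per class as in Figure 2C, both reduce to drawing a fixed number of ROIs per class. A minimal Python sketch (function name hypothetical; the training step itself is omitted):

```python
import random
from collections import defaultdict

def balanced_subsample(labels, n_per_class, seed=0):
    """Indices of exactly n_per_class ROIs per class; raises if a class is too small."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    chosen = []
    for label in sorted(by_class):
        idx = by_class[label]
        if len(idx) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(idx)} ROIs")
        chosen.extend(rng.sample(idx, n_per_class))
    return sorted(chosen)
```

      Calling it with increasing `n_per_class` values (for example up to and beyond 5000) and training one model per subset would produce the kind of learning curve described above.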

      From Figure 1 to Figure 4 the authors show that the CNN-based approach is efficient in distinguishing 1321N1 vs SH-SY5Y cell lines. The last two figures are dedicated to showing two different applications of the technique: identification of different stages of neuronal differentiation (Figure 5) and of different cell types (neurons, microglia, and astrocytes) in Figure 6. It would be interesting, for these two cases as well, to assess the superiority of the CNN-based approach compared to the more classical Random Forest classification. This would reinforce the universal value of the method proposed.

      To meet the reviewer’s request, we have now also compared CNN to RF for the classification of cells in iPSC-derived models (Figures 6 and 7). As expected, the CNN performed better in both cases. We have now added these results in Fig. 6 D and 7 C and pages 12 and 13 of the manuscript.

      Reviewer #3 (Public Review):

      Induced pluripotent stem cells, or iPSCs, are cells that scientists can push to become new, more mature cell types like neurons. iPSCs have a high potential to transform how scientists study disease by combining precision medicine gene editing with processes known as high-content imaging and drug screening. However, there are many challenges that must be overcome to realize this overall goal. The authors of this paper solve one of these challenges: predicting cell types that might result from potentially inefficient and unpredictable differentiation protocols. These predictions can then help optimize protocols.

      The authors train advanced computational algorithms to predict single-cell types directly from microscopy images. The authors also test their approach in a variety of scenarios that one may encounter in the lab, including when cells divide quickly and crowd each other in a plate. Importantly, the authors suggest that providing their algorithms with just the right amount of information beyond the cells' nuclei is the best approach to overcome issues with cell crowding.

      The work provides many well-controlled experiments to support the authors' conclusions. However, there are two primary concerns: (1) the model may be relying too heavily on the background, and thus on technical artifacts (instead of the cells), for making CNN-based predictions, and (2) the conclusion that their nucleocentric approach (including a small area beyond the nucleus) performs better is not well supported, and it may just be better by random chance. If the authors were to address these two concerns (through additional experimentation), then the work may influence how the field performs cell profiling in the future.

      Thank you very much for confirming the potential value of our work and raising these relevant items. To better support our claims we have now performed additional validations, which we detail below. 

      (1) The model may be relying too heavily on the background and thus technical artifacts (instead of the cells) for making CNN-based predictions 

      To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2. 
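
      The two background-blanking variants for nuclear crops can be illustrated as follows. This Python sketch rests on assumptions: the text does not specify the noise distribution, so uniform noise spanning the crop's intensity range is used purely as an example, and all names are hypothetical.

```python
import random

def blank_background(crop, mask, mode="zero", seed=0):
    """Keep pixels inside the nuclear mask; replace the rest with 0 or random noise.

    mode="zero":  background set to 0.0.
    mode="noise": background drawn uniformly from the crop's [min, max] range
                  (an illustrative choice, not the authors' documented setting).
    """
    if mode not in ("zero", "noise"):
        raise ValueError("mode must be 'zero' or 'noise'")
    rng = random.Random(seed)
    values = [v for row in crop for v in row]
    lo, hi = min(values), max(values)
    return [
        [v if inside else (0.0 if mode == "zero" else rng.uniform(lo, hi))
         for v, inside in zip(row, mask_row)]
        for row, mask_row in zip(crop, mask)
    ]
```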

      (2) The conclusion that their nucleocentric approach (including a small area beyond the nucleus) performs better is not well supported, and it may just be better by random chance.

      To address this second concern, which was also raised by reviewer 2, we have performed a more extensive analysis in which the patch size was varied from 0.6 to 120 µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that increasing or decreasing the patch size has little effect on the average F-score within the nuclear-to-cell window, but that the imbalance between precision and recall increases towards larger box sizes (>18 µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 15 of the manuscript.

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures, the whole-cell segmentation mask will contain regional input similar to that of the nuclear mask, whereas the nucleocentric crop will contain more perinuclear information, which contributes to prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript.

      Additionally, the impact of this work will be limited, given the authors do not provide a specific link to the public source code that they used to process and analyze their data.

      The source code is now available on the Github page of the DeVos lab, under the following URL: https://github.com/DeVosLab/Nucleocentric-Profiling

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors):

      Evaluation summary

      The authors present a new application of the high-content image-based morphological profiling assay Cell Painting (CP) to single cell type classification in mixed heterogeneous induced pluripotent stem cell-derived neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels, replication biases) and computational (e.g., different models, different cell regions) parameters and argue that the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations is an important application and holds great promise. The simple and high-content assay democratizes use and enables adoption by other labs. The manuscript is supported by comprehensive experimental and computational validations. The manuscript is well-written and easy to follow.

      Weaknesses:

      The conclusion that the nucleocentric approach (including a small area beyond the nucleus) performs better is not well supported, and it may just be better by random chance. If better supported by additional experiments, this may influence how the field performs cell profiling in the future. The model interpretability (GradCAM) analysis is not convincing. The lack of a public source code repository also limits the impact of this study. There are missing details in the figure panels, figure legend, and text that would help the reader to better appreciate some of the technical details.

      Essential revisions:

      To reach a "compelling" strength of evidence the authors are requested to either perform a comprehensive analysis of the effect of ROI size on performance, or tune down statements regarding the superior performance of their "nucleocentric" approach. Further addition of a public and reproducible source code GitHub repository will lead to an "exceptional" strength of evidence.

      To answer the main comment, we have performed an experiment in which we varied the size of the nucleocentric patch and quantified CNN performance. We have also evaluated the operational window of our method by varying the resolution and SNR and we have experimented with different background blanking methods. We have expanded our examples of GradCAM images and now also made our source code and an example data set available via GitHub.

      Reviewer #1 (Recommendations For The Authors):

      I think that an evaluation of how the excluded cells affect our ability to measure the cell type composition of the population would be helpful to better understand the limitations and practical measurement noise introduced by this approach. A similar evaluation of the excluded cells can also help to better understand the benefit of nucleocentric vs. cell representations by more convincingly demonstrating the case for the nucleocentric approach. In any case, I recommend discussing in more depth the arguments for using the nucleocentric representation and why it is superior to the nuclear representation.

      The benefits of nucleocentric representation over nuclear and whole-cell representation are discussed more in depth at pages 14-15 of the manuscript. 

      “The nucleocentric approach, which is based on more robust nuclear segmentation, minimizes such mistakes whilst still retaining input information from the structures directly surrounding the nucleus. At higher cell density, the whole-cell body segmentation becomes more error-prone, while also losing morphological information (Suppl. Fig. 1D). The nucleocentric approach is more consistent as it relies on a more robust segmentation and does not blank the surrounding region. This way it also buffers for occasional nuclear segmentation errors (e.g., where blebs or parts of the nucleus are left undetected).”

      It is not entirely clear to me why Figure 5 moves back to "engineered" features after previous figures showed the superiority of the deep learning approach, especially since Figure 6 returns to DL. Dimensionality reduction can also be applied to DL-based classifications (e.g., using the last layer).

      Following up on the reviewer’s interesting comment, we extracted the embeddings from the trained CNN and performed UMAP dimensionality reduction. The results are shown in Fig. 3D, 6F and Supplementary Figure 1B and have been added to the manuscript on pages 6, 8 and 12.

      We concluded that unsupervised dimensionality reduction using the feature embeddings could separate cell type clusters, where the distance between the clusters reflected the morphological similarity between the cell lines. 

      I would recommend including more comprehensive GRADCAM panels in the SI to reduce the concern of cherry-picking examples. What is the interpretation of the nucleocentric area?

      A more extensive set of GradCAM images have now been included in supplementary material (Supplementary figure 3) using the same random seeds for all conditions, thus avoiding any cherry picking. We interpret the GradCAM maps on the nucleocentric crops as highlighting the structures surrounding the nucleus (reflecting ER, mitochondria, Golgi) indicating their importance in correct cell classification. This was added to the manuscript on pages 9 and 15.

      Missing/lacking details and suggestions in the figure panels and figure legend:

      - Scale bars missing in some of the images shown (e.g., Figure 2F, Figure 3D, Figure 4, Supplementary Figure 4), what are the "composite" channels (e.g., Figure 2F), missing x-label in Figure 3B. 

      These have now been added.

      - Terms that are not clear in the figure and not explained in the legend, such as FITC and cy3 energy (Figure 1C). 

      The figure has been adapted to better show the region, channel and feature. We have now added a Table (Table 5), detailing the definition of each morphological feature that is extracted. On page 27, information on feature extraction is noted.

      - Details that are missing or not sufficiently explained in the figure legends such as what each data point represents and what is Gini importance (Figure 1D) 

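      We have added these explanations to the figure legends. The Gini importance, or mean decrease in impurity, reflects how much the splits on this feature reduce node impurity: the weighted Gini impurity decreases of all splits that use the feature are summed within each decision tree and averaged across all trees of the random forest.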
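      To make the definition concrete, the sketch below computes the quantity that mean decrease in impurity accumulates: the Gini impurity decrease of a single split, weighted by the fraction of training samples reaching that node. It is an illustration, not the authors' pipeline; the function names are hypothetical. Summing this over every split on a feature in a tree, averaging across trees and normalizing gives the importance that scikit-learn exposes as `feature_importances_` on its random forest classifier.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def weighted_impurity_decrease(feature: np.ndarray, labels: np.ndarray,
                               threshold: float, n_total: int) -> float:
    """Impurity decrease of the split 'feature <= threshold' at one node,
    weighted by the fraction of all training samples that reach the node."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    n_node = labels.size
    # Size-weighted impurity of the two child nodes.
    child = (left.size * gini(left) + right.size * gini(right)) / n_node
    return (n_node / n_total) * (gini(labels) - child)
```

      A split that perfectly separates two balanced classes at the root (`n_total` equal to the node size) yields a decrease of 0.5; the same split in a node reached by only half of the samples contributes half as much.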

      Is it the std shown in Figure 2C?

      Yes, this has now been added to the legend.  

      It is not fully clear what is single/mixed (Figure 2D)

      Clarification is added to the legend and in the manuscript on page 6.

      explain what is DIV 13-90 in the legend (Figure 5).

      DIV stands for days in vitro, here it refers to the days in culture since the start of the neural induction process. This has been added in the legend.

      and state what are img1-5 (Supplementary Figures 1B-C).

      Clarification has been added to the legend.

      - Supplementary Figure 1. What is the y-axis in panel C and how do the results align with the cell mask in panel B?

      The y-axis represents the intersection over union (IoU). The IoU quantifies the overlap between the ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of their intersection divided by the area of their union. This clarification has been added to the legend.
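      As an illustration of this definition, a minimal sketch of the IoU between two binary masks (for example a manually segmented ROI and a predicted ROI) could look as follows; the function name is hypothetical and this is not taken from the authors' code.

```python
import numpy as np


def iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection over union of two binary masks of the same shape."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: define IoU as 0
    return float(np.logical_and(a, b).sum() / union)
```

      Two 8-pixel masks that share 4 pixels have a union of 12 pixels and therefore an IoU of 1/3; identical masks give 1.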

      - Supplementary Figure 1 and Methods. Please explain when CellPose and when StarDist were applied.

      Added to the supplementary figure and to the methods (page 24). In the case of nuclear segmentation (nucleus and nucleocentric crops), StarDist was used. For whole-cell crops, cell segmentation was performed using Cellpose.

      - Supplementary Figure 4C - the color code is different between nuclear and nucleocentric - this is confusing.

      We have changed the color code so that it corresponds in both conditions in Fig. 1A.

      - Figure 3B - better to have a normalized measure in the x-axis (number of cells per area in um^2)

      We agree and have changed this.

      Suggestions and missing/lacking details in the text:

      • Line #38: "we then applied this" because it is the first time that this term is presented.

      This has been rephrased.

      • Line #88: a few words on what were the features extracted would be helpful.

      A short description has been added to pages 26-27, and a detailed definition of all features has been added in Table 5.

      -  Line #91: PCA analysis - the authors can highlight what (known) features were important to PC1 using the linear transformation that defined it.

      The 5 most important features of PC1 were (in order of decreasing importance): channel 1 dissimilarity, channel 1 homogeneity, nuclear perimeter, channel 4 dissimilarity and nuclear area.  
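      Such a ranking can be read from the PC1 loadings, i.e. the coefficients of the linear transformation that defines PC1. The sketch below is a minimal version under two assumptions not stated in the response: features are z-scored before PCA, and importance is taken as the absolute loading. The function name is hypothetical. Loading signs are arbitrary in PCA, so only magnitudes are ranked.

```python
import numpy as np


def top_pc1_features(X: np.ndarray, names: list, k: int = 5):
    """Return the k features with the largest absolute loading on PC1.

    X has shape (n_cells, n_features); features are z-scored first
    (an assumption of this sketch).
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - X.mean(axis=0)) / std
    # Rows of vt are the principal axes; vt[0] holds the PC1 loadings.
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    loadings = vt[0]
    order = np.argsort(-np.abs(loadings))[:k]
    return [(names[i], float(loadings[i])) for i in order]
```
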

      - Line #92: Order of referencing Supplementary Figure 4 before referencing Supplementary Figure 13.

      The order of the Supplementary images was changed to follow the chronology. 

      • Line #96: Can the authors show the data supporting this claim?

      The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.

      - Line #108: what is "nuclear Cy3 energy"?

      This is a texture feature: the energy of the gray-level co-occurrence matrix (GLCM) computed within the nuclear ROI in the third channel (Cy3). It describes how uniform the local pixel-intensity patterns are, and thus reflects the texture within the nuclear region for the phalloidin and WGA staining. The definitions of all handcrafted features are added in Table 5 of the manuscript.
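      For readers unfamiliar with GLCM features, the sketch below builds a symmetric, normalized co-occurrence matrix for a single horizontal pixel offset and derives energy, homogeneity and dissimilarity with the formulas used by scikit-image's `graycoprops` (energy is the square root of the angular second moment). It is a simplified stand-in, not the authors' extraction code: it assumes an already quantized integer image, a rectangular patch rather than an arbitrary ROI, and one offset; the function names are hypothetical.

```python
import numpy as np


def glcm(q: np.ndarray, levels: int, dx: int = 1) -> np.ndarray:
    """Symmetric, normalized GLCM for a horizontal offset of dx pixels.

    q must contain integer gray levels in range(levels).
    """
    q = np.asarray(q, dtype=int)
    P = np.zeros((levels, levels))
    a = q[:, :-dx].ravel()  # reference pixels
    b = q[:, dx:].ravel()   # neighbours dx pixels to the right
    np.add.at(P, (a, b), 1)
    P = P + P.T             # count each pair in both directions
    return P / P.sum()


def texture_props(P: np.ndarray) -> dict:
    """Energy, homogeneity and dissimilarity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    return {
        "energy": float(np.sqrt(np.sum(P ** 2))),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "dissimilarity": float(np.sum(P * np.abs(i - j))),
    }
```

      A perfectly uniform patch has energy 1, whereas an alternating 0/1 pattern spreads the co-occurrences over two cells of the matrix and lowers the energy to about 0.71.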

      - Line #110-112: Can the authors show the data supporting this claim?

      The figure has been changed to include the results from a filtered and unfiltered dataframe (exclusion and inclusion of redundant features). Features could be filtered out if the correlation was above a threshold of 0.95. This has been added to page 6 of the manuscript and fig. 1D.  
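      A correlation filter of this kind can be sketched as below. The 0.95 threshold comes from the response; the greedy rule (keep a feature only if its absolute Pearson correlation with every already-kept feature is at most the threshold, scanning in column order) is an assumption of this sketch, and the function name is hypothetical. Because the rule is greedy, which member of a correlated group survives depends on column order.

```python
import numpy as np


def drop_correlated(X: np.ndarray, names: list, threshold: float = 0.95):
    """Greedily drop features whose |Pearson r| with a kept feature exceeds threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns give NaN correlations
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]
```
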

      - Line #115-116: please state the size of the mask.

      Added to the text (page 6). We used isotropic image crops of 60µm centred on individual cell centroids.

      - Lines 120-122: more details will make this more clear (single vs. mixed).

      This has been changed on page 6 of the manuscript.

      • Line #142: "(mimics)" - is it a typo?

      Tissue mimics refers to organoids/models that are meant to replicate the physiological behaviour.

      • Line #159: the bounding box for nucleocentric analysis is 15x15um (and not 60), as stated in the Methods.

      Thank you for pointing out this mistake. We have adapted this.

      - Line #165: what is the interpretation of what was important for the nucleocentric classification?

      The colour code in GradCAM images is indicative of the attention of the CNN (the more to the red, the more attention). In fig. 4D and Suppl. Fig. 3 the structures directly surrounding the nucleus receive high attention from the CNN trained on nucleocentric crops. This has been added to the manuscript page 9 and 15.

      • Section starting in line #172: not explicitly stated what model was used (nucleocentric?).

      Added in the legend of fig. 5. For these experiments, the full cell segmentation was still used. 

      - Section starting in line #199: why use a feature-based model rather than nucleocentric? A short sentence would be helpful.

      For CNN training, nucleocentric profiling was used. In response to a legitimate question of one of the reviewers, the feature-based UMAP analysis was replaced with the feature embeddings from the CNN. 

      - Line #213: Fig. 5B does not show transitioning cells.

      Thank you for pointing this out, this was a mistake and has been changed.

      Lines #218-220: not fully clear to some readers (culture condition as a weak label), more details can be helpful.

      We changed this at page 11 of the manuscript for clarity. 

      “This gating strategy resulted in a fractional abundance of neurons vs. total (neurons + NPC) of 36.4% in the primed condition and 80.0% in the differentiated condition (Fig. 6C). We therefore refer to the culture condition as a weak label as it does not take into account the heterogeneity within each condition (well).”

      -  Line #230: "increasing dendritic outgrowth" - what does it mean? Can you explicitly highlight this phenotype in Figure 5G?

      When the cells become more mature during differentiation, the cell body becomes smaller and the neurons form long, thin ramifications. This explanation has been added to page 12 of the manuscript.

      • Line #243: is it the nucleocentric CNN?

      Yes.

      • Lines #304-313, the authors might want to discuss other papers dealing with continuous (non-neural) differentiation state transitions (eg PMID: 38238594).  

      A discussion of the use of morphological profiling for longitudinal follow-up of continuous differentiation states has been added to the manuscript at page 18. 

      - Line #444: cellpose or stardist? How did the authors use both?

      Clarification has been added to Supplementary Figure 1 and to the methods (page 24). StarDist was used for nuclear segmentation, whereas Cellpose was used for whole-cell segmentation.

      • Line #470-474: I would appreciate seeing the performance on the full dataset without exclusions.

      Cells have been excluded based on three criteria: absence of DAPI intensity, too small a nuclear size and absence of ground-truth staining. The first two criteria are based on the assumption that ROIs that contain no DAPI signal or are too small are errors in cell segmentation and should therefore not be taken along in the analysis. The third filtering step was based on the ground-truth IF signal. Not filtering out cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs, so cells of a subsequent test set will only be classified according to these labels, which might introduce bias. However, the model could predict an increase in the neuron/NPC ratio with culture age in the absence of ground-truth staining (and thus of IF-based filtering).
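      One way to make the effect of these exclusions visible, as the reviewer asks, is to record the reason for each excluded ROI rather than silently dropping it. The sketch below applies the three criteria in order and tallies the exclusions per reason. The `Roi` record, its field names and the threshold parameters are hypothetical; the actual cut-offs are not given in the response.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Roi:
    dapi_mean: float            # mean DAPI intensity in the nuclear mask
    nuclear_area_um2: float     # nuclear area in square micrometres
    gt_label: Optional[str]     # None when the IF ground truth is absent or ambiguous


def filter_rois(rois, min_dapi: float, min_area_um2: float):
    """Apply the three exclusion criteria and count exclusions per reason."""
    kept, reasons = [], Counter()
    for r in rois:
        if r.dapi_mean < min_dapi:
            reasons["no_dapi"] += 1
        elif r.nuclear_area_um2 < min_area_um2:
            reasons["too_small"] += 1
        elif r.gt_label is None:
            reasons["ambiguous_ground_truth"] += 1
        else:
            kept.append(r)
    return kept, reasons
```

      Reporting the `reasons` tally per condition alongside the predictions shows how much of the population each criterion removes.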

      Reviewer #2 (Recommendations For The Authors):

      Figure 1A: it would be interesting to the reader to see the SH-SY5Y data as well.

      This has been added in fig. 1A.

      Figure 3A: 95-100% image: showing images with the same magnification as the others would help to appreciate the cell density.

      Now fig. 4A. The figure has been changed to make sure all images have the same magnification. 

      Figure Supp 4 (line 132) is referred to before Figure Supp1 (line 152).

      The image order and numbering has been changed to solve this issue.

      Figure Supp 2 & 3 are not referred to in the text.

      This has been adjusted.

      Line 225: a statistical test would help to convince of the accuracy of these results (Figure 5C vs Figure 5F)?

      These figures represent the total ROI counts and thus represent a single number.

      Line 227: Could you explain to the reader, in a few words, what a dual SMAD inhibition is?

      This has been added to the manuscript at page 20. 

      “This dual blockade of SMAD signalling in iPSCs induces neural differentiation by synergistically causing the loss of pluripotency and pushing the cells towards the neuroectodermal lineage.”

      Reviewer #3 (Recommendations For The Authors):

      I have a few concerns and several comments that, if addressed, may strengthen conclusions, and increase clarity of an already technically sound paper.

      Concerns

      • The results presented in Figure 3 panel D, may indicate a critical error in data processing and interpretation that the authors must address. The GradCAM method highlights the background as having the highest importance. While it can be argued in the nucleocentric profiling method that GradCAM focuses on the nuclear membrane, the background is highly important even for the nuclear profiling method, which should provide little information. What procedure did the authors use for mask subtraction prior to CNN training? Could the segmentation algorithm be performing differently between cell lines? The authors interpret the GradCAM results to indicate a proxy for nuclear size, but then why did the CNN perform so much better than random forest using hand-crafted features that include this variable? The authors should also present size distributions between cell lines (and across seeding densities, in case one of the cell lines has different compaction properties with increasing density).

      Perhaps clarifying this sentence (lines 166-168) would help as well: "As nuclear area dropped with culture density, the dynamic range decreased, which could explain the increased error rate of the CNN for high densities unrelated to segmentation errors (Suppl. Fig. 4B)." What do the authors mean by "dynamic range" and it is not clear how Supplementary Figure 4B provides evidence for this? 

      The dynamic range refers to the difference between the minimum and maximum nuclear area. We expect this difference to decrease at higher density owing to the crowding that forces all nuclei to take on a more similar (smaller) size.

      More clarification on this has been added to page 9 of the manuscript.

      I certainly understand that extrapolating the GradCAM concern to the remaining single-cell images using only four (out of tens of thousands of options) is also dangerous, but so is "cherry-picking" these cells to visualize. Finally, I also recommend that the authors quantitatively diagnose the extent of the background influence according to GradCAM by systematically measuring background influence in all cells and displaying the results per cell line per density.

      To avoid cherry-picking of GradCAM images, we have now randomly selected 10 images for each condition and density (using fixed random seeds) and added these in Suppl. Fig. 3.
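      A reproducible, seeded selection of this kind can be sketched with the standard library alone. The grouping key (e.g. a condition/density pair) and the function name are hypothetical; the point is that the same seed always yields the same images, so the selection cannot be tuned after looking at the GradCAM maps.

```python
import random
from collections import defaultdict


def sample_per_group(items, key, n=10, seed=0):
    """Draw up to n items per group with a fixed seed (reproducible selection)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    rng = random.Random(seed)
    # Iterate over groups in sorted order so the draw does not depend on input order.
    return {g: rng.sample(groups[g], min(n, len(groups[g]))) for g in sorted(groups)}
```
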

      In answer to this concern, we refer to the response above: 

      “To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2.”
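      The two background treatments compared above can be sketched as follows. The response does not specify the noise distribution; this sketch assumes uniform noise over the intensity range of the crop, and the function name is hypothetical. In both modes the pixels inside the nuclear mask are left untouched, so only the context outside the nucleus changes.

```python
import numpy as np


def fill_background(crop: np.ndarray, mask: np.ndarray,
                    mode: str = "zero", seed: int = 0) -> np.ndarray:
    """Replace pixels outside the mask by zeros or by random noise."""
    out = crop.astype(float).copy()
    bg = ~mask.astype(bool)
    if mode == "zero":
        out[bg] = 0.0
    elif mode == "noise":
        rng = np.random.default_rng(seed)
        # Assumption: uniform noise spanning the crop's own intensity range.
        out[bg] = rng.uniform(crop.min(), crop.max(), size=int(bg.sum()))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return out
```
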

      • The data supporting the conclusion about nucleocentric profiling outperforming nuclear and full-cell profiling is minimal. I am picking on this conclusion in particular, because I think it is a super cool and elegant result that may change how folks approach issues stemming from cell density disproportionately impacting profiling. Figures 3B and 3C show nucleocentric slightly outperforming full cell, and the result is not significant. The authors state in lines 168-170: "Thus, we conclude that using the nucleocentric region as input for the CNN is a valuable strategy for accurate cell phenotype identification in dense cultures." This is somewhat of a weak conclusion, that, with additional analysis, could be strengthened and add high value to the community. Additionally, the authors describe the nucleocentric approach insufficiently. In the methods, the authors state (lines 501-503): "Cell crops (60μm whole cell - 15μm nucleocentric/nuclear area) were defined based on the segmentation mask for each ROI." This is not sufficient to reproduce the method. What software did the authors use?

      Presumably, 60μm refers to a box size around cytoplasm? Much more detail is needed. Additionally, I suggest an analysis to confirm the impact of nucleocentric profiling, which would strengthen the authors' conclusions. I recommend systematically varying the subtraction (-30μm, -20μm, -10μm, -5μm, 0, +5μm, +10μm, etc.) and reporting the density-based analysis in Figure 3B per subtraction. I would expect to see some nucleocentric "sweet spot" where performance spikes, especially in high culture density. If we don't see this difference, then the non-significant result presented in Figures 3B and C is likely due to random chance. The authors mention "iterative data erosion" in the abstract, which might refer to what I am recommending, but do not describe this later.

      More detail was added to the methods describing the image crops given as input to the CNN (page 28 of the manuscript). 

      “Crops were defined based on the segmentation mask for each ROI. The bounding box was cropped out of the original image with a fixed patch size (60µm for whole cells, 18µm for nucleus and nucleocentric crops) surrounding the centroid of the segmentation mask. For the whole cell and nuclear crops, all pixels outside of the segmentation mask were set to zero. This was not the case for the nucleocentric crops. Each ROI was cropped out of the original morphological image and associated with metadata corresponding to its ground truth label.”
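      The cropping rule described in this methods excerpt can be sketched as below: a fixed-size square patch centred on the mask centroid, with pixels outside the mask zeroed for whole-cell and nuclear crops but kept for nucleocentric crops. The pixel size, the zero-padding at image borders, the single-channel input and the function name are assumptions of this sketch, not details given by the authors.

```python
import numpy as np


def crop_roi(image, labels, roi_id, patch_um, pixel_um, mode):
    """Fixed-size square crop centred on the centroid of one labelled ROI.

    mode "whole_cell" or "nucleus": pixels outside the ROI mask are set to zero.
    mode "nucleocentric": the crop is kept as-is, so perinuclear context survives.
    """
    if mode not in ("whole_cell", "nucleus", "nucleocentric"):
        raise ValueError(f"unknown mode: {mode!r}")
    mask = labels == roi_id
    rows, cols = np.nonzero(mask)
    cy, cx = int(round(rows.mean())), int(round(cols.mean()))
    half = int(round(patch_um / pixel_um / 2))
    size = 2 * half
    # Zero-pad so that ROIs near the image border still yield a full-size crop.
    img_p = np.pad(image.astype(float), half)
    mask_p = np.pad(mask, half)
    # In padded coordinates the centroid sits at (cy + half, cx + half).
    window = (slice(cy, cy + size), slice(cx, cx + size))
    crop = img_p[window].copy()
    if mode != "nucleocentric":
        crop = crop * mask_p[window]
    return crop
```

      With a hypothetical pixel size of 0.5 µm, an 18 µm nucleocentric patch would be 36 × 36 pixels; the same call with `mode="nucleus"` would return the identical window with the background blanked.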

      To address this concern, we also refer to the answer above. 

      “We have performed a more extensive analysis in which the patch size was varied from 0.6 to 120µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that there is little effect of in- or decreasing patch size on the average F-score within the nuclear to cell window, but that the imbalance between the precision and recall increases towards the larger box sizes (>18µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 12 of the manuscript.

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures the whole-cell segmentation mask will cover a region similar to that of the nuclear mask, whereas the nucleocentric crop will contain more perinuclear information, which contributes to the prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript.”
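      The precision/recall imbalance mentioned above can be made explicit with per-class counts. The sketch below is a plain-Python version of the standard definitions (scikit-learn's `precision_recall_fscore_support` computes the same quantities); the labels in the usage example are made up for illustration and are not the authors' data.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1-score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Balanced input, but a classifier biased towards class "A".
y_true = ["A", "A", "A", "B", "B", "B"]
y_pred = ["A", "A", "A", "A", "A", "B"]
print(precision_recall_f1(y_true, y_pred, "A"))  # precision 0.6, recall 1.0, F1 0.75
```

      The F-score of class "A" stays fairly high even though its precision and recall diverge, which is why the balance between the two is worth reporting separately when class proportions are unknown.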

      Comments

      • There is a disconnect between the abstract and the introduction. The abstract highlights the nucleocentric model, but then it is not discussed in the introduction, which focuses on quality control. The introduction would benefit from some additional description of the single-cell or whole-image approach to profiling.

      We highlight the importance of QC of complex iPSC-derived neural cultures as an application of morphological profiling. We used single-cell profiling to facilitate cell identification in these mixed cultures, where the whole-image approach would be unable to deal with the heterogeneity within the field of view. In the introduction, we added a description of the whole-image vs. single-cell approach to profiling (page 4). In the discussion (page 18), we further highlight the application of this single-cell profiling approach for QC purposes.

      - Comments on Figure 1. It is unclear how panel B shows "without replicate bias". 

      In response to this comment, we refer to the answer above: “The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.” We added this notion to page 5 of the manuscript.

      The paper would benefit from a description of how features were extracted sooner.

      Information on the feature extraction was added to the manuscript at page 27. An additional table (table 5) has been added with the definition of each feature.  

      - Comments on Supplementary Figure 4. The clustering with PCA is only showing 2 dimensions, so it is not surprising UMAP shows more distinct clustering.

      We used two components for UMAP dimensionality reduction, so the data was also visualized in two dimensions. However, we agree that UMAP can show more distinct clustering as this method is non-linear.

      Why is Figure S4 the first referenced Supplementary Figure?

      This has been changed. 

      • Comments on Figure 2. Need discussion of the validation set - how was it determined? Panel E might have the answer I am looking for, but it is difficult to decipher exactly what is being done. The terminology needs to be defined somewhere, or maybe it is inconsistent. It is tough to tell. For example, what exactly are the two categories of model validation (cross-validation and independent testing)?

      Additional clarification has been added to the manuscript at pages 6-7 and figure 2.

      The metric being reported is accuracy for the independent replicate if the other two are used to train?

      Yes. 

      Panel C is a very cool analysis. Panel F needs a description of how those images were selected, randomly?

      Added in the methods section (page 29). GradCAM analysis was used to visualize the regions used by the CNN for classification. This map is specific to each cell. Images were selected randomly out of the full dataset for visualization.
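      For readers unfamiliar with GradCAM, the core computation is short: each feature map of a chosen convolutional layer is weighted by the spatially averaged gradient of the class score with respect to that map, the weighted maps are summed and negative values are clipped. The sketch below shows only this combination step in NumPy. It assumes the activations and gradients have already been obtained from the deep-learning framework, omits the upsampling to the input crop size, and uses a hypothetical function name.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """GradCAM map from one layer's activations and class-score gradients.

    Both arrays have shape (K, H, W): K feature maps of size H x W.
    """
    # alpha_k: global-average-pooled gradient of the class score per feature map.
    weights = gradients.mean(axis=(1, 2))
    # ReLU of the weighted sum over feature maps.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()  # scale to [0, 1] for display as a heatmap
    return cam
```
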

      They also need scale bars.

      Added to the figures. 

      Panel G would benefit from explicit channel labels (at least a legend would be good!).

      Explanation has been added to the legend. All color code and channel numbering are consistent with fig. 1A. 

      What do the dots and boxplots represent? The legend says, "independent replicates", but independent replicates of, I assume, different model initializations?

      Clarification has been added to the figure legends. For plots showing the performance of a CNN or RF classifier, each dot represents a different model initialization. Each classifier has been initialized at least 3 times. When indicated, the model training was performed with different random seeds for data splitting.

      • Comments on Figure 3. Panel A needs scale bar. See comment on Panel D in concern #1 described above. 

      This has been added.

      • Comments on Supplementary Figure 1. A reader will need a more detailed description in panel C. I assume that the grey bar is the average of the points, and the points represent different single cells?

      How many cells? How were these cells selected? 

      This information on the figure (now Suppl. Fig. 1D), has been added to the legend.

      “Left: Representative images of 1321N1 cells with increasing density alongside their cell and nuclear masks produced using Cellpose and StarDist, respectively. Images are numbered from 1-5 with increasing density. Upper right: The number of ROIs detected in comparison to the ground truth (manual segmentation). A ROI was considered undetected when the intersection over union (IoU) was below 0.15. Each bar refers to the image number on the left. The IoU quantifies the overlap between the ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of their intersection divided by the area of their union. IoU for increasing cell density for cell and nuclear masks is given in the bottom right. Each point represents an individual ROI. Each bar refers to the image number on the left.”
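      The detection criterion in this legend can be sketched as follows: each ground-truth ROI is matched to the predicted ROI it overlaps best, and it counts as detected when that best IoU reaches 0.15. The label-image representation (0 = background) and the best-overlap matching rule are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np


def count_detected(gt_labels: np.ndarray, pred_labels: np.ndarray,
                   iou_threshold: float = 0.15):
    """Return (detected, total) ground-truth ROIs; 0 marks background."""
    gt_ids = [g for g in np.unique(gt_labels) if g != 0]
    detected = 0
    for g in gt_ids:
        gmask = gt_labels == g
        best = 0.0
        # Only predicted ROIs that overlap this ground-truth ROI can match it.
        for p in np.unique(pred_labels[gmask]):
            if p == 0:
                continue
            pmask = pred_labels == p
            inter = np.logical_and(gmask, pmask).sum()
            union = np.logical_or(gmask, pmask).sum()
            best = max(best, inter / union)
        if best >= iou_threshold:
            detected += 1
    return detected, len(gt_ids)
```
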

      • Comments on Figure 4. More details on quenching are needed for a general audience. The markers chosen (EdU and BrdU) are generally not specific to cell type but to biological processes (proliferation), so it is confusing how they are being used as cell-type markers. 

      The base analogues were incorporated into each cell line prior to mixing them, i.e., when they were still growing in monoculture, so they could be labelled and identified after co-seeding and morphological profiling. Additional clarification has been added to the manuscript (page 26).

      It is also unclear why reducing CV is an important side-effect of finetuning. CV of what? The legend says, "model iterations", but what does this mean? 

      The dots in the violin plot are different CNN initializations. A lower variability between model initializations is an indicator of certainty of the results. Prior to finetuning, the results of the CNN were highly variable, leading to a high coefficient of variation (CoV) between the different CNNs. The lower CoV after finetuning means that the outcome is more robust.
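      The CoV here is the standard deviation of a performance metric across model initializations divided by its mean. A minimal sketch, assuming the sample standard deviation (n - 1 denominator) and a hypothetical function name:

```python
import statistics


def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean of per-initialization scores."""
    return statistics.stdev(scores) / statistics.mean(scores)


# Three hypothetical F-scores from three CNN initializations.
print(round(coefficient_of_variation([0.80, 0.90, 1.00]), 4))  # 0.1111
```

      Identical scores across initializations give a CoV of 0, so a drop in CoV after finetuning indicates that the initializations agree more closely.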

      • Comments on Figure 5. This is a very convincing and well-described result, kudos! This provides another opportunity to again compare other approaches (not just nucleocentric). Additionally, since the UMAP space uses hand-crafted features. The authors could consider interpreting the specific morphology features impacted by the striking gradual shift to neuron population by fitting a series of linear models per individual feature. This might confirm (or discover) how exactly the cells are shifting morphology.

      The supervised UMAP on the handcrafted features did not highlight any features contributing to the separation. Using the supervised UMAP, the clustering is dominated by the known cell type. Unsupervised UMAP on the handcrafted features does not show any clustering. In response to a previous comment, we adapted the figure to show UMAP dimensionality reduction using the feature embeddings from the cell-based CNN. This unsupervised UMAP does show good cell type separation, but it does not use any directly interpretable shape descriptors.

      • General comments on Methods. The section on "ground truth alignment" needs more details. Why was this performed? 

      Following sequential staining and imaging rounds, multiple images were captured representing the same cell with different markers. Lifting the plate off the microscope stage and imaging in sequential rounds after several days results in small linear translations in the exact location of each image. These linear translations need to be corrected to align (or register) morphological with ground truth image data within the same ROI. This notion has been added to the manuscript at page 26.
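      Translation-only registration of this kind is commonly done by phase correlation; scikit-image provides it as `skimage.registration.phase_cross_correlation`. Whether the authors used that function is not stated, so the sketch below is a self-contained NumPy version restricted to integer shifts and circular boundaries, with a hypothetical function name. It returns the shift that must be applied to the moving image to bring it onto the reference.

```python
import numpy as np


def estimate_shift(reference: np.ndarray, moving: np.ndarray):
    """Integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) ~ reference."""
    F_ref = np.fft.fft2(reference)
    F_mov = np.fft.fft2(moving)
    cross = F_ref * np.conj(F_mov)
    cross /= np.abs(cross) + 1e-12        # keep only the phase difference
    corr = np.fft.ifft2(cross).real       # sharp peak at the relative shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p if p <= n // 2 else p - n) for p, n in zip(peak, corr.shape))


rng = np.random.default_rng(1)
ref = rng.random((32, 32))
mov = np.roll(ref, (3, -5), axis=(0, 1))  # simulated stage offset between rounds
print(estimate_shift(ref, mov))           # (-3, 5)
```

      For real image pairs the shift would normally be applied with a non-circular method (e.g. `scipy.ndimage.shift`) and the borders that no longer overlap would be discarded.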

      Handcrafted features extracted using what software? 

      The complete analysis was performed in python. All packages used are listed in table 4. Handcrafted features were extracted using the scikit-image package (regionprops and GLCM functions). This has been added to the manuscript at page 27.

      Software should be cited more often throughout the manuscript. 

      Lastly, the GitHub URL points to the DeVosLab organization, but should point to a specific repository. Therefore, I was unable to review the provided code. A well-documented and reproducible analysis pipeline should be included.

      A test dataset and source code are available on GitHub:  https://github.com/DeVosLab/Nucleocentric-Profiling

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1. In Figure 1, the MafB antibody (Sigma) was used to identify Renshaw cells at P5. However, according to the supplementary Figure 3D, the specificity of the MafB antibody (Sigma) is relatively low. The image of MafB-GFP, V1-INs, and MafB-IR at P5 should be added to the supplementary figure. The specificity of MafB-IR-Sigma in V1 neurons at P5 should be shown. This image also might support the description of the genetically labeled MafB-V1 distribution at P5 (page 8, lines 28-32).

      We followed the reviewer’s suggestion and moved analyses of the MafB-GFP mouse to a supplemental figure (Fig S3). The characterization of MafB immunoreactivities is now in supplemental Figure S2 and the related text in results was also moved to supplemental to reduce technicalities in the main text. We added confocal images of MafB-GFP V1 interneurons at P5 showing immunoreactivities for both MafB antibodies, as suggested by the reviewer (Fig S2A,B). We agree with the reviewer that this strengthens our comparisons on the sensitivity and specificity of the two MafB antibodies used in this study. 

      As explained in the preliminary response we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth. This is why we used tissues from late embryos to check MafB immunoreactivities (Figure S2C and S2D). We made this point clearer in the text and supplemental figure legends.

      Comment 2. The proportion of genetically labeled FoxP2-V1 in all V1 is more than 60%, although immunolabeled FoxP2-V1 is approximately 30% at P5. Genetically labeled Otp-V1 included other non-FoxP2 V1 clades (Fig. 8L-M). I wonder whether genetically labeled FoxP2-V1 might include the other three clades. The authors should show whether genetically labeled FoxP2-V1 expresses other clade markers, such as pou6f2, sp8, and calbindin, at P5.

      We included the requested data in Figure 3E-G. Lineage-labeled Foxp2-V1 neurons in our genetic intersection do not include cells from other V1-clades.

      Reviewer 2:

      Comment 1. The current version of the paper is VERY hard to read. It is often extremely difficult to "see the forest for the trees" and the reader is often drowned in methodological details that provide only minor additions to the scientific message. Non-specialists in developmental biology, but still interested in the spinal cord organization, especially students, might find this article challenging to digest and there is a high risk that they will be inclined to abandon reading it. The diversity of developmental stages studied (with possible mistakes between text and figures) adds a substantial complexity in the reading. It is also not clear at all why authors choose to focus on the Foxp2 V1 from page 9. Naively, the Pou6f2 might have been equally interesting. Finally, numerous discrepancies in the referencing of figures must also be fixed. I strongly recommend an in-depth streamlining and proofreading, and possibly moving some material to supplement (e.g. page 8, and elsewhere).

      The whole text was re-written and streamlined with most methodological discussion (including the section referred to by the reviewer) transferred to supplemental data. Nevertheless, enough details on samples, stats and methods were retained to maintain the rigor of the manuscript. 

      The reasons justifying a focus on Foxp2-V1 interneurons were fully explained in our preliminary response. Briefly, we are trying to elucidate V1 heterogeneity, and prior data showed that this is the most heterogeneous V1 clade (Bikoff et al., 2016), which made it the natural choice for further study. We agree that the Pou6f2 clade is equally interesting; it is in fact the subject of several ongoing studies.

      Comment 2. … although the different V1 populations have been investigated in detail regarding their development and positioning, their functional role is not directly investigated through gain- or loss-of-function experiments. For the Foxp2-V1, the developmental and anatomical mapping is complemented by connectivity mapping (Figs 6, 8), but the latter is fairly superficial compared to the former. Synapses (Fig 6) are counted on a relatively small number of motoneurons per animal, which may or may not be representative of the population. Likewise, putative synaptic inputs are only counted on neuronal somata. Motoneurons that lack axo-somatic contacts may still be contacted distally. Hence, while these data are still suggestive of differences between V1 pools, they are only weakly predictive of function.

      We fully answered the question on functional studies in the preliminary response. Briefly, we are currently conducting these studies using various mouse models that include chronic synaptic silencing using tetanus toxin, acute partial silencing using DREADDs, and acute cell deletion using diphtheria toxin. Each intervention reveals different features of Foxp2-V1 interneuron functions, and each model requires independent validation. Moreover, these studies are being carried out at three developmental stages: embryos, early postnatal period of locomotor maturation and mature animals. Obviously, this is all beyond the goals and scope of the present study. The present study is however the basis for better informed interpretations of results obtained in functional studies.

      Regarding the question on synapse counts, we fully explained in the preliminary response why we believe our experimental designs for synapse counting at the confocal level are among the most thorough that can be found in the literature. We counted a very large number of motoneurons per animal when adding all motor columns and segments analyzed in each animal. Statistical power was also sufficient to detect fundamental variation in synaptic density among motor columns.

      We focused our analyses on motoneuron cell bodies because analyzing full dendritic arbors in all motor columns throughout all lumbosacral segments is not feasible. Please see Rotterman et al., 2014 (J. of Neuroscience; doi: 10.1523/JNEUROSCI.4768-13.2014) for an evaluation of what this entails for a single motoneuron. We agree with the reviewer that analyses of V1 synapses over full dendritic arbors in specific motoneurons will be very relevant in further studies. These should be carried out now that we know which motor columns are of high interest. Nevertheless, inhibitory synapses exert the most efficient modulation of neuronal firing when they are on cell bodies, and our analyses clearly suggest a difference between V1 interneuron types in the targeting of inhibitory synapses to cell bodies, which we find very relevant.

      Comment 3. I suggest taking the rabies labelling (Figure 8) with caution. It is known that this type of rabies vector, when delivered from the periphery, might also label sensory afferents and their postsynaptic targets in the cord through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). Yet I am not sure the authors have made all the controls needed to exclude that labelled neurons, presumed here to be premotoneurons, could rather be anterogradely labelled from sensory afferents.

      Over the years, we performed many extensive controls and validation of rabies virus transsynaptic tracing methods. These were presented at two SfN meetings (Gomez-Perez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Our validation of this technique was fully explained in our preliminary response. We also pointed out that the methods used by Pimpinella et al. have a very different design and therefore their results are not comparable to ours. In this study we injected the virus at P15 into leg muscles, and not directly into the spinal cord. In our hands, and as cited in Pimpinella et al., the rabies virus loses tropism for primary afferents with age when injected in muscle. The lack of primary afferent labeling in key lumbosacral segments (L4 and L5) is now illustrated in a new supplemental figure (Figure S6). This figure also shows some starter motoneurons. As explained in the text and in our previous response, these are few in number because of the reduced infection rate when using this method in mature animals (after P10).  

      Comment 4. The ambition to differentiate neuronal birthdate at a half-day resolution (e.g., E10 vs E10.5) is interesting but must be considered with caution. As the authors explain in their methods, animals are caged at 7 pm, and the plug is checked the next morning at 7 am. There is hence a potential error of 12 h.

      We agree with the reviewer, and we previously discussed these temporal resolution caveats explicitly. We have now further expanded on this in new text (see the middle paragraph on page 5). Nevertheless, the method did reveal the temporal sequence of neurogenesis of V1 clades with close to 12-hour resolution.

      As explained in the text and in the preliminary response, this is because we analyzed a sufficient number of animals from enough litters and used very stringent criteria to count EdU-positive cells.

      Moreover, our results fit very well with the current literature. The data agree with previous conclusions from Andreas Sagner’s group (Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg) on the birthdates of spinal interneurons (including V1s), obtained with a different methodology (Delile J et al. Development. 2019 146(12):dev173807. doi: 10.1242/dev.173807. PMID: 30846445; PMCID: PMC6602353). In the discussion we compared in detail both the data and the methods of the Delile article with our results. We also cite the Sagner 2024 review, as requested later in the reviewer’s detailed comments. Our results also confirmed our previous report on the birthdates of V1-derived Renshaw cells and Ia inhibitory interneurons (Benito-Gonzalez A, Alvarez FJ J Neurosci. 2012 32(4):1156-70. doi: 10.1523/JNEUROSCI.3630-12.2012. PMID: 22279202; PMCID: PMC3276112). Finally, we recently received a communication (a direct email from Lora Sweeney’s group, Institute of Science and Technology Austria) notifying us that our neurogenesis sequence of V1s has been replicated in a different vertebrate species, and we shared our data with them for comparison. Their manuscript is currently close to submission. Therefore, we are confident that, despite the limitations of EdU birthdating we discussed, our conclusions are strong and are being validated by other groups using different methods and species. We also want to acknowledge the positive comments of reviewer 3 regarding our birthdating study, indicating that it is one of the most rigorous he or she has ever seen.

      Reviewer 3:

      Comment 1. My only criticism is that some of the main messages of the paper are buried in technical details. Better separation of the main conclusions of the paper, which should be kept in the main figures and text, and technical details/experimental nuances, which are essential but should be moved to the supplement, is critical. This will also correct the other issue with the text at present, which is that it is too long.

      As in our response to comment 1 from Reviewer 2, we followed the reviewers’ recommendations: we substantially condensed and simplified the main text and removed technical details from it, while trying not to decrease rigor.

      Reviewer #1 (Recommendations For The Authors):

      In Figure 1, the definition of the area to analyze MafB ventral and MafB dorsal is unclear. It should be described.

      This has been clarified in both text and supplemental figure S3.

      “We focused the analyses on the brighter dorsal and ventral MafB-V1 populations defined by boxes of 100 µm dorsoventral width at the level of the central canal (dorsal) or the ventral edge of the gray matter (ventral) (Supplemental Figure S3B).”

      Problems with figure citation.

      We apologize for the mistakes. All have been corrected. 

      Reviewer #2 (Recommendations For The Authors):

      As indicated in the public review, I'd recommend substantially revising the writing for clarity. As it stands, the paper is extremely hard to read. I would also recommend justifying the focus on Foxp2 neurons.

      Also, the scope of the present paper is not clearly stated in the introduction (page 4).

      Done. We also modified the introduction such that the exact goals are more clearly stated.

      I would also recommend toning down the interpretation that V1 clades constitute "unique functional subsets" (discussion and elsewhere). Functional investigation is not performed, and connectomic data is partial and only very suggestive.

      We included the following sentence at the end of the first paragraph of the discussion:

      “This result strengthens the conclusion that these V1 clades defined by their genetic make-up might represent distinct functional subtypes, although further validation is necessary in more functionally focused studies.”

      Different post-natal stages are used for different sections of the manuscript. This is often confusing, please justify each stage. From the beginning even, why is the initial birthdating (Figure 1) done here at p5, while the previous characterization of clades was done at p0? I am not sure to understand the justification that this was chosen "to preserve expression of V1 defining TFs". Isn't the sooner the better?

      The birthdating study was carried out at P5. P5 is a good time point because there is little variation in TF expression compared to P0, as demonstrated in the results. Furthermore, later tissue harvesting allows higher replicability, since it is difficult to consistently harvest tissue on the day a litter is born (P0). P5 tissue is also technically easier to handle than P0 tissue. The analysis of VGLUT1 synapses was likewise done at P5 rather than at later ages. This has two advantages: TF immunoreactivities are preserved at this age, and corticospinal projections have not yet reached the lumbar cord, which reduces caveats in interpreting the origin of VGLUT1 synapses in the ventral horn (although VGLUT1 synapses are still maturing at this age, see below).

      Other parts of the study focus on different ages, each selected as the most adequate for its purpose. Synaptic connectivity is best studied in mature spinal cords, after the synaptic plasticity of the first postnatal week. For the tracing study, we thoroughly explain in the text the reasons for the experimental design (see also below in detailed comments). To count Foxp2-V1 interneurons and compare them to motor columns, we analyzed mature animals. To test our lineage labeling, we used animals of all ages to confirm the consistency of the genetic targeting strategy throughout postnatal development and into adulthood.

      Figure 5: wouldn't it be worth quantifying and illustrating cellular densities, in addition to the average number of Foxp2 neurons, across lumbar segments (panels D & E)? Indeed, the size of (and hence the total number of cells within) each lumbar segment might not be the same, with a significant "enlargement" from L2 to L4 (this is actually visible on the transverse sections). Hence, if the total number of cells is higher in these enlarged segments but the total number of Foxp2-V1 is not, this class may be proportionally less abundant.

      We believe the critical parameter is the ratio of Foxp2-V1s to motoneurons. This informs how Foxp2-V1 interneurons vary according to the size of the motor columns and the number of motoneurons overall.

      The question asked by the reviewer would best be answered by estimating the proportion of Foxp2-V1 neurons to all NeuN labeled interneurons. This is because interneuron density in the spinal cord varies in different segments. We are not sure what this additional analysis will contribute to the paper.

      Why, in the rabies tracing scheme (Fig 8), is the rabies injection performed at P15? As the authors explain in the text, rabies uptake at the neuromuscular junction is weak after P10. It is not clear to me why these experiments weren't all done at early postnatal stages, with a "classical" co-injection of TVA and rabies.

      First, we do not need TVA in this experiment because we are using B19-G coated virus and injecting it into muscles, not into the spinal cord directly.

      Second, tracing is enhanced when the AAV is injected a few days before the rabies virus. This is because AAV transgene expression is delayed with respect to rabies virus infection and replication. We performed full time courses and presented these data in an SfN abstract (Gomez-Perez et al., 2015, Program No. 242). We believe a full description of these technical details is beyond the scope of this manuscript, which has already been considered too technical.

      Third, the justification of P15 timing of injections for anterograde primary afferent labeling and retrograde monosynaptic labeling of interneurons is fully explained in the text. 

      “To obtain transcomplementation of RVDG-mCherry with glycoprotein in LG motoneurons, we first injected the LG muscle with an AAV1 expressing B19-G at P4. We then performed RVDG and CTB injections at P15 to optimize muscle targeting and avoid cross-contamination of nearby muscles. Muscle specificity was confirmed post-hoc by dissection of all muscles below the knee. Analyses were done at P22, a timepoint after developmental critical windows through which Ia (VGLUT1+) synaptic numbers increase and mature on V1-IaINs (Siembab et al., 2010)” 

      Furthermore, CTB labeling starts to decrease in intensity 7 days after injection because of intracellular degradation, and rabies virus labeling disappears because of cell death. Both factors limit the post-injection time available for analyses.

      Likewise, I am surprised not to see a single motoneuron in the rabies tracing (Fig 8), neither in the histology nor in the graphs. How can the authors be certain that there was indeed rabies uptake from the muscle at this age, and that all labelled cells, presumed to be preMNs, are not actually sensory neurons? It is known that rabies vectors, when delivered from the periphery, might also label sensory afferents and their postsynaptic targets through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). This potential bias must be considered.

      This is fully explained in our previous response to the second reviewer’s general comments. We have also added a confocal image showing starter motoneurons as requested (Figure S6A).

      Please carefully inspect the references to figures and figure panels, which I suspect are not always correct.

      Thank you. We carefully revised the manuscript to correct these deficiencies and we apologize for them.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1: Data here is absolutely beautiful and provides one of the most thorough studies, in terms of timepoints, number of animals analyzed, and precision of analysis, of edU-based birth timing that has been published for neuron subtypes in the spinal cord so far. My only suggestion is to color code the early and late born populations (in for example, different shades of green for early; and blue for late, to better emphasize the differences between them). It is very difficult to differentiate between the purple, red and black colors in G-I, which this would also fix. The antibody staining for Pou6f2 (F) is also difficult to see; gain could be increased on these images or insets added for clarity.

      The choice of colors is adapted for optimal visualization by people with different degrees of color blindness. Shades of individual colors are always more difficult to discriminate. This is personally verified by the senior corresponding author of this paper who has some color discrimination deficits. Moreover, each line has a different symbol for the same purpose of easing differentiation.

      Figure 2: This is also a picture-perfect figure showing further diversity by birth time even within a clade. One small aesthetic comment is that the arrows are quite unclear and block the data. Perhaps the contours themselves could be subdivided by region and color coded by birth time-such that for example the dorsal contours that emerge in the MafB clade at E11 are highlighted in their own color. Some quantification of the shift in distribution as well as the relative number of neurons within each spatially localized group would also be useful. For MafB, for example, it looks as though the ventral cells (likely Renshaw) are generated at all times in the contour plots; in the dot plots however, it looks like the most ventral cells are present at e10.5. This is likely because the contours are measuring fractional representations, not absolute number. An independent measure of absolute number of ventral and dorsal, by for example, subdividing the spinal cord into dorsoventral bins, would be very useful to address this ambiguity.

      We believe the density plots already convey the shift in position with birthdate, and we are not sure how to quantify this more accurately than by showing the differences in cellular density. We used dorsoventral and mediolateral binning in our first paper decades ago (Alvarez et al., 2005). This approach has since been replaced by more rigorous density profiles that better describe cell distributions. Unfortunately, to obtain the most accurate density profiles we need to pool all cells from all animals, which precludes statistical comparisons. This is because some groups have very few cells per animal (for example, early-born Sp8 or Foxp2 cells).

      Figure 3 and Figure 4: These, and all figures that compare the lineage trace and antibody staining, should be moved to the supplement in my opinion-as they are not for generalist readers but rather specialists that are interested in these exact tools. In addition, the majority of the text that relates to these figures should be transferred to the supplement as well. Figure 5: Another great figure that sets the stage for the analysis of FoxP2V1-to-MN synaptic connectivity, and provides basic information about the rostrocaudal distribution of this clade, by analyzing settling position by level. I have only minor comments. The grid in B obscures the view of the cells and should be removed. The motor neuron cell bodies in C would be better visible if they were red.

      We moved some of the images to supplemental (see new supplemental Fig S4). However, we also added new data to the figure as requested by reviewers (Fig 3E-G). We preserved our analyses of Foxp2 and non-Foxp2 V1s across ages and spinal segments because we think this information is critical to the paper. Finally, we want to prevent misleading readers into believing that Foxp2 is a marker that is unique to V1s. Therefore, we also preserved Figures 3H to 3J showing the non-V1 Foxp2 population in the ventral horn. 

      Figure 6: A very careful and quantitative analysis of V1 synaptic input to motor neurons is presented here. For the reader, a summary figure (similar to B but with V1s too) that schematizes V1 FoxP2 versus Renshaw cell connectivity with LMC, MMC, and PGC motor neurons at one level would be useful.

      Thanks for the suggestion. A summary figure has now been included (Figure 5G). 

      Figure 7: The goal of this figure is to highlight intra-clade diversity at the level of transcription factor expression (or maintenance of expression), birth timing and cell body position culminating in the clear and concise diagram presented in G. In panels A-F however, it takes extra effort to link the data shown to these I-IV subtypes. The figure should be restructured to better highlight these links. One option might be to separate the figure into four parts (one for each type): with the individual spatial, birth timing and TF data for each population extracted and presented in each individual part.

      We agree with the reviewer that this is a very busy figure. We tried to re-structure the figure following the suggestions of the reviewer and also several alternative options. All resulted in designs that were more difficult to follow than the original figure. We apologize for its complexity, but we believe this is the best organization to describe all the data in the simplest form.

      Figure 8: in A-D, the main point of the figure - that V1FoxP2Otp preferentially receive proprioceptive synapses is buried in a bunch of technical details. To make it easier for the reader, please:

      (1) add a summary as in B of the %FoxP2-V1 Otp+ cells (82%) with Vglut1 synapses to make the point stronger that the majority of these cells have synapses.

      We added this graph by extending the previous graph to include lineage labeled Foxp2-V1s with OTP or Foxp2 immunoreactivity. It is now Figure 7B.

      (2) Additionally, add a representative example that shows large numbers of proximal synapses on an FoxP2-V1 Otp+.

      The image we presented before as Figure 8A was already immunostained for OTP, so we just added the OTP channel to the images. Now all this information is in panels that are subparts of Figure 7A.

      (3) Move the comparison between FoxP2-V1 and FoxP2AB+V1s to the supplement.

      We preserved the quantitative data on Foxp2-V1 lineage cells with Foxp2-immunoreactivity but made this a standalone figure, so it is not as busy.

      (4) Move J-M description of antibody versus lineage trace of Otp to supplement as ending with this confuses the main message of the paper (see comment above).

      All results for the Otp-V1 mouse model have now been placed in a supplemental figure (Figure S5).

      Discussion: A more nuanced and detailed discussion of how the temporal pattern of subtype generation presented here aligns with the established temporal transcription factor code (nicely summarized in Sagner 2024) would be helpful to place their work in the broader context of the field.

      This aspect of the discussion was expanded on pages 20 and 21. We replaced the earlier cited review (Sagner and Briscoe, 2019, Development) with the updated Sagner 2024 review and further discussed the data in the context of the field and of neurogenesis waves throughout the neural tube, not only the spinal cord. We had previously compared our data carefully with the spinal cord data from Sagner’s group (Delile et al., 2019, Development), and we have now further expanded this comparison in the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript by Napoli et al., the authors study the intracellular function of cytosolic S100A8/A9, a soluble myeloid-cell protein that operates extracellularly as an alarmin and whose intracellular function is not well characterized. Here, the authors use state-of-the-art intravital microscopy to demonstrate that adhesion defects observed in cells lacking S100A8/A9 (Mrp14-/-) are not rescued by exogenous S100A8/A9, thus highlighting an intrinsic defect. Based on this result, subsequent efforts were employed to characterize the nature of these adhesion defects.

      The authors thank reviewer #1 for his/her insightful comments and suggestions. Please find our point to point responses below.

      (1) Ex vivo characterization of the function of S100A8/A9 in adhesion, spreading, and calcium signaling requires at least one rescue experiment to support the direct role of these proteins in the biological processes under study.

      We thank the reviewer for this comment. We agree that rescue experiments would be helpful to confirm the direct role of intracellular S100A8/A9 in adhesion, spreading, and Ca2+ signaling. Although transfection of primary cells, especially neutrophils, poses challenges due to their short half-life, we now have undertaken additional in vitro rescue experiments. Specifically, we used extracellular S100A8/A9 and coated Ibidi flow chambers with E-selectin, ICAM-1 and CXCL1 alone or alongside S100A8/A9, and measured rolling and adhesion of blood neutrophils. Our data reveal that extracellular S100A8/A9 can induce increased adhesion in WT neutrophils but fails to rescue the adhesion defect in Mrp14-/- neutrophils (Author response image 1). This result corroborates our in vivo findings, emphasizing that the observed adhesion defect is due to the lack of intracellular S100A8/A9.

      Author response image 1.

      Extracellular S100A8/A9 does not rescue the adhesion defect in Mrp14-/- neutrophils. Analysis of the number of adherent leukocytes FOV-1 normalized to the WBC of WT and Mrp14-/- mice. Whole blood was harvested through a carotid artery catheter and perfused with a high-precision pump at a constant shear rate through flow chambers coated with either E-selectin, ICAM-1 and CXCL1 or E-selectin, ICAM-1, CXCL1 and S100A8/A9. [mean+SEM, n=5 mice per group, 12 (WT) and 14 (Mrp14-/-) flow chambers, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (2) There is room for improvement in the analysis of signaling pathways presented in Figures 3 H and I. Western blots and analyses are not convincing, in particular for p-Pax.

      We acknowledge the reviewer's concern regarding the clarity of the signaling pathway analysis, particularly the western blots for p-Paxillin. To address this, we have repeated the western blot experiments using murine neutrophils. Our new data confirm the defective paxillin phosphorylation upon CXCL1 stimulation and ICAM-1 binding in the absence of cytosolic S100A8/A9. We have now integrated these new findings with the original data and included the updated results in the manuscript (Figure 3I revised). These enhanced analyses provide a more robust and convincing demonstration of the signaling defects in Mrp14-/- neutrophils.

      (3) At least one western blot showing a knockdown of S100A8/A9 should be included towards the beginning of the result section.

      We appreciate the reviewer's suggestion to include a western blot demonstrating the knockout of S100A8/A9 early in the results section. In a recent publication by our group, we have already demonstrated the absence of S100A8/A9 at the protein level in Mrp14-/- neutrophils via western blotting ([1], please refer to Extended Data Fig. 1h). We agree that visual confirmation of the absence of S100A8/A9 protein is crucial for establishing the validity of our study.

      (4) The Ca2+ measurements at LFA-1 nanoclusters using the Mrp14-/- Lyz2xGCaMP5 are interesting. It is understood that the authors are correcting calcium levels by normalizing by LFA-1 cluster areas, and that seems fine to me. The issue is that the total calcium signal seems decreased in Mrp14-/- cells compared to WT cells (Fig. 4E). Why is total Ca2+ low? Please discuss.

      We thank the reviewer for this insightful comment. Indeed, our observations reveal reduced overall Ca2+ levels in Mrp14-/- neutrophils compared to WT neutrophils. Initially, we noticed a general decrease in Ca2+ intensity (Author response image 2A-B) and lifetime in Mrp14-/- neutrophils (Author response image 2C-D). Further analysis indicated that these differences in Ca2+ levels are localized specifically to the LFA-1 nanocluster sites. In contrast, the cytosolic Ca2+ levels outside of the LFA-1 nanocluster areas were comparable between Mrp14-/- and WT neutrophils (Figure 4H-J). This suggests that the reduced total Ca2+ levels observed in Mrp14-/- neutrophils are primarily due to the impaired Ca2+ supply at the LFA-1 nanocluster areas. Our data support the notion that cytosolic S100A8/A9 plays a crucial role in actively supplying Ca2+ to LFA-1 nanoclusters during neutrophil crawling. In the absence of S100A8/A9, the increase in overall Ca2+ levels (summing both inside and outside LFA-1 nanocluster areas) is minimal, further highlighting the specific role of S100A8/A9 in maintaining localized Ca2+ concentrations at these crucial sites.

      Author response image 2.

      Overall Ca2+ levels in WT and Mrp14-/- neutrophils (A) Representative confocal images of neutrophils from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 mice, labeled with Lyz2 td Tomato marker. The images illustrate overall cytosolic Ca2+ levels during neutrophil crawling in flow chambers coated with E-selectin, ICAM-1, and CXCL1 (scale bar=10μm). (B) Quantitative analysis of total cytosolic Ca2+ intensity in single cells from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils measured over three time intervals: min 0-1, 5-6 and 9-10 [mean+SEM, n=5 mice per group, 56 (WT) and 54 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. (C) Representative traces and (D) single-cell analysis of total Ca2+ lifetime over the first 5 minutes in WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils crawling on E-selectin, ICAM-1, and CXCL1 coated flow chambers recorded with FLIM microscopy [mean+SEM, n=3 mice per group, 111 (WT) and 95 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (5) Even if the calcium level outside LFA-1 nanoclusters is not significant (Figure 4J), the data at min 9-10 in Figure 4J seems to be affected by a single event that may be an outlier. Additional data may be needed here.

      We appreciate the reviewer’s attention to this detail. To address the concern regarding a potential outlier in the Ca2+ level measurements at 9-10 minutes in Figure 4J, we rigorously tested the dataset using the GraphPad outlier calculator. The analysis revealed that no data point was statistically identified as an outlier. Given that the current dataset is robust and the statistical analysis confirms the integrity of the data, we believe that the results accurately reflect the biological variability observed in our experiments. Therefore, we have not added additional data points at this stage but remain open to discussing this further.
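      For readers unfamiliar with the outlier check, a minimal Python sketch of a two-sided Grubbs' test follows. It is assumed here, not stated in the response, that the GraphPad outlier calculator applies Grubbs' test at alpha = 0.05. The function names are hypothetical, the Student's t quantile is computed numerically (Simpson's rule plus bisection) only to avoid third-party dependencies, and the example values are illustrative, not data from Figure 4J.

```python
import math
import statistics
from typing import Optional, Sequence


def t_cdf(x: float, df: int) -> float:
    """P(T <= x) for Student's t with df degrees of freedom, assuming x >= 0.

    Integrates the density on [0, x] with Simpson's rule and adds 0.5,
    the probability mass below zero (the distribution is symmetric).
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(u: float) -> float:
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    n = max(200, 2 * int(200 * x))  # even panel count, finer for wider ranges
    h = x / n
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_quantile(p: float, df: int) -> float:
    """Quantile of Student's t for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def grubbs_critical(n: int, alpha: float = 0.05) -> float:
    """Two-sided Grubbs critical value for a sample of size n (n >= 3)."""
    t = t_quantile(1 - alpha / (2 * n), n - 2)
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_outlier(values: Sequence[float], alpha: float = 0.05) -> Optional[float]:
    """Return the most extreme value if Grubbs' test flags it, otherwise None."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 denominator)
    suspect = max(values, key=lambda v: abs(v - mean))
    g = abs(suspect - mean) / s  # Grubbs statistic G = max|x_i - mean| / s
    return suspect if g > grubbs_critical(len(values), alpha) else None
```

      With ten values, `grubbs_outlier([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` flags 100, whereas the evenly spread values 1 to 10 return None. The test only asks whether the single most extreme point is improbable given the rest of the sample. A point that looks visually isolated in a plot can therefore still fall below the critical value.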

      (6) Finally, even though there is less calcium at LFA-1 clusters, that does not necessarily mean that "cytosolic S100A8/A9 plays an important role in Ca2+ "supply" at LFA-1 adhesion spots" as proposed. S100A8/A9 may play an indirect role in calcium availability. The analysis of the subcellular localization of S100A8/A9 at LFA-1 clusters together with calcium dynamics in stimulated WT cells would help support the authors' interpretation, which although possibly correct, seems speculative at this point.

      We thank the reviewer for this insightful comment and fully agree that additional evidence regarding the subcellular localization of S100A8/A9 would strengthen our conclusions. Although live cell imaging of intracellular S100A8/A9 was initially challenging due to technical limitations, we have now performed additional experiments to address this issue. We conducted end-point measurements where we allowed WT neutrophils to crawl on E-selectin, ICAM-1, and CXCL1 coated flow chambers for 10 minutes. Following this, we fixed and permeabilized the cells to stain intracellular S100A9, along with LFA-1 and a cell tracker for segmentation. Confocal microscopy and subsequent single-cell analysis revealed a significant enrichment of S100A8/A9 at LFA-1 positive nanocluster areas compared to the surrounding cytosol (Figure 4K and 4L, new). This finding supports our hypothesis that S100A8/A9 plays a direct role in the localized supply of Ca2+ at LFA-1 adhesion spots, thus facilitating efficient neutrophil crawling under shear stress. These new data have been included in the revised manuscript, providing stronger evidence for our proposed mechanism.

      Reviewer #2:

      Napoli et al. provide a compelling study showing the importance of cytosolic S100A8/9 in maintaining calcium levels at LFA-1 nanoclusters at the cell membrane, thus allowing the successful crawling and adherence of neutrophils under shear stress. The authors show that cytosolic S100A8/9 is responsible for retaining stable and high concentrations of calcium specifically at LFA-1 nanoclusters upon binding to ICAM-1, and imply that this process aids in facilitating actin polymerisation involved in cell shape and adherence. The authors show early on that S100A8/9 deficient neutrophils fail to extravasate successfully into the tissue, thus suggesting that targeting cytosolic S100A8/9 could be useful in settings of autoimmunity/acute inflammation where neutrophil-induced collateral damage is unwanted.

      The authors appreciate reviewer #2's insightful comments and suggestions. Below are our detailed responses:

      (1) Extravasation is shown to be a major defect of Mrp14-/- neutrophils, but the Giemsa staining in Figure 1H seems to be quite unspecific to me, as neutrophils were determined by nuclear shape and granularity. It would have perhaps been more clear to use immunofluorescence staining for neutrophils instead as seen in Supplementary Figure 1A (staining for Ly6G or other markers instead of S100A9).

      We acknowledge the reviewer's concern. However, Giemsa staining is a well-established method in hematology, histology, cytology, and bacteriology, widely recognized for its ability to distinguish leukocyte subsets based on nuclear shape and cytoplasmic characteristics, and it is extensively documented in the literature [2-5]. Its main advantage is that leukocyte subsets can be readily discriminated by the shape and conformation of the nucleus and cytoplasm (Author response image 3).

      Author response image 3.

      Giemsa staining of extravasated leukocyte subsets. (A) Representative image of Giemsa-stained cremaster muscle tissue post-TNF stimulation. The image clearly differentiates leukocyte subsets (white arrow = neutrophils, yellow arrow = eosinophils, red arrow = monocytes). Scale bar = 50 µm.

      (2) The representative image for Mrp14-/- neutrophils used in Figure 4K to demonstrate Ripley's K function seems to be very different from that shown above in Figures 4C and 4F.

      The reviewer correctly observed that the cell in Figure 4K is different from those in Figures 4C and 4F. This is intentional, as Figure 4K is meant to show a representative image that accurately reflects the overall results of the experiments. We assure the reviewer that all cells analyzed in Figures 4C and 4F were also included in the analysis for Figure 4K.
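      The reviewer's comment refers to a Ripley's K analysis of LFA-1 clustering. For readers less familiar with this statistic, the sketch below computes one common form of the estimator, K(r) = (A / n²) Σ_{i≠j} 1(d_ij ≤ r), and the variance-stabilized L(r) = sqrt(K(r)/π), without edge correction. Under complete spatial randomness L(r) ≈ r, so L(r) - r > 0 suggests clustering at scale r. The function names, coordinates, and area are hypothetical, and the software and edge correction used for the figure are not specified here; this is an illustration of the statistic, not the analysis pipeline.

```python
import math
from itertools import permutations


def ripley_k(points, area, r):
    """Naive Ripley's K estimate at radius r (no edge correction).

    points: list of (x, y) coordinates inside a region of the given area.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j) with i != j that lie within distance r.
    close_pairs = sum(1 for p, q in permutations(points, 2) if math.dist(p, q) <= r)
    return area * close_pairs / (n * n)


def ripley_l(points, area, r):
    """Variance-stabilized L(r) = sqrt(K(r) / pi); L(r) is about r under spatial randomness."""
    return math.sqrt(ripley_k(points, area, r) / math.pi)


def clustering_score(points, area, r):
    """L(r) - r: positive values suggest clustering at scale r, negative values dispersion."""
    return ripley_l(points, area, r) - r


# Hypothetical example: four tightly packed localizations in a 10 x 10 region.
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
print(ripley_k(cluster, 100.0, 0.5), clustering_score(cluster, 100.0, 0.5))
```

      In practice, K(r) is evaluated over a range of radii and compared with simulations of complete spatial randomness in the same region, and an edge correction is applied for points near the cell boundary.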

      (3) Although the authors have done well to draw a path linking cytosolic S100A8/9 to actin polymerisation and subsequently the arrest and adherence of neutrophils in vitro, the authors can be more explicit with the analysis - for example, is the F-actin co-localized with the LFA-1 nanoclusters? Does S100A8/9 localise to the membrane with LFA-1 upon stimulation? Lastly, I think it would have been very useful to close the loop on the extravasation observation with some in vitro evidence to show that neutrophils fail to extravasate under shear stress.

      We thank the reviewer for these comments and questions.

      Concerning the co-localization of F-actin with LFA-1 nanoclusters and S100A8/9 localization: We appreciate the reviewer's interest in the co-localization between F-actin and LFA-1. Unfortunately, because neutrophils in our GCaMP5 mouse model are already labeled with tdTomato (LyzM reporter) and eGFP (Ca2+ reporter), we could stain for only one of LFA-1 or F-actin at a time. However, in our F-actin movies, we observed that F-actin localizes predominantly at the rear of the cell, whereas LFA-1 is more uniformly distributed across the plasma membrane.

      Regarding S100A8/A9 localization, as mentioned in our response to Reviewer 1's sixth point, we have now performed endpoint measurements. We stained neutrophils with the cell tracker green CMFDA and for LFA-1, allowed them to crawl on E-selectin-, ICAM-1-, and CXCL1-coated flow chambers, and then stained intracellular S100A9 after fixation and permeabilization. Our analysis shows higher S100A9 intensity in LFA-1-positive areas than in LFA-1-negative areas (Figure 4K and 4L, new). This indicates that S100A8/A9 accumulates at LFA-1 nanoclusters, consistent with a role in concentrating Ca2+ there and thereby supporting adhesion and post-arrest modification events under flow.
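      The per-cell comparison described above (S100A9 intensity in LFA-1-positive versus LFA-1-negative areas within each segmented cell) amounts to a masked mean-intensity ratio. The NumPy sketch below assumes a pre-computed boolean cell mask (e.g. from the CMFDA channel) and a simple fixed intensity threshold on the LFA-1 channel; the function name, threshold, and inputs are hypothetical, and the actual segmentation and thresholding pipeline behind Figure 4K and 4L is not described here.

```python
import numpy as np


def s100a9_enrichment(s100a9, lfa1, cell_mask, lfa1_threshold):
    """Ratio of mean S100A9 intensity in LFA-1-positive vs LFA-1-negative pixels of one cell.

    s100a9, lfa1: 2-D intensity images of the same field (same shape).
    cell_mask: boolean array marking the segmented cell.
    lfa1_threshold: hypothetical cut-off defining LFA-1-positive pixels.
    A ratio > 1 means S100A9 is enriched in LFA-1-positive areas.
    """
    s100a9 = np.asarray(s100a9, dtype=float)
    lfa1 = np.asarray(lfa1, dtype=float)
    cell_mask = np.asarray(cell_mask, dtype=bool)
    # Only pixels inside the segmented cell are considered.
    positive = cell_mask & (lfa1 >= lfa1_threshold)
    negative = cell_mask & (lfa1 < lfa1_threshold)
    if not positive.any() or not negative.any():
        raise ValueError("cell needs both LFA-1-positive and LFA-1-negative pixels")
    return s100a9[positive].mean() / s100a9[negative].mean()
```

      One ratio per cell would then be compared against 1 (or the paired positive and negative means compared across cells) with an appropriate statistical test.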

      Regarding the extravasation defect under shear stress: To address the reviewer's suggestion, we performed transwell migration assays under static conditions. Our results show no significant difference in transmigration between WT and Mrp14-/- neutrophils without flow, indicating that the extravasation defect in Mrp14-/- neutrophils is shear-dependent. This supports our hypothesis that S100A8/A9-mediated Ca2+ supply at LFA-1 nanoclusters is critical under flow conditions (Author response image 4).

      Author response image 4.

      Static transmigration assay. (a) Transmigration of WT and Mrp14-/- neutrophils in static transwell assays (3 µm pore size, 45 min migration time), showing spontaneous migration (PBS) or migration towards CXCL1 [mean + SEM, n = 3 mice per group, two-way ANOVA, Sidak’s multiple comparisons test]. ns, not significant; *p ≤ 0.05, **p ≤ 0.01, ***p ≤ 0.001.

      Additional References

      (1) Pruenster, M., et al., E-selectin-mediated rapid NLRP3 inflammasome activation regulates S100A8/S100A9 release from neutrophils via transient gasdermin D pore formation. Nature Immunology, 2023. 24(12): p. 2021-2031.

      (2) Kuwano, Y., et al., Rolling on E- or P-selectin induces the extended but not high-affinity conformation of LFA-1 in neutrophils. Blood, 2010. 116(4): p. 617-24.

      (3) Porse, B., Mouse Hematology – A Laboratory Manual. European Journal of Haematology, 2010. 84(6): p. 554-554.

      (4) Frommhold, D., et al., Protein C concentrate controls leukocyte recruitment during inflammation and improves survival during endotoxemia after efficient in vivo activation. Am J Pathol, 2011. 179(5): p. 2637-50.

      (5) Braach, N., et al., RAGE Controls Activation and Anti-Inflammatory Signalling of Protein C. PLOS ONE, 2014. 9(2): p. e89422.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study characterized the cellular and molecular mechanisms of spike timing-dependent long-term depression (t-LTD) at the synapses between excitatory afferents from lateral (LPP) and medial (MPP) perforant pathways to granule cells (GC) of the dentate gyrus (DG) in mice.

      Strengths:

      The electrophysiological experiments are thorough. The experiments are systematically reported and support the conclusions drawn.

      This study extends current knowledge by elucidating additional plasticity mechanisms at PP-GC synapses, complementing existing literature.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      To more conclusively define the pivotal role of astrocytes in modulating t-LTD at MPP and LPP GC synapses through SNARE protein-dependent glutamate release, as posited in this study, the authors could adopt additional methods, such as alternative mouse models designed to regulate SNARE-dependent exocytosis, as well as optogenetic or chemogenetic strategies for precise astrocyte manipulation during t-LTD induction. This would provide more direct evidence of the influence of astrocytic activity on synaptic plasticity.

      We thank the reviewer for the suggestion. As stated in the manuscript and shown in Figure 4, we had already used two different approaches to determine the involvement of astrocytes in the forms of t-LTD we describe: loading astrocytes with BAPTA (aBAPTA) to interfere with astrocytic calcium signalling, and dnSNARE mice, in which vesicular release from astrocytes is impaired. Both approaches clearly indicated that astrocytes are required for t-LTD: in BAPTA-loaded astrocytes and in dnSNARE mice, t-LTD was prevented. Nevertheless, as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. First, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving the vesicle-associated membrane protein, an essential component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, t-LTD was absent at both the lateral and medial pathways, confirming that both forms of t-LTD require astrocytes, the SNARE complex, and vesicular release. Second, to test whether the glutamate involved is released by astrocytes, we loaded astrocytes with Evans blue, which inhibits the vesicular glutamate transporter (VGLUT) and thus interferes with glutamate uptake into vesicles. Again, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes.

      Reviewer #2 (Public Review):

      Summary:

      This work reports the existence of spike timing-dependent long-term depression (t-LTD) of excitatory synaptic strength at two synapses of the dentate gyrus granule cell, which are differently connected to the entorhinal cortex via either the lateral or medial perforant pathways (LPP or MPP, respectively). Using patch-clamp electrophysiological recording of tLTD in combination with either pharmacology or a genetically modified mouse model, they provide information on the differences in the molecular mechanism underlying this t-LTD at the two synapses.

      Strengths:

      The two synapses analyzed in this study have been understudied. This new data thus provides interesting new information on a plasticity process at these synapses, and the authors demonstrate subtle differences in the underlying molecular mechanisms at play. Experiments are in general well controlled and provide robust data that are properly interpreted.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      • Caution should be taken in the interpretation of the results to extrapolate to adult brain as the data were obtained in P13-21 days old mice, a period during which synapses are still maturing and highly plastic.

      We thank the reviewer for noticing this. Our experiments were intentionally performed in young animals (P13-21) precisely because this is a critical period of plasticity. We state this in the Methods, Results, and Discussion sections, and discuss it in some detail in the Discussion.

      • In experiments where the drug FK506 or thapsigargin are loaded intracellularly, the concentrations used are as high as for extracellular application. Could there be an error of interpretation when stating that the targeted actors are necessarily in the post-synaptic neuron? Is it not possible for the drug to diffuse out of the cell as it is evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. Although these compounds could in principle cross cell membranes and reach other cells, this would require a relatively long time. Moreover, to have any effect, a concentration equal or comparable to that in the pipette would have to reach the other cells. Furthermore, any compound that crossed the membrane during an experiment would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we consider this possibility unlikely. Nevertheless, as suggested, we repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and obtained the same results. These data are now included in Figure 3 and in the text.

      • The experiments implicating glutamate release from astrocytes in t-LTD would require additional controls to better support the conclusions made by the authors. As the data stand, it is not clear, how the authors identified astrocytes to load BAPTA and if dnSNARE expression in astrocytes does not indirectly perturb glutamate release in neurons.

      We thank the reviewer for raising this point. We now indicate how astrocytes were identified for BAPTA loading, and we address this point in detail in our response to Reviewer #2's Recommendations for the Authors.

      Significance:

      While this is the first report of t-LTD at these synapses, this plasticity process has been mechanistically well investigated at other synapses in the hippocampus and in the cortex. Nevertheless, this new data suggests that mechanistic differences in the induction of t-LTD at these two DG synapses could contribute to the differences in the physiological influence of the LPP and MPP pathways.

      Reviewer #3 (Public Review):

      Coatl et al. investigated the mechanisms of synaptic plasticity of two important hippocampal synapses, the excitatory afferents from lateral and medial perforant pathways (LPP and MPP, respectively) of the entorhinal cortex (EC) connecting to granule cells of the hippocampal dentate gyrus (DG). They find that these two different EC-DG synaptic connections in mice show a presynaptically expressed form of long-term depression (LTD) requiring postsynaptic calcium, eCB synthesis, CB1R activation, astrocyte activity, and metabotropic glutamate receptor activation. Interestingly, LTD at MPP-GC synapses requires ionotropic NMDAR activation whereas LTD at LPP-GC synapses is NMDAR independent. Thus, they discovered two novel forms of t-LTD that require astrocytes at EC-GC synapses. Although plasticity of EC-DG granule cell (GC) synapses has been studied using classical protocols, this is the first analysis of the synaptic plasticity induced by spike timing-dependent protocols at these synapses. Interestingly, the data also indicate that t-LTD at each type of synapse requires different group I mGluRs, with LPP-GC synapses dependent on mGluR5 and MPP-GC t-LTD requiring mGluR1.

      Using a detailed analysis of the coefficient of variation of the EPSP slopes and of miniature responses, together with several complementary approaches (failure rate, PPRs, CV, and mEPSP frequency and amplitude analysis), the authors demonstrate a decrease in the probability of neurotransmitter release and a presynaptic locus for these two forms of LTD at both types of synapses. By using elegant electrophysiological experiments and taking advantage of the conditional dominant-negative (dn) SNARE mice in which doxycycline administration blocks exocytosis and impairs vesicle release by astrocytes, they demonstrate that both LTD forms require the release of gliotransmitters from astrocytes. These data add in an interesting way to the ongoing discussion on whether LTD induced by STDP participates in refining synapses, potentially weakening excitatory synapses under the control of different astrocytic networks. The conclusions of this paper are mostly well supported by the data, but some aspects of the results must be clarified and extended.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      (1) It should be clarified whether the present results were obtained with or without functional inhibitory synaptic activation. It is not clear whether GABAergic synapses are blocked. If GABAergic synapses are not blocked, the authors must discuss whether the LTD of the EPSPs is due to a decrease in glutamatergic receptor activation or an increase in GABAergic receptor activation. Moreover, it is recommended to analyze not only the EPSPs but also the EPSCs to address whether the decrease in synaptic transmission is caused by a decrease in the input resistance or by a decrease in the space constant (lambda).

      We thank the reviewer for raising these points. GABAergic inhibition was not blocked in our experiments. As indicated in the manuscript, the observed forms of t-LTD appear to be due to a decrease in glutamate release probability, mediated by the mechanism we uncover and describe here. To clarify whether GABA receptors have any role in these forms of t-LTD, we repeated the experiments in the presence of the GABAA and GABAB receptor antagonists bicuculline and SCH50911, respectively. Blocking GABA receptors did not prevent or affect t-LTD at LPP- or MPP-GC synapses, which was still present with a magnitude similar to that of controls, indicating that these receptors are not involved in these forms of t-LTD. These results are now included in the Results section (page 8) and in new Figure S1. In our experiments, we observed no changes in input resistance or space constant and, importantly, no changes in EPSP amplitude or slope in the control pathway, which does not receive the plasticity protocol and which we routinely monitor in our experiments.

      (2) Authors show that Thapsigargin loaded in the postsynaptic neuron prevents the induction of LTD at both synapses. Analyzing the effects of blocking postsynaptic IP3Rs (Heparin in the patch pipette) and Ryanodine receptors (Ruthenium red in the patch pipette) is recommended for a deeper analysis of the mechanism implicated in the induction of this novel forms of LTD in the hippocampus.

      We thank the reviewer for this suggestion. We repeated the experiments loading the postsynaptic cell with either heparin or ruthenium red through the patch pipette. Under these conditions, t-LTD was not affected by heparin, ruling out a role for IP3Rs, but it was prevented by ruthenium red, indicating that ryanodine receptors are required. These data are now included in the text (page 12) and in Figure 3a, b, e, f.

      (3) Authors nicely demonstrate that CB1R activation is required in these forms of LTD by blocking CB1Rs with AM251, however an interesting unanswered question is whether CB1R activation is sufficient to induce this synaptic plasticity. This reviewer suggests studying whether applying puffs of the CB1R agonist, WIN 55,212-2, could induce these forms of LTD.

      We thank the reviewer for this suggestion. We repeated the experiments adding WIN 55,212-2, as suggested. Activation of CB1Rs by puffs of the agonist WIN 55,212-2 onto the astrocyte directly induced LTD at both LPP- and MPP-GC synapses. These data are now included in the text (page 14) and in Figure 3c, d, g, h.

      (4) Finally, adding a last figure with a cartoon summarizing the proposed model of action in these novel forms of LTD would add a positive value and would help the reading of the manuscript, especially in those aspects related with the discussion of the results.

      We thank the reviewer for the suggestion. We now include a figure showing the proposed mechanisms (Figure 5).

      Extending these results would improve the manuscript, which provides interesting evidence of two novel forms of presynaptic t-LTD at brain synapses, with different mechanisms of action that are probably involved in different aspects of information processing.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are just a few aspects that could be clarified to bolster the authors' conclusions.

      The authors centered the conclusion of their study on the role of astrocytic activity in regulating these two forms of plasticity (see title). To strengthen the evidence that astrocytes are key regulators of t-LTD at MPP and LPP GC synapses by regulating SNARE protein-dependent glutamate release, additional complementary approaches should be considered, such as other mouse models enabling the control of SNARE-dependent exocytosis and/or optogenetic/chemogenetic tools to selectively manipulate astrocytes during the induction of t-LTD, thereby directly assessing the impact of astrocytic activity on synaptic plasticity. Implementing calcium imaging or glutamate sensors to visualize the dynamics of astrocytic calcium signaling and glutamate release during t-LTD could also be considered.

      We thank the reviewer for the suggestion. As stated in the manuscript and shown in Figure 4, we had already used two different approaches to determine the involvement of astrocytes in the forms of t-LTD we describe: loading astrocytes with BAPTA (aBAPTA) to interfere with astrocytic calcium signalling, and dnSNARE mice, in which vesicular release from astrocytes is impaired. Both approaches clearly indicated that astrocytes are required for t-LTD: in BAPTA-loaded astrocytes and in dnSNARE mice, t-LTD was prevented. Nevertheless, as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. First, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving the vesicle-associated membrane protein, an essential component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, t-LTD was absent at both the lateral and medial pathways, confirming that both forms of t-LTD require astrocytes, the SNARE complex, and vesicular release. Second, to test whether the glutamate involved is released by astrocytes, we loaded astrocytes with Evans blue, which inhibits the vesicular glutamate transporter (VGLUT) and thus interferes with glutamate uptake into vesicles. Again, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (pages 14 and 15) and in Figure 4.

      • How were astrocytes identified to be loaded with BAPTA? The authors should clarify this methodological aspect and provide confocal images of patched astrocytes situated 50-100 µm from the recorded neuron.

      We thank the reviewer for the comment. This information is now included in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy and were characterized by a low (hyperpolarized) resting membrane potential, low membrane resistance, and passive responses (no action potentials) to both negative and positive current injection.

      • Please provide confocal images of EGFP expression in the DG astrocytes of dnSNARE mice both on and off Dox, to verify transgene expression in astrocytes.

      We thank the reviewer for this suggestion. We now include an image of GFP expression in DG astrocytes of dnSNARE mice off Dox (Fig. S3). None of the pups or mice were given doxycycline at any time since birth, so the transgenes were continuously expressed and exocytosis should therefore be blocked in astrocytes.

      Minor points:

      Lines 250-253: It is mentioned that TTX is added at baseline, washed out for the t-LTD experiment, and then reapplied post t-LTD. I suggest clarifying the timing and rationale for this application for a broad audience.

      We thank the reviewer for the suggestion. We now include some information related to the timing and rationale of the experiment phases (page 9).

      The discussion is quite detailed and provides a comprehensive overview of the study's findings. To enhance clarity and impact, the authors might consider the following:

      • add subheadings and bullet points for key findings. This will improve readability.

      • this section could benefit from streamlining to avoid redundancy.

      • some sentences could be made more concise without losing meaning.

      We thank the reviewer for these suggestions. We now include subheadings in the discussion section to improve readability and have made some sentences more concise and simple without losing meaning.

      In figure legends, capitalization should be consistent, for example in the statistical significance notation ("***P < 0.001" or "***p < 0.001").

      For consistency, we now use lowercase p (p < 0.001) in the legend of Figure 4.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      • All results were obtained in young still quite immature synapses. To strengthen the significance of the findings, the authors could repeat some of the main experiments in adult mice (8 weeks and beyond). If not, they should state clearly that these mechanisms were only evidenced in early post-natal conditions.

      We thank the reviewer for noticing this. Our experiments were intentionally performed in young animals (P13-21) precisely because this is a critical period of plasticity. As the reviewer suggests, we state this in the Methods (page 5), Results (page 8), and Discussion (page 19) sections, and discuss it in some detail in the Discussion.

      • Lines 246-249 and fig 1f,p: The authors need to perform a statistical test on these two graphs to support their claim that 'A plot of CV⁻² versus the change in the mean evoked EPSP slope (M) before and after t-LTD mainly yielded points below the diagonal line at LPP-GC and MPP-GC synapses'.

      This may not have been clear in the previous version. We found and corrected an error in one of the graphs, in which some data points were missing. In addition, as suggested by the reviewer, we performed a regression analysis that confirms the stated conclusions. This is now included in the text (page 9), where we report mean values ± SEM and the linear regression of the data for LPP-GC synapses (Mean = 0.607 ± 0.054 vs 1/CV² = 0.439 ± 0.096, R² = 0.337; n = 14) and MPP-GC synapses (Mean = 0.596 ± 0.056 vs 1/CV² = 0.461 ± 0.090, R² = 0.168; n = 13). Points lying on the dotted horizontal line (1/CV² = 1) indicate no change in release probability, whereas points lying below the dotted diagonal line suggest a change in release probability (for review, see Brock et al., 2020, Front Synaptic Neurosci 12, 11).
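      The logic of the CV analysis can be made concrete with a short sketch. For each cell, M is the ratio of the mean EPSP slope after versus before t-LTD, and the y-value is the ratio of 1/CV² after versus before. Under a simple binomial release model (mean = npq, CV² = (1 - p)/(np)), a change in quantal size q alone leaves 1/CV² unchanged (horizontal line at 1), a change in the number of release sites n alone moves the point along the diagonal (y = M), and a decrease in release probability p places the point below the diagonal. The Python below, with hypothetical function names and inputs, computes these quantities from per-trial slopes and from model parameters; it illustrates the standard method and is not the analysis code used for Figure 1.

```python
import statistics


def cv_point(before, after):
    """Return (M, ratio of 1/CV^2) for one cell from per-trial EPSP slopes."""
    m_before, m_after = statistics.mean(before), statistics.mean(after)
    # 1/CV^2 = mean^2 / variance (sample variance, n - 1 denominator).
    inv_cv2_before = m_before ** 2 / statistics.variance(before)
    inv_cv2_after = m_after ** 2 / statistics.variance(after)
    return m_after / m_before, inv_cv2_after / inv_cv2_before


def binomial_point(n0, p0, q0, n1, p1, q1):
    """Expected (M, 1/CV^2 ratio) under a binomial model: mean = n*p*q, CV^2 = (1 - p) / (n*p)."""
    if not (0 < p0 < 1 and 0 < p1 < 1):
        raise ValueError("release probabilities must lie strictly between 0 and 1")

    def inv_cv2(n, p):
        return n * p / (1 - p)

    m = (n1 * p1 * q1) / (n0 * p0 * q0)
    return m, inv_cv2(n1, p1) / inv_cv2(n0, p0)


def below_diagonal(point, tol=1e-9):
    """True if the 1/CV^2 ratio falls below the identity line y = M."""
    m, ratio = point
    return ratio < m - tol


# Halving p (0.5 -> 0.25) gives M = 0.5 but a 1/CV^2 ratio of 1/3: below the diagonal.
print(binomial_point(10, 0.5, 1.0, 10, 0.25, 1.0))
```

      For instance, the LPP-GC group means reported above (M = 0.607, 1/CV² ratio = 0.439) lie below the diagonal, which is consistent with, though not by itself proof of, a presynaptic change in release probability.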

      • We are not sure that the experiment with the MK801 provided in the patch pipet can be interpreted correctly (Figure 2 a,b and e,f). How sure are the authors that, when applying MK801 in the patch pipet, it can reach its binding site within the pore? The concentration of MK801 is also very high (500 microM) and used at the same concentration extracellularly and intracellularly. Why did the authors not use lower concentration when applied intracellularly?

      We thank the reviewer for raising this point. MK801 loaded into the postsynaptic neuron through the pipette does reach its binding site in the channel pore: NMDA currents recorded from postsynaptic neurons loaded with MK801 are blocked. We now include a control experiment showing the effect of postsynaptic MK801 on NMDA currents in the text (page 10). NMDA currents were recorded at +40 mV with AMPA and GABA receptors blocked by NBQX and bicuculline, respectively. Regarding concentration, the affinity of MK801 from the intracellular side has been reported to be several orders of magnitude lower than from the extracellular side (Sun et al., 2018, Neuropharmacology 143, 122-129), and the concentrations we used have been used extensively in previous studies. The concentrations used in the present work clearly blocked NMDAR currents but did not prevent LTD.

      • Linked to the point above, for the intracellular application of FK506 and thapsigargin, the concentrations used extracellularly and intracellularly are identical. The authors could have used lower concentrations for the intracellular application. Also, how can they be sure of the correct interpretation of these data as the drug essentially reaching a post-synaptic target when applied intracellularly? If the drug can enter the neuron, why could it not diffuse out of the neuron especially when loaded at a high concentration? Maybe using a lower concentration when applied intracellularly could at least partially address this issue.

      It is evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. Although these compounds could in principle cross cell membranes and reach other cells, this would require a relatively long time. Moreover, to have any effect, a concentration equal or comparable to that in the pipette would have to reach the other cells. Furthermore, any compound that crossed the membrane during an experiment would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we consider this possibility unlikely. Nevertheless, we repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and obtained the same results. These data are now included in Figure 3, and the numbers in the text have been updated (pages 12-13).

      • The data supporting the possibility of glutamate release by astrocytes as a main source of glutamate to promote t-LTD need to be strengthened. In the experiments in Figure a-h, it is not clear how the authors recognized the astrocytes to patch. No details are provided in the Methods or in the main text. If we understand correctly, it is only by performing a current-step protocol to ensure that the patched cell did not produce action potentials. If this was the case, the authors need to be more specific and provide details of this protocol. More importantly, the one trace provided in Figures 4a and 4f suggests, by a rough estimate that we made with a ruler, that the highest current step only depolarized the cell to about -40 mV. This is not sufficient to ensure that the recorded cell is not a neuron. The authors should increase their steps to high depolarizing currents to ensure that the patched cell is not a neuron. Better yet, they should load the cell with a dye and process the slice after the electrophysiological recording for immunohistochemistry to ensure that it was indeed an astrocyte. Alternatively, they could aspirate the cell content at the end of the recording and perform qPCR for astrocyte markers, e.g., GFAP.

      We thank the reviewer for the comment. We now include information on how astrocytes were identified (a point also raised by Reviewer 1) in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy and by eGFP fluorescence (astrocytes from dnSNARE mice), and were characterized by a low (hyperpolarized) resting membrane potential, low membrane resistance, and passive responses (no action potentials) to both negative and positive current injection.

      We agree with the reviewer that the step protocol in Figures 4a and 4f may not have been completely clear. We have therefore revised it and now state clearly that we applied current steps that depolarized astrocytes beyond -20 mV, and that no action potentials were observed at any step. This is now also shown in Figure S3.

      • Related to the point above, the use of the model expressing dnSNARE in astrocytes is elegant. Yet, to really interpret the data obtained in these slices as a lack of vesicle release (and most importantly glutamate) we think that the authors should ensure that glutamate release from nearby neurons is not impacted. They could patch nearby neurons in dnSNARE slices and test PPR or synaptic fatigue when stimulating either the LPP or MPP. The authors should avoid overinterpretation of these results. As it stands, it is not evident that dnSNARE expression does not perturb other mechanisms within the astrocyte that in turn perturb pre-synaptic glutamate release. Adding back glutamate as puffs does not help to disentangle this issue.

      To further test whether astrocytes are the source of glutamate, we blocked glutamate release from astrocytes by loading them with Evans blue, which interferes with glutamate uptake into vesicles by inhibiting the vesicular glutamate transporter (VGLUT). In this experimental condition, as indicated above, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This is included in the text (page 15) and in Figure 4d, e, i, j.

      In addition, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which blocks exocytosis by cleaving vesicle-associated membrane protein, an important component of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). In this condition, t-LTD was clearly absent at both the lateral and medial pathways, confirming that both types of t-LTD require astrocytes, the SNARE complex, and vesicular release. These data indicate that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (page 14) and in Figure 4.

      Minor points:

      • Line 107: did the authors mean t-LTP and t-LTD? We don't understand why STDP is mentioned here.

      We meant to say t-LTP. This is now corrected.

      • Line 108: should STDP be replaced by t-LTD, as the authors only focused on this plasticity mechanism?

      We agree; we now refer to t-LTD.

      • Lines 131-132: it is not clear when the animals were fed doxycycline. If it was from birth, then 'not' should be removed. Otherwise, the authors should clearly state when doxycycline was provided.

      DOX was not provided, which means that the transgene was continuously expressed and exocytosis should therefore be blocked in astrocytes. We now state this more clearly in the Methods section (page 5).

      • Line 223: which hippocampal synapses? This needs to be stated.

      As suggested, we now specify the synapses in the text (page 8), as we do for cortical synapses: Schaffer collateral-CA1 (SC-CA1) synapses in the hippocampus and layer 4 to layer 2/3 (L4-L2/3) synapses in the cortex.

      • Line 273: what do the authors mean by 'from'? We don't understand the data provided on this line.

      We thank the reviewer for noticing this. This refers to the average amplitude of NMDAR-mediated currents before and after D-AP5 or MK801 application. We now express this more clearly (page 10: from 57±8 pA to 6±5 pA).

      • Line 286: why do the authors cite work on GluN2B and GluN3A only here, when they first investigate the contribution of GluN2A to t-LTD? What about previous data on GluN2A?

      We have rephrased this passage for clarity. We intended to indicate that the available data suggest that presynaptic NMDARs at MPP-GC synapses contain GluN2B and GluN3A subunits and that, to our knowledge, no data indicate that they contain GluN2A subunits.

      • Line 428: what do the authors mean by 'not least'?

      This was a typo, and we have removed it from the text.

      Reviewer #3 (Recommendations For The Authors):

      My only suggestion for improving data presentation in the manuscript would be to split some figures of the paper. In my opinion, the figures are too dense and therefore difficult to follow for the broad audience of eLife readers. In addition, a real image of the recorded dentate granule cells in the slice showing also the location of the real stimulation electrodes would significantly improve the presentation of Figure 1.

      We thank the reviewer for the suggestion, but we would prefer to keep the current organization of the figures. Although we agree that some of them are large, presenting both pathways in the same figure makes it easier to compare the lateral and medial pathways. Nevertheless, we have tried to make the figures clearer by using a columnar organization for each pathway, which we think makes the pathways easier to compare. As the reviewer suggests, Figure 1 now includes a real image of the recorded dentate granule cells in the slice, showing the location of the stimulation electrodes. We agree that this improves the presentation of the figure and thank the reviewer for the suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers found this manuscript to present convincing evidence for associative and non-associative behaviors elicited in male and female mice during a serial compound stimulus Pavlovian fear conditioning task. The work adds to ongoing efforts to identify multifaceted behaviors that reflect learning in classic paradigms and will be valuable to others in the field. The reviewers do note areas that would benefit from additional discussion and some minor gaps in data reporting that could be filled by additional analyses or experiments.

      We thank the reviewers and the editors for their thoughtful and constructive critiques of our manuscript. We have updated our manuscript with data from additional experiments as suggested by the reviewers, and we have significantly edited the text and figures to reflect these additions. Our detailed, point-by-point responses are below.

      Reviewer #1 (Public Review):

      The main goal of the study was to tease apart the associative and non-associative elements of cued fear conditioning that could influence which defensive behaviors are expressed. To do this, the authors compared groups conditioned with paired, unpaired, or shock only procedures followed by extinction of the cue. The cue used in the study was not typical; serial presentation of a tone followed by a white noise was used in order to assess switches in behavior across the transition from tone to white noise. Many defensive behaviors beyond the typical freezing assessments were measured, and both male and female mice were included throughout. The authors found changes in behavioral transitions from freezing to flight during conditioning as the tone transitioned into white noise, and a switch in freezing during extinction such that it became high during the white noise as flight behavior decreased. Overall, this was an interesting analysis of transitions in defensive behaviors to a serially presented cue consisting of two auditory stimuli during conditioning and then extinction.

      We thank the Reviewer for their supportive insight.

      There are some concerns regarding the possibility that the white noise is more innately aversive than the tone, inducing more escape-like behaviors compared to a tone, especially since the shock-only group also showed increased escape-like behaviors during the white noise versus tone. This issue would have been resolved by adding a control group where the order of the auditory stimuli was reversed (white noise->tone).

      We appreciate this concern, and we have added two additional groups to address this possibility. We have conducted the same experimental paradigm with 2 reverse-SCS groups (WN—tone), one with paired (new PA-R group) and one with unpaired (new UN-R group) presentations of shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse-order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R than in the PA-R group. In the PA-R group, this locomotor effect consists of neither darting nor escape jumping (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than in the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020).

      While the more complete assessment of defensive behaviors beyond freezing is welcomed, the main conclusions in the discussion are overly focused on the paired group and the associative elements of conditioning, which would likely not be surprising to the field. If the goal, as indicated in the title, was to tease apart the associative and non-associative elements of conditioning and defensive behaviors, there needs to be a more emphasized discussion and explicit identification of the non-associative findings of their study, as this would be more impactful to the field.

      We have rewritten the Discussion to provide a greater emphasis on the findings of the study that are more related to non-associative mechanisms. For example, we argue that cue-salience and changes in stimulus intensity can induce non-associative increases in locomotor behavior and tail rattling in shock-sensitized mice.

      Reviewer #2 (Public Review):

      Summary:

      The authors examined several defensive responses elicited during Pavlovian conditioning using a serial compound stimulus (SCS) as the conditioned stimulus (CS) and a shock unconditioned stimulus (US) in male and female mice. The SCS consisted of tone pips followed by white noise. Their design included 3 treatment groups that were either exposed to the CS and US in a paired fashion, in an unpaired fashion, or only exposed to the shock US. They compared freezing, jumping, darting, and tail rattling across all groups during conditioning and extinction. During conditioning, strong freezing responses to the tone pips followed by strong jumping and darting responses to the white noise were present in the paired group but less robust or not present in the unpaired or shock only groups. During extinction, tone-induced freezing diminished while the jumping was replaced by freezing and darting in the paired group. Together, these findings support the idea that associative pairings are necessary for conditioned defensive responses.

      Strengths:

      The study has strong control groups including a group that receives the same stimuli in an unpaired fashion and another control group that only receives the shock US and no CS to test the associative value of the SCS to the US. The authors examine a wide variety of defensive behaviors that emerge during conditioning and shift throughout extinction: in addition to the standard freezing response, jumping, darting, and tail rattling were also measured.

      We thank the Reviewer for their supportive appraisal of this study’s strengths.

      Weaknesses:

      This study could have greater impact and significance if additional conditions were added (e.g., using other stimuli of differing salience during the SCS), and determining the neural correlates or brain regions that are differentially recruited during different phases of the task across the different groups.

      In the revised manuscript, we have conducted experiments with 2 reverse-SCS groups (WN—tone): one with paired (new PA-R group) and one with unpaired (new UN-R group) presentations of shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse-order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R than in the PA-R group. In the PA-R group, this locomotor effect consists of neither darting nor escape jumping (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than in the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024).

      We agree that determining the neuronal correlates and brain regions that are involved in defensive ethograms at various stages within this paradigm is of great importance, but we feel that those experiments are beyond the scope of the current study, which is focused on identifying behavioral differences based on associative and non-associative factors.

      Reviewer #1 (Recommendations For The Authors):

      In LINES 72-73, authors say they used a "truly random procedure" as one of their control groups. Then in LINES 113-116, they describe this group as "unpaired" where the "SCS could not reliably predict footshock". Combined, it is unclear if this group is random or unpaired. The "truly random procedure" is defined, by the cited Rescorla paper, as "the two events are programmed entirely randomly and independently in such a way that some "pairings" of CS and US may occur by chance alone". So, truly random would indicate that the shock may occur during the cue, while unpaired indicates the shock was explicitly unpaired from the cue. If the authors used a random procedure, the groups need to be labeled as random, not unpaired, and the # of cues that happened to coincide with footshock per animal needs to be reported somewhere. If the authors used an unpaired procedure (which appears to be the case based on 40-60s ITI between SCS and footshock being reported), it needs to be clearer and consistent throughout that it was explicitly unpaired, as well as removing the claim in LINE 72-73 that they used a "truly random procedure".

      We did indeed use an explicitly unpaired procedure. We have adjusted the text and figures to better reflect this, and we removed any mentions of randomness with regards to the presentations of SCS and footshock.

      Despite the lack of significant sex differences, it would still be helpful if data panels with individual data points (e.g. Fig 2E-J), were presented as identifiable by sex (e.g. closed vs open circles for males vs females).

      The revised manuscript now compares four or five groups per figure, making data presentation complicated. Providing the individual data points in each panel reduces figure clarity, therefore, we feel it is best to present the data as box-and-whisker plots without them. However, the source data files for each figure are available to the reader and the data are clearly labeled to be identifiable by sex.

      Is it not odd that all groups showed similar levels of contextual freezing during the 3min baseline? If shocks are unsignaled in the UN and SO groups, one would expect higher levels of contextual freezing compared to a paired group.

      We are not certain why one would expect higher levels of contextual freezing in the UN and SO groups compared to the PA group at the beginning of conditioning day 2. Another study also looked at baseline freezing in a contextual fear group (which is the same as shock only in our study) and in an auditory cued fear conditioning group within the conditioning context, and their data show that freezing during the baseline period is equivalent between groups (Sachella et al., 2022).

      During baseline on Extinction Day 1, it does seem that the unpaired and SO groups tend to have higher freezing levels than the paired groups. Author response image 1 shows baseline freezing during the first 3 minutes of extinction day 1. After two days of conditioning in the conditioned flight paradigm, contextual freezing is, or trends toward being, significantly higher in the UN, UN-R, and SO groups than in the PA and PA-R groups.

      Author response image 1.

      Baseline Freezing levels for all groups during the first extinction session. Baseline period is defined as the first 180 seconds of the session, before any auditory stimulus was presented. PA, Paired; UN, Unpaired; SO, Shock Only; PA-R, Paired Reverse; UN-R, Unpaired Reverse. *p<0.05, **p<0.01, ****p<0.0001.

      Do the tone and WN elicit similar levels of defensive behaviors in a naïve mouse? Or have the authors tested WN followed by tone? Is there a potential issue that the WN may be innately aversive which is then amplified with training? i.e. does a tone preferentially induce freezing while WN induces active behaviors, regardless of which sensory stimulus is temporally closer to the shock? If the change in behavior is really due to the pairing and temporal proximity to shock, then there should be increased jumps, etc to the tone if trained with WN->tone.

      WN can indeed be used as an aversive stimulus under certain conditions and at sufficiently high decibel levels. In the conditioned flight paradigm, WN is presented at 75dB, which is below the threshold for eliciting an acoustic startle response in a C57BL/6J mouse (Fadok et al. 2009). Also, during pre-exposure, when animals are naïve to the SCS, tone and WN stimuli do not elicit defensive behaviors (see Fadok et al. 2017, Borkar et al. 2020, 2024).

      As suggested by the Reviewer, during revision we have included reverse-SCS paired (PA-R) and unpaired (UN-R) groups to test for the role of stimulus salience and stimulus order on defensive ethograms. During conditioning day 2, the PA-R group exhibited little freezing to the WN, with a slightly elevated activity index, and they exhibited robust freezing during tone (revised Figure 2A-H). The activity during the WN in the PA-R group was significantly lower than that of the PA group (Figure 2L). The PA-R group also did not respond to WN with escape jumps or darting (Figure 3I, 4G). The UN-R group displayed greater activity during the WN than the UN and PA-R groups, but less activity than the PA group (Figure 2D, H). The UN-R group did not dart but this group displayed some jumping at WN onset (Figure 3H), like what was observed in the UN group.

      These data suggest that WN has inherent, salient properties that can induce some non-associative activity after the mouse has been sensitized by shock (see also Hersman et al. 2020 for more detailed analysis of stimulus salience in the conditioned flight paradigm). However, only in the PA group is robust flight behavior (comprised of high numbers of escape jumps and darting) observed. Therefore, both stimulus salience and temporal order are important for eliciting transitions from freezing to flight.

      Fig 3G/4G are hard for me to understand. The figure legends say they're survival graphs but the y-axis labels "Latency to initial jump/dart (% of cohort)" confuses me. What is the purpose of these graphs? Perhaps they are not needed. Or consider presenting them similar to Fig 7C, D as those were more intuitive and faster for me to grasp.

      We had intended these plots to show that a greater proportion of the paired group jumps and darts during WN compared to the unpaired group, and that the percentage of the cohort that jumps and darts increases across conditioning trials. Because these graphs were not clear, we have removed them, and we have replaced them with graphs comparing total cohort percentages that jumped (Figure 3I) or darted (Figure 4G) over the whole CD2 session.

      For the extinction data, I did not see within group analyses for within or between session fear extinction to the tone. So, for the paired group, were the last 4 trials of Ext 1 significantly lower than the first 4 trials? If not, then they did not show within-session extinction. Also, for the paired group, were the last 4 trials of Ext 1 significantly different than the first 4 trials of Ext 2? This would test for long-term retention and spontaneous recovery.

      In the original submission and in the revised manuscript, we calculated a delta change score for freezing during tone in the early versus late blocks of 4 trials, and then we statistically compared these differences across groups (Figure 5C, D). This allowed us to assess between-group differences in changes to tone-evoked freezing during extinction. Freezing to tone did decrease significantly over the first extinction session for the paired group (Early Ext1 vs Late Ext1, paired t-test, t(31) = 6.23, p<0.0001), and when comparing late Ext1 and early Ext2, we found that tone-evoked freezing did significantly increase (Late Ext1 vs Early Ext2, paired t-test, t(31) = 5.26, p<0.0001). This increase in cue-induced freezing between days of extinction is characteristic of C57BL/6J mice (Hefner et al., 2008). Our study did not test more distal timepoints, so we cannot comment on the efficacy of long-term retention or spontaneous recovery.
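      For readers who want the arithmetic behind these comparisons, the following is a minimal sketch under stated assumptions, not the analysis code used in the study. It computes a per-animal change score between two blocks of trials and a paired t statistic, whose degrees of freedom are n - 1 (so a group of 32 animals gives the t(31) reported above). The function names, the late-minus-early sign convention, and the example values are hypothetical.

```python
import math
from statistics import mean, stdev


def delta_scores(early, late):
    """Per-animal change in freezing: late block minus early block (assumed sign convention)."""
    if len(early) != len(late):
        raise ValueError("early and late must contain one value per animal")
    return [l - e for e, l in zip(early, late)]


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two measurements on the same animals."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error (sample SD / sqrt(n))
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

      For example, paired_t([50, 60, 70], [40, 45, 60]) returns t = 7.0 with 2 degrees of freedom for three hypothetical animals; the two-tailed p-value would then be read from a t distribution with those degrees of freedom.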

      For the conditioning and extinction data across Figs 2, 5 and 6, what I gather from them is that freezing is high to the tone and low to the WN during conditioning, and then low to the tone, and high to the WN across extinction. Then for activity levels I see they are low to the tone and high to the WN during conditioning, and then low to the WN during extinction. The piece that is missing is what are activity levels like to the tone during extinction. Are they low like in conditioning and remain low in extinction? Or do they increase across extinction as freezing decreases? As I was going through these graphs I drew myself out step function summaries of the freezing and activity levels between tone/WN for conditioning vs extinction; maybe the authors could consider a summary figure.

      We thank the Reviewer for their interest. We found that within the paired group, activity to tone remained low throughout both days of extinction (though increased within each session) and did not return to normal activity levels. We present this data in Author response image 2. We thank the Reviewer for the suggestion of a summary figure, but we feel there are too many axes of classification (between-group, within-group, multiple behaviors, tone/WN, conditioning/extinction) to coherently present our findings in a single figure.

      Author response image 2.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the PA group. SCS, Serial compound stimulus; Ext, extinction; PA, Paired.

      In the discussion (LINE 592-3), they discuss that shock sensitization in the SO group may prime a stressed animal to dart more readily to WN upon stimulus transition. Should this not also happen during the transition of silence to tone? What is special about a transition between two auditory stimuli that would result in panic like behavior in an animal that only received shock presentations? This also gets back to an earlier concern above regarding the potentially innately aversiveness of the WN.

      After 2 days of shock sensitization, we observe that mice exhibit freezing to the tone during the first three trials of extinction day 1 (Figure 5A). This non-associative freezing response is like that observed in other studies of non-associative fear processing (please see Kamprath and Wotjak, 2004). As trials progress during extinction day 1, mice do become mildly activated during the tone (Author response image 3). The transition to WN in the shock-only group during extinction induces non-associative darting responses, but it does not induce escape jumping behavior (Figure 7). We hypothesize that the innate salience of the WN is a vital factor contributing to these escalated responses. The importance of stimulus salience in conditioned flight was also demonstrated by Hersman et al., 2020 for SCS conditioning, and by Furuyama et al., 2023 for single-tone conditioning. Just as with conditional freezing responses (Kamprath and Wotjak, 2004), we believe that conditional flight is controlled by summative components, one being associative and the other non-associative.

      Author response image 3.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the SO group. SCS, Serial compound stimulus; Ext, extinction; SO, Shock Only.

      In the discussion (LINE 583), they say that the development of explosive defensive behaviors are "not achievable with traditional single-cue Pavlovian conditioning paradigms". The authors should include a caveat here that the current study did not compare their results to a group of mice that received just WN-shock pairings.

      We thank the reviewer for this comment. This statement was meant to highlight that traditional paradigms do not offer an element of signaling the temporal imminence of threat, only its inevitability. It was not our intention to state that defensive escape behaviors were unachievable in single-cue conditioning paradigms, and we regret not making this clear. Indeed, the supplement of Fadok et al. 2017 shows that WN-shock conditioning is capable of inducing flight, Furuyama et al. 2023 shows that tone-shock conditioning is capable of inducing flight under specific parameters, and Gruene et al. 2015 demonstrates that single CS-US pairings induce conditional darting behaviors in female rats. We have adjusted the text to better reflect our intent.  

      Minor comment to LINE 613-5: Speaking as someone who has done fear conditioning in both mice and rats, tail rattling may be specific to mice (I have seen this often) and likely not observable in rats (never seen it).

      We thank the Reviewer for this information. We have adjusted our text to mainly discuss mouse-specific tail rattling.

      Reviewer #2 (Recommendations For The Authors):

      The research questions in this study are novel and bring new insight to the field. However, there are some issues that can be addressed to improve the overall quality of the study, namely, the reader is left wanting to know more, especially about how neural circuits contribute to these different defensive behaviors during this task. Below are some recommendations for the authors that would greatly improve the impact and significance of this study.

      (1) What are the neural correlates or circuits recruited during these different defensive behaviors across the course of conditioning and extinction? How might they differ between the PA and UN groups? What differences might emerge when an animal is shifting their defensive behavior from freezing to darting, for example? Answering these questions would require intensive additional experiments, therefore more discussion of possible neural mechanisms that might be recruited during this task would be appreciated, given the scope of the subject area.

      We agree that understanding the neural circuits recruited during these behaviors and across conditioning and extinction is of vital importance. We are actively working on these questions, and we have published on the role of central amygdala circuits (Fadok et al. 2017) as well as on top-down control of flight by the medial prefrontal cortex (Borkar et al. 2024). Because the current manuscript is focused on learning mechanisms influencing defensive behavior, we would prefer to focus our discussion on that, rather than speculating on possible neural mechanisms. However, we have added a statement in the Discussion (LINES 706-707) emphasizing that future studies should investigate the neuronal mechanisms contributing to threat associations and different defensive behaviors.

      (2) Were any vocalizations observed during conditioning or extinction phases? If not, could you speculate how type and occurrence of vocalizations might correlate with the different defensive responses observed?

      Audible vocalizations were only observed during footshock presentations (squeaks). Unfortunately, we do not have the proper specialized recording equipment to monitor the full spectrum of mouse vocalizations, especially those in the ultrasonic range. Thus, we cannot speculate on the nuances of vocalizations in mice with respect to this behavioral paradigm. To the best of our knowledge, mice have not been reported to emit specific ultrasonic calls during conditioned threat like those of rats. That said, it would be of interest to determine if mice emit different vocalizations during different defensive behaviors.

      (3) The transition from freezing to flight during the SCS is thought to be due to the close proximity of threat imminence between the WN CS and shock US. What if you switched the order of the SCS stimuli to WN followed by tone stimuli? If the salience of the WN stimulus is truly driving the jumping behavior, then it would be observed even if the WN stimulus preceded the pure tone stimulus and that would bring additional evidence that it is the associative value of the stimuli rather than its salience that's driving the defensive behaviors. What do you predict you would observe in rodents that were given a WN-tone SCS paired and unpaired in the same design of this study?

      As suggested by the reviewer, we collected data from reverse-SCS paired and unpaired groups and reported our findings within the manuscript. Our detailed findings are also discussed above. Overall, we find that a combination of stimulus salience and temporal proximity, and a summation of non-associative and associative mechanisms, are necessary to elicit explosive flight behavior (escape jumping and darting).

      References

      Borkar CD, Dorofeikova M, Le QE, Vutukuri R, Vo C, Hereford D, Resendez A, Basavanhalli S, Sifnugel N, Fadok JP (2020) Sex differences in behavioral responses during a conditioned flight paradigm. Behavioural Brain Research 389:112623.

      Borkar CD, Stelly CE, Fu X, Dorofeikova M, Le QE, Vutukuri R, Vo C, Walker A, Basavanhalli S, Duong A, Bean E, Resendez A, Parker JG, Tasker JG, Fadok JP (2024) Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625: 743-749.

      Fadok JP, Krabbe S, Markovic M, Courtin J, Xu C, Massi L, Botta P, Bylund K, Müller C, Kovacevic A, Tovote P, Lüthi A (2017) A competitive inhibitory circuit for selection of active and passive fear responses. Nature 542:96-100.

      Furuyama T, Imayoshi A, Iyobe T, Ono M, Ishikawa T, Ozaki N, Kato N, Yamamoto R (2023) Multiple factors contribute to flight behaviors during fear conditioning. Scientific Reports 13:10402. 

      Gruene TM, Flick K, Stefano A, Shea SD, Shansky RM (2015) Sexually divergent expression of active and passive conditioned fear responses in rats. eLife 4:e11352.

      Hefner K, Whittle N, Juhasz J, Norcross M, Karlsson RM, Saksida LM, Bussey TJ, Singewald N, Holmes A (2008) Impaired Fear Extinction Learning and Cortico-Amygdala Circuit Abnormalities in a Common Genetic Mouse Strain. Journal of Neuroscience 28:8074-8085.

      Hersman S, Allen D, Hashimoto M, Brito SI, Anthony T (2020) Stimulus salience determines defensive behaviors elicited by aversively conditioned serial compound auditory stimuli. eLife 9:e53803.

      Kamprath K, Wotjak CT (2004) Nonassociative learning processes determine expression and extinction of conditioned fear in mice. Learning & Memory 11:770-786.

      Sachella TE, Ihidoype MR, Proulx CD, Pafundo DE, Medina JH, Mendez P, Piriz J (2022) A novel role for the lateral habenula in fear learning. Neuropsychopharmacology 47:1210-1219.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the Reviewer for all their effort and suggestions over multiple drafts. Their comments have encouraged us to read and think more deeply about the issue under discussion (BLA spiking in response to CS/US inputs), and to find the papers whose contents we think provide a potential solution. We agree that there is more to understand about the mechanisms underlying associative learning in the BLA. We offer our paper as providing a new way of understanding the role of circuit dynamics (rhythms) in guiding associative learning via STDP. As we pointed out in our response to the previous review, the issue highlighted by the Reviewer is an issue for the entire field of associative learning in BLA: our discussion of the issue suggests why the experimentally observed BLA spiking in response to CS inputs, recorded in the absence of US inputs (as in the papers cited by the Reviewer), may not be what occurs in the presence of the US. Since our explanation involves the role of neuromodulators, such as ACh and dopamine, the suggestion is open to further testing.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Public Review’s only objection: “Deficient in this study is the construction of the afferent drive to the network, which does not elicit activities that are consistent with those observed in response to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.”

      Recommendations for the Authors: “The authors have successfully addressed most of my concerns. I commend them for their thorough response. The one nagging issue is the unrealistic activation used to drive CS and US activation in their network. While I agree that their stimulus parameters are consistent with a contextual fear task, or one that uses an olfactory CS, this was not the focus of their study as originally conceived. Moreover, the types of activation observed in response to auditory cues, which is the focus of their study, do not follow what is reported experimentally. Thus, I stand by the critique that the proposed mechanism has not been demonstrated to work for the conditioning task which the authors sought to emulate (Krabbe et al. 2019). Frustratingly, addressing this is simple: run the model with ECS neurons driven so that they fire bursts of action potentials every ~1 sec for 30 sec, and with the US activation noncontiguous with that. If the model does not produce plasticity in this case, then it suggests that the mechanisms embedded in the model are not sufficient, and more work is needed to identify them. While 'memory' effects are possible that could extend the temporal contiguity of the CS and US, the authors need to provide experimental evidence for this occurring in the BLA under similar conditions if they want to invoke it in their model. 

      (1) Fair response. I accept the authors arguments and changes. 

      (2) The authors rightly point out that the simulated afferents need not perfectly match the time courses of the peripheral inputs, since the amygdala receives them indirectly via the thalamus, cortex, etc. However, it is known how amygdala neurons respond to such stimuli, so it behooves the authors to incorporate that fact into their model.

      Quirk et al. 1997 show that the response to the tone plummets after the first 100 ms in Figs 5A and 6B. The Herry et al. 2007 paper emphasizes the transient response to tone pips, with spiking falling back to a low-rate Poisson baseline outside of the time when the pip is delivered.

      Regarding potential metabotropic glutamate activation, the stimulus in Whittington et al. 1995 was electrical stimulation at 100 Hz that would synchronously activate a large volume of tissue, which is far outside the physiological norm. I appreciate that metabotropic glutamate receptors may play a role here, but ultimately the model depends upon spiking activity for the plastic process to occur, and to the best of my knowledge the spiking activity in BLA in response to a sustained unconditioned tone is brief (see also Quirk, Repa, and LeDoux 1995). Perhaps a better justification for the authors would be Bordi and LeDoux 1992, which found that 18% of auditory responsive neurons showed a 'sustained' response, but the sustained response neurons appear to show much weaker responses than those with transient ones (Fig 2). I am willing to say that their paper IS relevant to contextual fear, but that is not what the authors set out to do.

      (3) Fair response. 

      (4) Very good response! 

      Minor points: All points were addressed.”

      We thank Reviewer 1 (R1) for the positive feedback and for pointing out that, in R1’s opinion, a nagging issue remains regarding the activation in response to the CS that we modeled. In (Krabbe et al., 2019), the CS is a pulsed input and the US is delivered right after CS offset. R1’s current objection is that we instead model the CS and US as continuous and overlapping. R1 suggested that we use the actual inputs and check whether they produce the desired outputs. The answer is simple: it will not work, because we need the effects of the CS and US on pyramidal cells to overlap. We note that the fear learning community appears to agree with us that such contingency is necessary for synaptic plasticity (Sun et al., 2020; Palchaudhuri et al., 2024). To the best of our understanding, the source of that overlap is not understood in the community, and the gap has been widely noted (Sun et al., 2020). We do note, however, that STDP may not be the only kind of plasticity in fear learning (Li et al., 2009; Kim et al., 2013, 2016).

      It is important to emphasize that it is not the aim of our paper to model the origin of the overlap. Rather, our intent is to demonstrate the roles of brain rhythms in producing the appropriate timing for STDP, assuming that ECS and F cells can continue to be active after the offset of the CS and US, respectively. This assumption is very close to how the field now treats plasticity, even for auditory fear conditioning (Sun et al., 2020). Thus, our methodology does not contradict known results. Nevertheless, the question raised by R1 is very interesting, even if it is not the focus of our paper. Hence, below we explain in detail why our hypothesis is reasonable.

      Several papers (Quirk, Repa and LeDoux, 1995; Herry et al., 2007; Bordi and LeDoux, 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate remains higher than baseline. As R1 points out, we did not model the transient increase in BLA spiking activity that occurs in response to each pip in the auditory fear conditioning paradigm. However, we did model the low-level sustained activity that occurs between pips of the CS in the absence of the US (Quirk, Repa and LeDoux, 1995, Fig. 2) and after CS offset (see the left-hand part of Fig. 2B of our manuscript). We read the data of Quirk et al., 1995 as suggesting that the low-level activity can be sustained for an indefinite time after a pip (the recording window ended at 500 ms with no noticeable decrease in activity). As such, even if the pips and the US do not overlap in time, as in (Krabbe et al., 2019), the spiking of the ECS can be sustained after CS offset and thus overlap with the US, a condition necessary in our model for plasticity through STDP. In Herry et al., 2007, Fig. 3 shows that BLA neurons respond to a pip at the population level with a transient increase in spiking followed by a return to a baseline Poisson firing rate. However, a subset of cells continues to fire at an above-baseline rate after the transient effect wears off (Fig. 3C, top few neurons), and this increased rate extends to the end of the recording time (here ~300 ms). These are the cells we consider to be ECS in our model. In Quirk et al., 1997, Fig. 5A also shows sustained low-level activity of BLA neurons in response to a pip. This low-level activity is shown to increase after fear learning, as is also the case in our model, since ECS now entrains F so that more pyramidal cells spike in response to the CS.

      The question remains as to whether the spiking is sustained long enough and at a high enough rate for STDP to take place when the US is presented some time after CS offset.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during the US because of recording interference from the shock. However, evidence seems to suggest that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7–10 s (Unal et al., 2015). This should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of the US, thus ensuring the concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Other modulators, including dopamine, may also play a role in producing the sustained activity. The US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002). D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, the activation of the US can lead to continued and higher firing rates of ECS and F. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation triggered by the US may lead to a temporary extension of firing that is then amplified and continued by dopaminergic effects.

      Hence, we suggest that the problem raised by R1 may be solved by considering the roles of ACh and dopamine in the BLA. The involvement of neuromodulators is consistent with the suggestion of (Sun et al., 2020). The model we have may be considered a “minimal” model that puts in by hand the overlap in activity due to the neuromodulation without explicitly modeling it. As R1 says, it is important for us to give the motivation for our hypotheses. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap.

      To account for these points in the manuscript, we first specified that we consider the effects of the US and CS inputs on the neuronal network as overlapping, while the actual inputs may not overlap. To do that, we added the following text:

      (1) In the introduction: 

      “In this paper, we aim to show 1) How a variety of BLA interneurons (PV, SOM and VIP) lead to the creation of these rhythms and 2) How the interaction of the interneurons and the rhythms leads to the appropriate timing of the cells responding to the US and those responding to the CS to promote fear association through spike-timing-dependent plasticity (STDP). Since STDP requires overlap of the effects of the CS and US, and some conditioning paradigms do not have overlapping US and CS, we include as a hypothesis that the effects of the CS and US overlap even if the CS and US stimuli do not. In the Discussion, we suggest how neuromodulation by ACh and/or dopamine can provide such overlap. We create a biophysically detailed model of the BLA circuit involving all three types of interneurons and show how each may participate in producing the experimentally observed rhythms and interacting to produce the necessary timing for the fear learning.”

      (2) In the Result section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”:

      “The 40-second interval we consider has both ECS and F, as well as VIP and PV interneurons, active during the entire period: an initial bout of US is known to produce a long-lasting fear response beyond the offset of the US (Hole and Lorens, 1975) and to induce the release of neuromodulators. The latter, in particular acetylcholine and dopamine that are known to be released upon US presentation (Harmer and Phillips, 1999; Suzuki et al., 2002; Rajebhosale et al., 2024), may induce more sustained activity in the ECS, F, VIP, and PV neurons during and after the presentation of US, thus ensuring a concomitant activation of those neurons necessary for STDP to take place (see “Assumptions and predictions of the model” in the Discussion).”

      (3) In the Discussion section “Synaptic plasticity in our model”:

      “Synaptic plasticity is the mechanism underlying the association between neurons that respond to the neutral stimulus CS (ECS) and those that respond to fear (F), which instantiates the acquisition and expression of fear behavior. One form of experimentally observed long-term synaptic plasticity is spike-timing-dependent plasticity (STDP), which defines the amount of potentiation and depression for each pair of pre- and postsynaptic neuron spikes as a function of their relative timing (Bi and Poo, 2001; Caporale and Dan, 2008). All forms of STDP require that there be an overlap in the firing of the pre- and postsynaptic cells. In some fear learning paradigms, the US and the CS do not overlap. We address this below under “Assumptions and predictions of the model”, showing how the effects of US and CS on the spiking of the relevant neurons can overlap even in the absence of overlap of US and CS.”
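      The STDP definition quoted above (weight change per pre/postsynaptic spike pair as a function of relative timing) can be made concrete with a generic pair-based rule with exponential windows. This is a minimal illustrative sketch, not the rule or parameter set used in the manuscript: the names and values A_PLUS, A_MINUS, TAU_PLUS and TAU_MINUS are assumptions, chosen so that the depression lobe has a larger area than the potentiation lobe, i.e., a depression-dominated rule.

```python
import math

# Illustrative parameters (assumptions, not the manuscript's values).
# A_MINUS * TAU_MINUS > A_PLUS * TAU_PLUS makes the rule depression-dominated.
A_PLUS = 0.005      # peak potentiation per spike pair
A_MINUS = 0.00525   # peak depression per spike pair
TAU_PLUS = 20.0     # potentiation window time constant (ms)
TAU_MINUS = 20.0    # depression window time constant (ms)


def stdp_kernel(dt_ms):
    """Weight change for one spike pair, with dt = t_post - t_pre (ms)."""
    if dt_ms > 0:    # pre fires before post -> potentiation
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    if dt_ms < 0:    # post fires before pre -> depression
        return -A_MINUS * math.exp(dt_ms / TAU_MINUS)
    return 0.0


def total_weight_change(pre_spikes, post_spikes):
    """All-to-all pair-based STDP: sum the kernel over every pre/post pair."""
    return sum(stdp_kernel(t_post - t_pre)
               for t_pre in pre_spikes
               for t_post in post_spikes)


# Usage: each post spike follows a pre spike by 5 ms.
pre = [0.0, 100.0, 200.0]
post = [5.0, 105.0, 205.0]
print(total_weight_change(pre, post) > 0)  # True: net potentiation
print(total_weight_change(post, pre) > 0)  # False: reversed order depresses
```

      Because the depression area exceeds the potentiation area in this sketch, spike pairs without a consistent pre-before-post order tend to produce net depression; potentiation requires the kind of fine timing that the text attributes to the interneuron rhythms.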

      To fully present our reasoning about the origin of the overlap of the effects of US and CS, we modified and added to the last paragraph of the Discussion section “Assumptions and predictions of the model”, which now reads as follows:

      “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning through STDP. Such a hypothesis, that learning uses spike-timing-dependent plasticity, is common in the modeling literature (Bi and Poo, 2001; Caporale and Dan, 2008; Markram et al., 2011). Current paradigms of fear conditioning include examples in which the CS and US stimuli do not overlap (Krabbe et al., 2019). Such a condition might seem to rule out the mechanisms in our paper. Nevertheless, the argument below suggests that the effects of the CS and US can cause an overlap in neuronal spiking of ECS, F, VIP, and SOM, even when CS and US inputs do not overlap.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during the US because of recording interference from the shock. However, evidence suggests that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7–10 s (Unal et al., 2015). Other modulators, including dopamine, may also play a role in producing the sustained activity. The US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002), and D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, neuromodulator release should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of the US, ensuring the concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation triggered by the US may lead to a temporary extension of firing that is then amplified and continued by dopaminergic effects.

      Hence, we suggest that the problem apparently posed by the non-overlapping US and CS in some paradigms of auditory fear conditioning (Krabbe et al., 2019) may be solved by considering the roles of ACh and dopamine in the BLA. The model we have may be considered a “minimal” model that puts in by hand the overlap in activity due to the neuromodulation without explicitly modeling it. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap. We note that, even though ECS and F neurons have the ability to fire continuously when ACh and dopamine are involved, the participation of the interneurons enforces the periodic silence needed for the depression-dominated STDP.”

      In the Discussion (in section “Involvement of other brain structures”), we also acknowledged that the overlap between the effects of US and CS in the BLA may be provided by other brain structures by writing the following:

      “In our model, the excitatory projection neurons and VIP and PV interneurons show sustained activity during and after the US presentation, thus allowing potentiation through STDP to take place. The medial prefrontal cortex and/or the hippocampus may provide the substrates for the continued firing of the BLA neurons after the 2-second US stimulation. We also discuss below that this network sustained activity may originate from neuromodulator release induced by US (see section “Assumptions and predictions of the model” in the Discussion).”

      We also improved our discussion of the (Grewe et al., 2017) paper, which questions Hebbian plasticity in the context of fear conditioning on several grounds. We included a new section in the Discussion entitled “Is STDP needed in fear conditioning?” to discuss those critiques and how our model may address them, which reads as follows:

      “Is STDP needed in fear conditioning? The study in (Grewe et al., 2017) questions the validity of the Hebbian model in establishing associative learning during fear conditioning. We discuss several of these critiques here. The first critique is that Hebbian plasticity does not explain the experimental finding that both upregulation and downregulation of stimulus-evoked responses are present between coactive neurons. The upregulation is captured by our model, so the open issue is the downregulation, which our model does not address. However, our model highlights that coactivity alone does not create potentiation; the fine timing of the pre- and postsynaptic spikes determines whether there is potentiation or depression. Here, we find that PING networks are instrumental in setting up the fine timing for potentiation. We suggest that networks not connected so as to produce PING may undergo depression when coactive.

      The second critique raised by (Grewe et al., 2017) is that Hebbian plasticity alone does not explain why most of the cells exhibiting enhanced responses to the CS did not react to the US before fear conditioning. They suggest that neuromodulators may provide a third condition (besides the activity of the pre- and postsynaptic neurons) that changes the plasticity rule. Our model also does not explicitly address this experimental finding since it requires F to be initially activated by US in order for the fear association to be established. We agree that the fear cells described in (Grewe et al. 2017) may be depolarized by the US without reaching the spiking threshold; however, with neuromodulation provided during the fear training, the same input can lead to spiking, enabling the conditions for Hebbian plasticity. Our discussions above about how neuromodulators affect excitability are relevant to this point. We do not exclude that other forms of plasticity may play a role during fear conditioning in cells not initially activated by the US, but this is not the topic of our modeling study.

      The third critique raised by (Grewe et al., 2017) is that Hebbian plasticity cannot explain why the majority of cells that were US- and CS-responsive before training have a reduced CS-evoked response afterward. The reduced response happens over multiple exposures of CS without US; this can involve processes similar to those present in fear extinction, which require plasticity in further networks, especially involving the infralimbic cortex (Milad and Quirk, 2002; Burgos-Robles et al., 2007). An extension of our model could investigate such mechanisms. In the fourth critique, (Grewe et al., 2017) suggests that the Hebbian plasticity rule cannot easily account for the reduction of the responses of many CS+-responsive cells, but not of the CS−-responsive cells. We suggest that the circuits involving paradigms similar to fear extinction do not involve the CS- cells.

      Overall, we agree with (Grewe et al., 2017) that neuromodulators play a crucial role in fear conditioning, especially in prolonging the US- and CS-encoding activity (as discussed in the section “Assumptions and predictions of the model” of the Discussion), or even in changing the details of the plasticity rule. A possible follow-up of our work would investigate how fear ensembles form and are modified during fear conditioning and later stages. This follow-up work may use a tri-conditional rule, as suggested in (Grewe et al., 2017), in which the potential role of neuromodulators is taken into account in the plasticity rule in addition to the pre- and postsynaptic neuron activity. Another direction is to investigate a possible relationship between neuromodulation and a depression-dominated Hebbian rule.”

      Finally, we made additional minor changes to the manuscript:

      (1) In the Result section “Interneurons interact to modulate fear neuron output”, we specified the following:

      “The US input on the pyramidal cell and VIP interneuron is modeled as a Poisson spike train at ~ 50 Hz and an applied current, respectively. In the rest of the paper, we will use the words “US” as shorthand for “the effects of US”.” 

      (2) In the Result section “Interneuron rhythms provide the fine timing needed for depression dominated STDP to make the association between CS and fear”, we also reported the following:

      “Similarly to the US, in the rest of the paper, we will use the words “CS” as shorthand for “the effects of CS”. In our simulations, CS is modeled as a Poisson spike train at ~ 50 Hz, independent of the US input. Thus, we hypothesize that the time structure of the inputs sometimes used for the training (e.g., a series of auditory pips) is not central to the formation of the plasticity in the network.”  
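      The statement that the CS (like the US) is modeled as a ~50 Hz Poisson spike train independent of the other input can be illustrated with a minimal sketch of how such a drive is typically generated: the inter-spike intervals of a homogeneous Poisson process are exponentially distributed with mean 1/rate. The function name, the 40 s duration (taken from the 40-second interval mentioned in the Results text quoted above) and the seeds are illustrative assumptions; this is not the simulation code used in the manuscript.

```python
import random


def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (s) of a homogeneous Poisson process.

    Inter-spike intervals are drawn from an exponential distribution with
    mean 1 / rate_hz, which yields a constant-rate Poisson process.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes


# Hypothetical example: independent ~50 Hz CS and US drives over 40 s,
# generated from separate random streams (seeds are arbitrary).
cs_train = poisson_spike_train(50.0, 40.0, seed=1)
us_train = poisson_spike_train(50.0, 40.0, seed=2)
print(len(cs_train) / 40.0, len(us_train) / 40.0)  # two empirical rates near 50 Hz
```

      Drawing intervals directly keeps the sketch independent of a simulation time step; in a fixed-step simulator, a per-bin approximation (a spike with probability rate × dt in each bin) is the usual alternative.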

      Reviewer #2 (Public Reviews):

      The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule to the network aiming to understand the mechanisms underlying fear learning in the BLA. 

      After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The authors added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 µM gabazine to the BLA slices, they observed rhythmic oscillations at theta frequencies. At 10 µM, gabazine does not reduce GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic population bursts driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in the intact brain or in acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extrahippocampal inputs, including, but not limited to, the entorhinal afferents and medial septal projections (see Buzsáki, 2002). Similarly, respiratory-related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled.

      In our public reply to the Reviewer’s point, we reported the following:

      (1) We kindly disagree that (Antonoudiou et al., 2022) contrasts with our study. (Antonoudiou et al., 2022) is a slice study showing that BLA theta power (3-12 Hz) increases with gabazine compared to baseline. With all GABAergic currents blocked by gabazine, the LFP is composed of excitatory currents and intrinsic currents. In our model, the high theta (6-12 Hz) comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. Thus, the model produces high theta in the presence of gabazine (see Fig. 1 in our replies to the Reviewers’ public comments). The model also shows that a PING rhythm is produced without gabazine, and that this rhythm disappears with gabazine because PING requires feedback inhibition from PV to fear cells. Thus, the increase in high theta and the reduction in gamma with gabazine reported in (Antonoudiou et al., 2022) can be reproduced in our model.

      (2) We agree that (Antonoudiou et al., 2022) alone is not sufficient evidence that the BLA can produce low theta (3-6 Hz); we discussed a new paper (Bratsch-Prince et al., 2024) that provides further evidence of BLA ability to produce low theta and under what circumstances. The authors reported that intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input) which, in vivo, would be provided by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003). We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. In future work, we will aim to show that ACh activates the BLA VIP cells, which are essential to the low theta generation in the network.

      In the manuscript, we added to and modified the Discussion section “Where the rhythms originate, and by what mechanisms”. This text aims to better discuss (Antonoudiou et al. 2022) and introduce (Bratsch-Prince et al., 2024) with its connection to our hypothesis that the theta oscillations can be produced within the BLA. The new version is:

      “Where the rhythms originate, and by what mechanisms. A recent experimental paper (Antonoudiou et al., 2022) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings when inhibition is totally removed due to gabazine application. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. In our model, we note that when inhibition is removed, both AMPA and intrinsic currents contribute to the network dynamics and the LFP. Thus, interneurons with their specific intrinsic currents (i.e., D-current in the VIP interneurons, and NaP- and H- currents in SOM interneurons) can indeed affect the model LFP and support the generation of theta and gamma rhythms (Fig. 6G).

      Another slice study, (Bratsch-Prince et al., 2024), shows that BLA is intrinsically capable of producing a low theta rhythm with ACh stimulation and without needing external glutamate input. ACh is produced in vivo by the basal forebrain in response to US (Rajebhosale et al., 2024). Although we did not explicitly include the BF and ACh modulation of BLA in our model, we implicitly include the effect of ACh in BLA by increasing the activity of the VIP cells, which then produce the low theta rhythm. Indeed, low theta in the BLA is known to depend on the muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the class of VIP neurons in our model (Mascagni and McDonald, 2003; Krabbe et al., 2018). 

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms and transmit them to the BLA. Specifically, it is known that the olfactory bulb produces respiratory-related low theta (4 Hz) oscillations and transmits them to the dorsomedial prefrontal cortex, where they organize neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured in the BLA LFP through volume conduction or through the extensive communication between the BLA and the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), both of which are involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper. However, we emphasize that there is also evidence (as discussed above) that these rhythms arise within the BLA.”

      Reviewer #2 (Recommendations for the Authors):

      (1) Three different types of VIP interneurons with distinct firing patterns have been revealed in the BLA (Rhomberg et al., 2018). Does the generation of rhythmic activities depend on the firing features of VIP interneurons? Does it matter whether VIP interneurons fire bursts of action potentials or discharge more regularly?

      (2) The authors used data for modeling SST interneurons obtained e.g., in the hippocampus. However, there are studies in the BLA where the intrinsic characteristics of SST interneurons have been reported (Unal et al., 2020; Guthman et al., 2020; Vereczki et al., 2021). Have the authors considered using results of studies that were conducted in the BLA? 

      We thank the Reviewer for these questions. Together with similar queries from Reviewer 3 in the previous review round, they have helped us further improve our manuscript. In more detail:

      (1) Although other electrophysiological types exist (Sosulina et al., 2010), we hypothesized that the electrophysiological type of VIP neurons that display intrinsic stuttering is the type that would be involved in mediating low theta oscillations during fear conditioning. This is because VIP intrinsic stuttering in cortical neurons is thought to involve the D-current, which helps create low theta bursting oscillations in the neuronal spiking patterns (Chartove et al., 2020). We think that the other subtypes of VIP interneurons are not essential for the low theta oscillatory dynamics observed during fear conditioning and, thus, did not provide an essential constraint for the phenomena we are trying to capture. VIP interneurons in our network must fire bursts at low theta to be effective in creating the pauses in ECS and F spiking needed for potentiation; single spikes at theta are not sufficient to create these pauses.

      (2) In our model, we used results from a BLA study (Sosulina et al., 2010). SOM cells in the BLA display several physiological types. We chose to include in our model the type showing early adaptation in response to a depolarizing current and inward (outward) rectification upon the onset (release) of a hyperpolarizing current. We hypothesize that this type can produce high theta oscillations, a prominently observed rhythm in the BLA. Unal et al. (2020) found two populations of SOM cells in the BLA, which had previously been recorded in (Sosulina et al., 2010), including the type we chose to model. This SOM cell type shows a low-threshold spiking profile characterized by spike-frequency adaptation and a voltage sag indicative of the H-current used in our model. Guthman et al. (2020) also found a population of SOM cells with a hyperpolarization-induced sag.

      Our model also uses a NaP-current, for which there are no data in the BLA. However, this current is known to exist in hippocampal SOM cells, where NaP- and H-currents can produce such high theta. It is standard practice in modeling to use the best available substitute for unknown currents. Of course, it is unfortunate to have to do this. We also note that models can be considered proofs of principle that can be confirmed or refuted by further experimental work. Both (Guthman et al., 2020) and (Vereczki et al., 2021) also uncover further heterogeneity among BLA SOM interneurons involving more than electrophysiology. We hypothesize that the level of heterogeneity revealed by these three studies is not key to the question we are asking (where the crucial ingredients are the rhythms) and, therefore, did not include it in our minimal model.

      We modified the Discussion section titled “Assumptions and predictions of the model” as follows:

      “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute most biologically detailed models. For example, although there is considerable variability in the activity patterns of both VIP cells and SOM cells (Sosulina et al., 2010; Guthman et al., 2020; Ünal et al., 2020; Vereczki et al., 2021), our focus was specifically on those subtypes that generate critical rhythms within the BLA. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model is believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions, and these features may be critical for other questions. Our results hold when there is some degree of heterogeneity of cells of the same type, showing that homogeneity is not a necessary condition.”

      (3) The authors may double-check the reference list, as, e.g., Cunha-Reis et al., 2020 is not listed.

      We thank the Reviewer for spotting this. We checked the reference list and all the references are now listed.

      Finally, we wish to acknowledge that we made other changes to the manuscript, unrelated to the reviewers’ questions, to improve clarity. More specifically:

      (1) We included a section titled “Significance” after the abstract and keywords, which reads as follows:

      “Our paper accounts for the experimental evidence showing that amygdalar rhythms exist, suggests network origins for these rhythms, and points to their central role in the mechanisms of plasticity involved in associative learning. It is one of the few papers to address high-order cognition with biophysically detailed models, which are sometimes thought to be too detailed to be adequately constrained. Our paper provides a template for how to use information about brain rhythms to constrain biophysical models. It shows in detail, for the first time, how multiple interneurons help to provide time scales necessary for some kinds of spike-timing-dependent plasticity (STDP). It spells out the conditions under which such interactions between interneurons are needed for STDP and why. Finally, our work helps to provide a framework by which some of the discrepancies in the fear learning literature might be reevaluated. In particular, we discuss issues about Hebbian plasticity in fear learning; we show in the context of our model how neuromodulation might resolve some of those issues. The model addresses issues more general than that of fear learning since it is based on interactions of interneurons that are prominent in the cortex, as well as the amygdala.”

      (2) The Results section “Physiology of the interneuron types is critical to their role in depression-dominated plasticity”, which is now titled “Mechanisms by which interneurons contribute to potentiation in depression-dominated plasticity”, reads as follows:

      “Mechanisms by which interneurons contribute to potentiation during depression-dominated plasticity. The PV cell is necessary to induce the correct pre-post timing between ECS and F needed for long-term potentiation of the ECS to F conductance. In our model, PV has reciprocal connections with F and provides lateral inhibition to ECS. Since the lateral inhibition is weaker than the feedback inhibition, PV tends to bias ECS to fire before F. This creates the fine timing needed for the depression-dominated rule to instantiate plasticity. If we used the classical Hebbian plasticity rule (Bi and Poo, 2001) with gamma frequency inputs, this fine timing would not be needed and ECS to F would potentiate over most of the gamma cycle, and thus we would expect random timing between ECS and F to lead to potentiation (Fig. S4). In this case, no interneurons are needed (see Discussion “Synaptic plasticity in our model” for the potential necessity of the depression-dominated rule).

      In this network configuration, the pre-post timing for ECS and F is repeated robustly over time due to coordinated gamma oscillations (PING, as shown in Fig. 4A, Fig. 1C) arising through the reciprocal interactions between F and PV (Feng et al., 2019). PING can arise only when PV is in a sufficiently low excitation regime such that F can control PV activity (Börgers et al., 2005), as in Fig. 4A. However, although such a low excitation regime establishes the correct fine timing for potentiation, it is not sufficient to lead to potentiation (Fig. 4A, Fig. S2C): the depression-dominated rule leads to depression rather than potentiation unless the PING is periodically interrupted. During the pauses, made possible only in the full network by the presence of VIP and SOM, the history-dependent build-up of depression decays back to baseline, allowing potentiation to occur on the next ECS/F active phase. (The detailed mechanism of how this happens is in the Supplementary Information, including Fig. S2). Thus, a network without the other interneuron types cannot lead to potentiation. Though a low excitation level for a PV cell is necessary to produce a PING, a higher excitation level is necessary to produce a pause in the ECS and F. This higher excitation level is consistent with the experimental literature showing a strong activation of PV after the onset of CS (Wolff et al., 2014). The higher excitation happens when the VIP cell is silent, whereas a low excitation level is achieved when the VIP cell fires and partially inhibits the PV cell (Fig. 4B, Fig. S2D). The interruption in the ECS and F activity requires the participation of another interneuron, the SOM cell (Figs. 2B, S2): the pauses in inhibition from the VIP cell periodically interrupt ECS and F firing by releasing PV and SOM from inhibition and thus indirectly silencing ECS and F. Without these pauses, depression dominates (see SI section “ECS and F activity patterns determine overall potentiation or depression”).”
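      To make the role of the pauses concrete, the following is a toy sketch, not the plasticity rule used in the model. It assumes, hypothetically, that each ECS-before-F pairing adds a fixed potentiation `a_plus` minus a depression trace that grows by `d_inc` with every pairing and decays exponentially with time constant `tau_ms`. All parameter values (a 25 ms gamma period, 4 gamma cycles per burst, a 250 ms low theta period) are illustrative assumptions.

```python
import math


def net_weight_change(event_times_ms, a_plus=1.0, d_inc=0.5, tau_ms=100.0):
    """Toy history-dependent rule: each pre-before-post pairing adds a_plus,
    minus a depression trace that builds up with every pairing and decays
    exponentially between pairings (hypothetical parameters)."""
    w = 0.0
    trace = 0.0
    last_t = None
    for t in event_times_ms:
        if last_t is not None:
            trace *= math.exp(-(t - last_t) / tau_ms)  # decay since the last pairing
        w += a_plus - trace  # potentiation minus accumulated depression
        trace += d_inc       # each pairing adds to the depression build-up
        last_t = t
    return w


def continuous_ping(duration_ms, gamma_period_ms=25.0):
    """One pairing per gamma cycle, never interrupted."""
    n = int(duration_ms // gamma_period_ms)
    return [k * gamma_period_ms for k in range(n)]


def interrupted_ping(duration_ms, gamma_period_ms=25.0,
                     cycles_per_burst=4, theta_period_ms=250.0):
    """Short gamma bursts separated by pauses, repeating at low theta."""
    times = []
    start = 0.0
    while start < duration_ms:
        for j in range(cycles_per_burst):
            t = start + j * gamma_period_ms
            if t < duration_ms:
                times.append(t)
        start += theta_period_ms
    return times


if __name__ == "__main__":
    print("uninterrupted PING:", net_weight_change(continuous_ping(2000.0)))
    print("PING with pauses:  ", net_weight_change(interrupted_ping(2000.0)))
```

      With these assumed values, uninterrupted PING drives the trace above `a_plus`, so the net change becomes negative, whereas pausing every few cycles lets the trace decay and the net change stays positive.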

      We also removed a supplementary figure (Fig. S2).

      (3) We wanted to clarify and justify our choice to extend the low theta range to 2-6 Hz and the high theta range to 6-14 Hz, compared with the 3-6 Hz and 6-12 Hz ranges, respectively, used in the BLA experimental literature. Our main reason for extending the ranges was that the peaks of low and high theta power in the VIP and SOM cells, respectively (the cells that generate these oscillations), occurred at the borders of the experimental ranges. Thus, to include the peaks of the model LFP, we lowered the lower bound of the low theta range by 1 Hz and raised the upper bound of the high theta range by 2 Hz.
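      As an illustration of why the band edges matter, the sketch below defines the experimental and extended theta bands and locates the spectral peak within each. It assumes NumPy and a plain periodogram, which may differ from the spectral estimator actually used for the model; a synthetic signal with hypothetical components at 3.5 Hz and 12 Hz (the approximate VIP and SOM peaks reported above) stands in for model output.

```python
import numpy as np

# Theta bands in Hz: BLA experimental literature vs. the extended model ranges
EXPERIMENTAL_BANDS = {"low_theta": (3.0, 6.0), "high_theta": (6.0, 12.0)}
MODEL_BANDS = {"low_theta": (2.0, 6.0), "high_theta": (6.0, 14.0)}


def power_spectrum(x, fs):
    """One-sided periodogram of a real signal (no windowing or averaging)."""
    x = np.asarray(x, dtype=float)
    x = x - np.mean(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    return freqs, power


def peak_frequency(freqs, power, band):
    """Frequency of maximal power within a closed band [lo, hi]."""
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return float(freqs[mask][np.argmax(power[mask])])


def band_power(freqs, power, band):
    """Power summed over a closed band, scaled by the frequency resolution."""
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return float(np.sum(power[mask]) * (freqs[1] - freqs[0]))


if __name__ == "__main__":
    fs = 1000
    t = np.arange(10 * fs) / fs  # 10 s of data -> 0.1 Hz resolution
    signal = np.sin(2 * np.pi * 3.5 * t) + 0.5 * np.sin(2 * np.pi * 12.0 * t)
    f, p = power_spectrum(signal, fs)
    for name, bands in (("experimental", EXPERIMENTAL_BANDS), ("model", MODEL_BANDS)):
        print(name, {k: peak_frequency(f, p, b) for k, b in bands.items()})
    print("power 11.5-13 Hz:", band_power(f, p, (11.5, 13.0)))
```

      With these assumed components, the high theta peak found in the 6-12 Hz band sits exactly on its upper edge, whereas in the 6-14 Hz band it lies in the interior, which is the situation the extended range is meant to handle.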

      We present a new supplementary figure (Fig. S1) containing the power spectra of the VIP interneuron, which is the source of low theta in our model, and of the SOM interneuron, which is the source of high theta.

      We mention Fig. S1 in the Results section “Rhythms in the BLA can be produced by interneurons”, where we added the following text:

      - “In the baseline condition, the condition without any external input from the fear conditioning paradigm (Fig. 1B, top), our VIP neurons exhibit short bursts of gamma activity (~38 Hz) at low theta frequencies (~2-6 Hz) (peaking at ~3.5 Hz) (see Fig. S1A).”
      - “In our baseline model, SOM cells have a natural frequency of ~12 Hz (Fig. 1B, middle; Fig. S1B), which is at the upper limit of the experimental high theta range; this motivates our choice to extend the high theta range up to 14 Hz in order to include the peak.”

      Given the natural frequencies of the VIP and SOM interneurons reported in the Results section “Rhythms in the BLA can be produced by interneurons”, we now state more clearly that we quantify the change in power in the low and high theta ranges around the power peaks within those ranges. Specifically, we changed some sentences in the first paragraph of the Results section “Increased low-theta frequency is a biomarker of fear learning” as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E).”
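      The LFP proxy described here can be sketched as follows, assuming NumPy, equal-length current traces sampled at `fs` Hz, and an unweighted sum (any sign convention or weighting used in the model is not specified here). The traces and amplitudes are hypothetical; the example only shows how a pre/post ratio of peak-centered band power would be computed.

```python
import numpy as np


def lfp_from_currents(currents):
    """LFP proxy: unweighted linear sum of equal-length current traces."""
    traces = [np.asarray(c, dtype=float) for c in currents.values()]
    return np.sum(traces, axis=0)


def band_power(signal, fs, band):
    """Periodogram power summed over the closed band [lo, hi] in Hz."""
    x = np.asarray(signal, dtype=float)
    x = x - np.mean(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(power[mask]) * (freqs[1] - freqs[0]))


if __name__ == "__main__":
    fs = 1000
    t = np.arange(10 * fs) / fs
    gaba = 0.5 * np.sin(2 * np.pi * 12.0 * t)  # hypothetical high theta current
    pre = {"AMPA": 0.2 * np.sin(2 * np.pi * 3.5 * t), "GABA": gaba}
    post = {"AMPA": 1.0 * np.sin(2 * np.pi * 3.5 * t), "GABA": gaba}
    for band in ((2.0, 6.0), (11.5, 13.0)):
        ratio = band_power(lfp_from_currents(post), fs, band) / band_power(lfp_from_currents(pre), fs, band)
        print(band, "post/pre power ratio:", ratio)
```

      In this hypothetical example, only the AMPA trace changes between conditions, so the low theta ratio rises while the high theta ratio stays at about 1.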

      Finally, we made a few other small changes:

      In the Introduction, we mention the following: “We also note that there is no uniformity in the exact frequencies associated with low and high theta; e.g., Lorétan et al. (2004) used 2-6 Hz for low theta. Here, we use 2-6 Hz for the low theta range and 6-14 Hz for the high theta range.”

      In Fig. 6D,E (shown under point (3)), we reran the statistics using a narrower interval for high theta (11.5-13 Hz) to focus on the region around the peak. Our initial result, a significant change in low theta between pre- and post-fear conditioning and no change in high theta, still holds.

      In Fig. 6 of the Results section “Increased low-theta frequency is a biomarker of fear learning”, we switched the order of panels F and G. This change allows us to focus first on the AMPA currents, which are the major contributors to the low theta power increase, and to specify which AMPA current drives that increase. After that, we present the power spectrum of the GABA currents as well.

      The corresponding text in the Results section now reads as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E). These results are consistent with the experimental findings in (Davis et al., 2017). Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6F). It is the AMPA currents to the PV interneurons that are directly responsible for the low theta increase; it is the newly potentiated ECS to F synapse that paces the AMPA currents in the PV interneurons at low theta. Thus, the low theta increase is due to added excitation provided by the new learned pathway.”

      (4) In the Discussion section “Assumptions and predictions of the model”, we specified the following:

      “Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”

      (5) Finally, to broaden the potential interest of our study, we added the following sentences:

      At the conclusion of the abstract:

      “The model makes use of interneurons commonly found in the cortex and, hence, may apply to a wide variety of associative learning situations.”

      At the conclusion of the introduction:

      “Finally, we note that the ideas in the model may apply very generally to associative learning in the cortex, which contains similar subcircuits of pyramidal cells and interneurons: PV, SOM and VIP cells.” 

      Also, changes in the emphasis of the paper led us to remove the following from the abstract: “Finally, we discuss how the peptide released by the VIP cell may alter the dynamics of plasticity to support the necessary fine timing.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following issues.

      (1) Fig. 3: The analgesic effects after astrocyte ablation appear to recover after one week. Is this due to repopulation of astrocytes?

      Although we did not detect proliferation of astrocytes, we hypothesize that the recovery is related to microglial phagocytosis of astrocyte debris after astrocyte ablation. Microglia are known to phagocytose cell debris. Diphtheria toxin-mediated ablation caused the death and fragmentation of AAV2/5-GfaABC1D-Cre-labeled astrocytes. We hypothesize that microglia phagocytose these astrocyte fragments and are thereby stimulated to activate type I interferon signaling. Once microglial phagocytosis of the debris ends, type I interferon signaling also declines, and this reduced signaling may be accompanied by a recurrence of pain.

      (2) Fig. 3: Please justify the large sample size of n=30-36. Is this sample size based on previous studies or statistical estimation?

      The number of mice was based on our previous report [1], and the larger number of mice also helps ensure that the pain data are reliable. In addition to exploring differences between the sexes, we also needed to obtain samples at different time points for different experiments.

      (3) Please try to plot individual data points for some critical time points to demonstrate data distribution. It is also helpful to plot male and female data points separately for some time points.

      As requested, individual data points have been plotted and added to the supplementary material.

      (4) It is unclear if the same number of males and females were used in this study, as females were typically used for SCI studies. I wonder if you can use repeated measures Two-Way ANOVA for statistical analysis.

      The numbers of males and females were not the same, but both groups were sufficient for statistical analysis. In addition, breeding transgenic mice yields both males and females, and using both makes more rational use of the animals. Indeed, previous studies have shown that female mice are more commonly used in pain studies. Although we did not observe a sex difference in this study, previous studies have reported that sex is one of the factors contributing to differences in pain. Following your suggestion, we adopted two-way ANOVA for statistical analysis and updated the statistical methods accordingly; the results were consistent with the previous ones, so we did not modify the statistics shown in the figures.

      (5) Fig. 3C, D: The effects of astrocyte ablation on mechanical pain are mild, compared to thermal pain. Electronic von Frey apparatus may be difficult for mice. It works very well for rats and large animals.

      Because all the animals in this study were mice, we have no experience with the electronic von Frey apparatus in rats or larger animals. Having used it, however, we find it very suitable for mouse experiments. In particular, our electronic von Frey apparatus has an accuracy of 0.01 g, which allows us to detect very small changes in mechanical sensitivity more accurately and intuitively than before.

      (6) Fig. 2B: In the figure legend it states n = 3 biological repeats. There are many more dots in each column. Are these individual animals or spinal cord sections?

      As described in our Methods, n = 3 biological repeats means three mice per group, with three immunofluorescence (IF) stainings per mouse. We took three or more values in each ascending tract (depending on the size of the different ascending tracts in the lumbar enlargement). This yields the larger number of data points shown in Figure 2, which we consider more reliable.

      (7) Fig. 4C: It appears that GFAP is increased by toxin treatment. Please explain this result.

      This figure quantifies astrocyte activation in the lesion area (T9-10), not in the lumbar enlargement.

      Reviewer #2 (Recommendations For The Authors):

      Specific Comments:

      RNA-Sequencing Analysis: The strength of the RNA-sequencing data in elucidating the impact of astrocyte elimination is compelling. While the focus on IFN signaling is well-supported, the manuscript overlooks other differentially expressed genes. A deeper analysis or at least a discussion of these genes could enrich the study's conclusions, offering a more holistic view of the underlying mechanisms.

      Although we did not examine the other differentially expressed genes in detail, we focused on the most significantly differentially expressed genes because these have a greater effect on pain.

      Q2: Figure Presentation: Consolidating Figures 1-3 could increase the clarity of the result presentation, reducing distractions from the main narrative. Certain aspects, such as the comparison of different tracts in Figure 2B and the body weight data in Figure 3C, seem tangential and might be better suited for supplementary materials.

      The comparison of astrocyte activation in different ascending tracts of the lumbar enlargement explains the relationship between astrocyte activation and pain, and lays the foundation for the subsequent astrocyte elimination. The body weight data are also important: they reflect not only changes in the overall recovery process after spinal cord injury, but also the effect of astrocyte elimination on the overall condition of the mice. Together with the pain test results, the weight data will help the reader understand how the overall condition of the mice changed after astrocyte elimination.

      Q3: Schematic Clarity: The schematic in Figure 1A is confusing, particularly in distinguishing between transgenic mice and viral constructs. The inconsistent naming of Cre recombinase (alternatively referred to as Cre, CRE, and sometimes DRE) further complicates understanding. Standardizing these elements would greatly enhance clarity for the readers.

      As described in the Methods, Gt(ROSA)26Sorem1(CAG-LSL-RSR-tdTomato-2A-DTR)Smoc mice contain both a Loxp-stop-Loxp sequence and a Rox-stop-Rox sequence. During breeding, crossing Gt(ROSA)26Sorem1(CAG-LSL-RSR-tdTomato-2A-DTR)Smoc mice with C57BL/6JSmoc-Tg(CAG-Dre)Smoc mice removes the Rox-stop-Rox sequence. The offspring can then be crossed with mice expressing Cre recombinase, or treated with AAV2/5-GfaABC1D-Cre, to remove the Loxp-stop-Loxp sequence and induce the expression of tdTomato and DTR.
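      The intersectional logic of this reporter line can be summarized with a small sketch: the tdTomato-2A-DTR cassette is expressed only after both stop cassettes are excised, the Rox-flanked one by Dre and the LoxP-flanked one by Cre (supplied here either by a Cre line or by AAV2/5-GfaABC1D-Cre). The dictionary and function names are hypothetical, for illustration only.

```python
# Hypothetical model of the two stop cassettes upstream of tdTomato-2A-DTR.
# Each cassette is excised only by its matching recombinase.
STOP_CASSETTES = {
    "LSL (Loxp-stop-Loxp)": "Cre",
    "RSR (Rox-stop-Rox)": "Dre",
}


def remaining_stops(recombinases):
    """Stop cassettes that are still present given the recombinases in a cell."""
    return [name for name, enzyme in STOP_CASSETTES.items() if enzyme not in recombinases]


def reporter_expressed(recombinases):
    """tdTomato and DTR are expressed only when no stop cassette remains."""
    return not remaining_stops(recombinases)


if __name__ == "__main__":
    for present in (set(), {"Dre"}, {"Cre"}, {"Dre", "Cre"}):
        print(sorted(present), "->", reporter_expressed(present), remaining_stops(present))
```

      Order does not matter in this sketch: Dre can act during breeding and Cre later, as described above, and expression requires both.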

      Q4: Pathway Analysis: The discussion of the signal pathway analysis in Figure 8 leans heavily on speculation without direct evidence from the study. Distinguishing clearly between findings and literature-derived hypotheses is crucial. A more detailed discussion that properly cites sources for each pathway element would strengthen the manuscript.

      In response, we have moved this figure to the supplementary material.

      Q5: Statistical Analysis: The use of one-way ANOVA, despite presenting data in groups, is misaligned with the data's structure. Employing two-way ANOVA followed by post-hoc comparisons is appropriate for statistical analysis.

      Following your suggestion, we adopted two-way ANOVA for statistical analysis and updated the statistical methods accordingly; the results were consistent with the previous ones. Therefore, we did not modify the statistics shown in the figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We appreciate the valuable and constructive comments of Reviewer #1 on our manuscript. We have addressed the comments from Reviewer #1 in the public review in the response to the recommendations for the authors, as the public review comments largely overlap with those in the recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      (1.1) Figure 1 did not use a mock-infected control for the development of R-loops, only a time point before infection. I think it would have been a good control to show that non-infected cells, harvested after the same time, did not show increases in R-loops, i.e., that the increase is not a product of the cell cycle.

      We prepared our DRIPc-seq libraries using cell extracts harvested at 0, 3, 6, and 12 h post-infection (hpi), all at the same post-seeding time point; that is, the time of HIV-1 infection was staggered across samples. Therefore, it is unlikely that the host cellular R-loop induction observed in our DRIPc-seq results was due to R-loop formation during the cell cycle. In Lines 93–95 of the Results section of the revised manuscript, we have provided a more detailed description of our DRIPc-seq library experimental scheme. Thank you.

      (1.2) Figure 2 should have included a figure showing the proportion of DRIPc-seq peaks located in different genome features relative to one another instead of whether they were influenced by time post-infection. Figure 2C was performed in HeLa cells, but primary T cell data would have been more relevant as primary CD4+ T cells are more relevant to HIV infection.

      We have included a new figure presenting the relative proportion of DRIPc-seq peaks mapped to different genomic features at each hpi (Fig. 2C of the revised manuscript). We found that the proportion of DRIPc-seq peaks mapped to various genomic compartments remained consistent over the hours following the HIV-1 infection. This further supports our original claim that HIV-1 infection does not induce R-loop enrichment at specific genomic features but that the accumulation of R-loops after HIV-1 infection is widely distributed.

      We considered HeLa cells our primary in vitro infection model; therefore, we conducted RNA-seq only on HeLa cells. However, we agree with the reviewer that data from primary CD4+ T cells may be more physiologically relevant. Nevertheless, as demonstrated in the new figure (Fig. 2C of the revised manuscript), HIV-1 infection did not significantly alter the proportion of R-loop peaks mapped to specific genomic compartments, such as gene body regions, in HeLa, primary CD4+ T, and Jurkat cells. Therefore, we anticipate no clear correlation between changes in gene expression levels and R-loop peak detection upon HIV-1 infection, even in primary T cells. Thank you.

      (1.3) Figure 5G is very hard to see when printed, is there a change in brightness or contrast that could be used? The arrows are helpful but they don't seem to be pointing to much.

      We have enhanced the intensity of the PLA foci and magnified the images in Fig. 5G in the revised manuscript. While editing the images according to your suggestion, we found a misannotation of the multiplicity of infection in the graph quantifying the number of PLA foci per nucleus in Fig. 5G of the original manuscript. We have corrected this issue and hope that the figure is now much clearer.

      (1.4) The introduction provided a good background for those who may not have a comprehensive understanding of DNA-RNA hybrids and R-loops, but the rationale that integration in non-expressed sequence implies that R-loops may be involved is very weak and was not addressed experimentally. A better rationale would have been to point out that, although integration in genes is strongly associated with gene expression, the association is not perfect, particularly in that some highly expressed genes are, nonetheless, poor integration targets.

      In accordance with the reviewer's comment, we revised the Introduction. We deleted the statement “... the most favored region of HIV-1 integration is an intergenic locus, ...” and its reference, which may overstate the relevance of R-loops to HIV-1 integration events in non-expressed sequences. Instead, following the reviewer’s suggestion in public review point 2)-(a), we introduced a more recent finding that high levels of gene expression do not always predict high levels of integration, together with the corresponding citation (Lines 46–47 of the revised manuscript).

      (1.5) The discussion was seriously lacking in connecting their conclusions regarding R-loop targeting of integration to how integration works at the structural level, where it is very clear that concerted integration on the two DNA strands ca. 5 bp apart is essential for correct, two-ended integration. It is very difficult to visualize how this would be possible with the triple-stranded R-loop as a target. The manuscript would be greatly strengthened by an experiment showing concerted integration into a triple-stranded structure in vitro using PICs or pure integrase.

      We believe there has been a misunderstanding of our interpretation regarding the putative role of R-loop structures in HIV-1 integration site selection, because of some misleading statements in our original manuscript. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins, leading to integration of the HIV-1 viral genome into regions in the vicinity of host genomic R-loops. While carefully revising our manuscript, we found that the title, abstract, and discussion of our original manuscript included phrases, such as “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loops in HIV-1 integration site selection. We replaced these phrases; for example, we used phrases such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration” in the revised manuscript. We apologize for the confusion caused by the unclear and nonspecific description of our findings.

      Using multiple biochemical experiments, we demonstrated the interaction between cellular R-loops and HIV-1 integrase proteins in cells and in vitro (Fig. 5 of the revised manuscript). However, we could not validate whether the center of the triple-stranded R-loop is the exact site of HIV-1 integration, where the strand transfer reaction by integrase occurs. This is because an R-loop can be multiple kilobases in size (1, 2); therefore, we displayed large-scale genomic regions (30-kb windows) to present the integration sites surrounding the R-loop centers. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity upon R-loop induction in designated regions following DOX treatment (Fig. 3C and 3D of the revised manuscript). In addition, we quantified site-specific integration events in R-loop regions and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).
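      For readers who want to reproduce the windowed view, a minimal sketch of counting integration sites within 30-kb windows centered on R-loop peaks might look like the following. It assumes, hypothetically, that R-loop centers and integration sites are available as `(chromosome, position)` tuples; it is not the authors' pipeline.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def sites_near_rloops(rloop_centers, integration_sites, window_bp=30_000):
    """For each R-loop center (chrom, pos), count integration sites lying
    within +/- window_bp // 2 on the same chromosome (inclusive bounds)."""
    half = window_bp // 2
    by_chrom = defaultdict(list)
    for chrom, pos in integration_sites:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()  # sorted positions allow binary search per window
    counts = []
    for chrom, center in rloop_centers:
        positions = by_chrom.get(chrom, [])
        lo = bisect_left(positions, center - half)
        hi = bisect_right(positions, center + half)
        counts.append(hi - lo)
    return counts


if __name__ == "__main__":
    centers = [("chr1", 100_000), ("chr2", 50_000)]
    sites = [("chr1", 85_000), ("chr1", 115_000), ("chr1", 115_001), ("chr2", 10)]
    print(sites_near_rloops(centers, sites))
```

      Binary search over per-chromosome sorted positions keeps each window query logarithmic in the number of sites, which matters when site lists are large.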

      We agree with the reviewer that an experiment showing the concerted integration of purified PICs into a triple-stranded structure in vitro would greatly strengthen our manuscript. We attempted the purification of viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S) procured from the NIH HIV reagent program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we could not purify the nucleic acid-bound protein complexes for in vitro integration assays. However, we believe that the pgR-poor and pgR-rich cell line models provide a strong advantage in the specificity of our primer readouts. Combined with our in cellulo observations, we believe that our work provides strong evidence for a causal relationship between R-loop formation/R-loop sites and HIV-1 integration.

      Additionally, in the Discussion section of the revised manuscript, we have expanded our discussion of how genomic R-loops contribute to shaping the host genomic environment for HIV-1 integration site selection, and of a potential explanation for how R-loops may drive integration over long-range genomic regions. Thank you.

      (1.6) There are serious concerns with the quantitation of integration sites used here, which should be described in detail following line 503 but is not. In Figure 3E-G, they are apparently shown as reads per million, while in Figure 4B as "sites (%)" and in 4C as "log10 integration frequency." Assuming the authors mean what they say, they are using the worst possible method for quantitation. Counting reads from restriction enzyme-digested, PCR-amplified DNA can only mislead. At the numbers provided (MOI 0.6, 10 µg DNA assayed) there would be about 1 million proviruses in the samples assayed, so the probability of any specific site being used more than once is very low, and even lower when one considers that a 10% assay efficiency is typical of integration site assays. Although the authors may obtain millions of reads per experiment, the number of reads per site is an irrelevant value, determined only by technical artefacts in the PCR reactions, most significantly the length of the amplicons, a function of the distance from the integration site to the nearest MstII site, further modified by differences in Tm. It is better to collapse identical reads to one per site, as may have been done in Figure 4B; however, the efficiency of integration site detection will still be inversely related to the length of the amplicon. Indeed, if the authors were to plot the read frequency against distance to the nearest MstII site, it is likely that they would get plots much like those in Figure 4B.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631). We primarily followed the HIV-1 integration site sequencing data processing methods previously described by Li et al., mBio, 2020; 11(5) (4).

      While it may be correct that an HIV-1 integration event cannot occur more than once at a given site, Figs. 3E, 4C, and 4D of the revised manuscript present the number of integration site sequencing reads, expressed in reads-per-million (RPM) units or as log10-normalized values. From the number of mapped reads in the integration site sequencing results, we can infer that an integration event occurred at a given site, whether it was a single event or multiple events.

      We believe that the original y-axis label, “Integration frequency,” may be misleading, as it can be interpreted as the probability of any specific site being used for HIV-1 integration. Therefore, we changed it to “number of mapped read” for clarity (Fig. 3E–G, 4C and 4D, and the corresponding figure legends of the revised manuscript). We apologize for any confusion. Thank you.
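      The difference between the two quantifications discussed in point (1.6) can be illustrated with a minimal sketch, assuming integration-site reads have already been mapped to hypothetical `(chromosome, position)` tuples: reads-per-million scales raw read counts per site, whereas collapsing identical reads counts each site once. Neither function corrects for the amplicon-length bias raised by the reviewer.

```python
from collections import Counter


def reads_per_million(read_sites):
    """RPM per site: raw read count at each site scaled by total mapped reads."""
    counts = Counter(read_sites)
    total = sum(counts.values())
    return {site: n * 1_000_000 / total for site, n in counts.items()}


def unique_sites(read_sites):
    """Collapse identical reads so that each integration site contributes once."""
    return set(read_sites)


def unique_sites_in_region(read_sites, chrom, start, end):
    """Number of distinct sites in [start, end] on one chromosome."""
    return sum(1 for c, p in unique_sites(read_sites) if c == chrom and start <= p <= end)


if __name__ == "__main__":
    reads = [("chr1", 100)] * 3 + [("chr1", 200)]
    print(reads_per_million(reads))
    print(len(unique_sites(reads)), "distinct sites")
```

      In this toy input, one site receives three times the RPM of the other purely because it yielded more reads, while the collapsed count treats both sites equally.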

      Other points:

      (1.7) Overall: There are numerous grammatical and usage errors, especially in agreement of subject and verb, and missing articles, sometimes multiple times in the same sentence. These must be corrected prior to resubmission.

      The revised manuscript was edited by a professional editing service. Thank you.

      (1.8) Lines 126–134: A striking result, but it needs more controls, as discussed above, including a dose-response analysis.

      We determined the doses of the NVP and RAL inhibitors in HeLa cells by identifying the minimum dose that sufficiently inhibited HIV-1 infection (Author response image 1). Because the primary objective of this experiment was to assess R-loop formation while reverse transcription or integration in the HIV-1 life cycle was blocked, we do not think that a dose-response analysis of the inhibitors is required.

      Author response image 1.

      (A and B) Representative flow cytometry histograms of VSV-G-pseudotyped HIV-1-EGFP-infected HeLa cells at an MOI of 1, harvested at 48 hpi. The cells were treated with DMSO, the indicated doses of nevirapine (NVP) (A) or indicated doses of raltegravir (RAL) (B) for 24 h before infection. 

      (1.9) Line 183: Please tell us what ECFP is and why it was chosen. Is there a reference for its failure to form R-loops?

      Ibid: The human AIRN gene is a very poor target for HIV integration in PBMC.

      A high (positive) GC skew is a predisposing factor for R-loop formation at transcribed sites, because it favors hybridization of the newly synthesized RNA strand to the template DNA strand, leaving the non-template DNA strand looped out in a single-stranded conformation (5) (Ref 36 in the revised manuscript). The ECFP sequence has a low GC skew and has previously been used as an R-loop-negative control sequence (6) (Ref 17 of the revised manuscript). We have added this description and the corresponding references to Lines 188–192 of the revised manuscript.

      The human AIRN gene sequence (RefSeq DNA sequence: NC_000006.12) has a GC skew value of -0.04 in a window centered at base 2186, whereas the mouse AIRN (mAIRN) sequence has a GC skew value of 0.213. In our calculation, the ECFP sequence gave a GC skew value of -0.086. We therefore anticipated that the human AIRN gene region would not form a stable R-loop; indeed, it showed no R-loop enrichment upon HIV-1 infection in our DRIPc-seq analysis of multiple cell types (Author response image 2).
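      GC skew is commonly defined as (G − C)/(G + C), computed over a sliding window. The minimal Python sketch below shows that calculation; it is not the authors' script, and because the window size, step, and strand used for the values above are not stated, `window` and `step` here are placeholders and the skew is computed on the sequence exactly as given.

```python
def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C) of one sequence; 0.0 if it contains no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(0-based window center, skew) for every full window along seq."""
    return [
        (start + window // 2, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


print(gc_skew("GGGC"))                                 # 0.5
print(windowed_gc_skew("GGGGCCCC", window=4, step=4))  # [(2, 1.0), (6, -1.0)]
```

      Positive values indicate a G excess on the strand analysed; reversing the strand flips the sign, which is why the strand convention matters when comparing values such as those for human AIRN, mAIRN, and ECFP.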

      Author response image 2.

      Genome browser screenshot of the chromosomal region in a 20-kb window centered on human AIRN, showing DRIPc-seq results from the indicated HIV-1-infected cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi).

      (1.10) Line 190: You haven't shown dependence. Associated is a better word.

      Thank you for the suggestion. As suggested, we have changed “R-loop-dependent site-specific HIV-1 integration events...” to “R-loop-associated site-specific HIV-1 integration events...” (Line 198 of the revised manuscript).

      (1.11) Line 239: What happened to P1? What is the relationship of the P and N regions to genes?

      We have added the P1 chromatin region, with superimposed DRIPc-seq signal and HIV-1 integration frequency, to Figure 4C of the revised manuscript. We observed integration events within the P1 R-loop region, but fewer than in the P2 and P3 R-loop regions, perhaps because the P1 region has relatively less R-loop enrichment than the P2 and P3 regions, as measured by DRIP-qPCR (S3A Fig. of the revised manuscript).

      Genome browser screenshots annotated with the genes located in the P and N regions are shown in S2A–E Fig. of the revised manuscript, and RNA-seq analysis of the relative expression levels of genes in the P1–3 and N1,2 R-loop regions is shown in S4 Table of the revised manuscript. Thank you.

      (1.12) Line 261: But the binding affinity of integrase to the R-loop is somewhat weaker than to double-stranded DNA according to Figure 5A.

      Nucleic acid substrates were loaded at the same molarity, and the percentage of the unbound fraction was calculated by dividing the intensity of the unbound fraction in each lane by that of the unbound fraction in the lane with 0 nM integrase. The percentages of the unbound fraction calculated from three independent replicate experiments are shown in Fig. 5A (right) of the revised manuscript. In our measurements, integrase showed higher binding affinity for the R-loop and R-loop-containing nucleic acid structures than for dsDNA in vitro. We hope that this explanation clarifies this point.
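      The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes the first value in each list is the 0 nM integrase lane of the same gel, and the intensities in the example are made up.

```python
from statistics import mean


def percent_unbound(unbound: list[float]) -> list[float]:
    """Express each lane's unbound-fraction intensity as % of the 0 nM integrase lane.

    Assumes unbound[0] is the 0 nM (no integrase) lane of the same gel.
    """
    reference = unbound[0]
    if reference <= 0:
        raise ValueError("the 0 nM reference lane must have a positive intensity")
    return [100.0 * x / reference for x in unbound]


def mean_across_replicates(replicates: list[list[float]]) -> list[float]:
    """Average per-lane percentages from independent replicate gels."""
    return [mean(lane) for lane in zip(*replicates)]


# Made-up densitometry values for one gel: 0 nM, then increasing integrase.
print(percent_unbound([2000.0, 1500.0, 500.0]))                # [100.0, 75.0, 25.0]
print(mean_across_replicates([[100.0, 80.0], [100.0, 60.0]]))  # [100.0, 70.0]
```

      A lower percentage unbound at a given integrase concentration indicates tighter binding, which is the basis for comparing R-loop and dsDNA substrates loaded at equal molarity.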

      (1.13) Line 337: "accumulate". This is a not uncommon misinterpretation of the results of studies on the distribution of intact proviruses in elite controllers. The only possible correct interpretation of the finding is that proviruses form everywhere else but cells containing them are eliminated, most likely by the immune system.

      Thank you for the suggestion. We have changed Line 337 of the original manuscript to “... HIV-1 proviruses in heterochromatic regions are not eliminated but selected by immune system,” (Lines 361–363 of the revised manuscript).

      (1.14) Line 371 How many virus particles per cell does this inoculum amount to?

      We determined the amount of GFP reporter virus required to transduce ∼50% of WT Jurkat T cells, corresponding to an approximate MOI of 0.6. In repeated experiments, 30–50% of Jurkat T cells were positive for VSV-G-pseudotyped HIV-1-EGFP infection in the samples used to construct HIV-1 integration site sequencing libraries.
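      The relationship between a functional MOI and the fraction of marker-positive cells is often approximated with a Poisson model, in which the fraction of cells receiving at least one infection is 1 − e^(−MOI). The Python sketch below applies that standard simplification; it is not necessarily how the authors titrated their stocks.

```python
import math


def fraction_infected(moi: float) -> float:
    """Expected fraction of cells with >= 1 infection under Poisson: 1 - exp(-MOI)."""
    return 1.0 - math.exp(-moi)


def moi_from_fraction(fraction: float) -> float:
    """Invert the Poisson model: the MOI that yields a given infected fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


print(round(fraction_infected(0.6), 3))                                     # 0.451
print(round(moi_from_fraction(0.3), 2), round(moi_from_fraction(0.5), 2))  # 0.36 0.69
```

      Under this model, an MOI of 0.6 predicts about 45% GFP-positive cells, and the 30–50% range corresponds to functional MOIs of roughly 0.36 to 0.69. A functional MOI counts infection events that produce the marker; it is not the number of physical virus particles per cell.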

      (1.15) Line 503 and Figures 3 and 4: There must be a clear description of how integration events are quantitated.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631). We primarily followed the HIV-1 integration site sequencing data processing methods previously described in Li et al., mBio, 2020; 11(5) (4).

      Reviewer #2 (Public Review):

      Retroviral integration in general, and HIV integration in particular, takes place in dsDNA, not in R-loops. Although HIV integration can occur in vitro on naked dsDNA, there is good evidence that, in an infected cell, integration occurs on DNA that is associated with nucleosomes. This review will be presented in two parts. First, a summary will be provided giving some of the reasons to be confident that integration occurs on dsDNA on nucleosomes. The second part will point out some of the obvious problems with the experimental data that are presented in the manuscript.

      We appreciate your comments. We have carefully addressed the concerns expressed as follows (your comments are in italics):  

      (2.1) The 2017 Dos Passos et al. Science paper describes the structure of the HIV intasome. The structure makes it clear that the target for integration is dsDNA, not an R-loop, and there are very good reasons to think that structure is physiologically relevant. For example, there is data from the Cherepanov, Engelman, and Lyumkis labs to show that the HIV intasome is quite similar in its overall structure and organization to the structures of the intasomes of other retroviruses. Importantly, these structures explain the way integration creates a small duplication of the host sequences at the integration site. How do the authors propose that an R-loop can replace the dsDNA that was seen in these intasome structures?

      We appreciate the current understanding of the HIV-1 integration site selection mechanism and the known structure of the dsDNA-bound intasome. Our study proposes R-loops as an additional contributor to HIV-1 integration site selection. Recent studies providing new perspectives on HIV-1 integration site targeting motivated our current work. For instance, Ajoge et al., 2022 (7) indicated that a guanine-quadruplex (G4) structure formed in the non-template DNA strand of the R-loop influences HIV-1 integration site targeting. Additionally, I. K. Jozwik et al., 2022 (8) reported a structure of a retroviral integrase bound to target DNA undergoing a B-to-A transition. R-loops are a prevalent class of alternative non-B DNA structures (9). In the Discussion section of our manuscript, we acknowledge the current understanding of HIV-1 integration site selection and explore how R-loop interactions may add to it.

      Based primarily on our current data, we propose that R-loop structures are bound by HIV-1 integrase proteins and promote integration of the viral genome into regions near host genomic R-loops; we do not claim that R-loops completely replace dsDNA as the target for HIV-1 integration. Because an R-loop can be multiple kilobases long and R-loop peak length varies widely with the immunoprecipitation and library construction methods (1, 2), we could not determine whether the center of a triple-stranded R-loop is the exact site of HIV-1 integration, where the integrase-mediated strand transfer reaction occurs. We have therefore replaced phrases such as “HIV-1 targets R-loops for integration,” which may overstate the role of R-loops in HIV-1 integration site selection, with phrases such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the lack of clarity and specificity in our original description of our findings. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. We quantified site-specific integration events in the R-loop regions and found that more integration events occurred in the designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).

      dsDNA may have been the only intasome target demonstrated in vitro simply because it is the only substrate that has been considered for in vitro intasome assembly. We hope that our work will initiate and advance future investigations of target-bound intasome structures that consider R-loops as potential new targets for integrase proteins and intasomes.

      (2.2) As noted above, concerted (two-ended) integration can occur in vitro on a naked dsDNA substrate. However, there is compelling evidence that, in cells, integration preferentially occurs on nucleosomes. Nucleosomes are not found in R loops. In an infected cell, the viral RNA genome of HIV is converted into DNA within the capsid/core which transits the nuclear pore before reverse transcription has been completed. Integration requires the uncoating of the capsid/core, which is linked to the completion of viral DNA synthesis in the nucleus. Two host factors are known to strongly influence integration site selection, CPSF6 and LEDGF. CPSF6 is involved in helping the capsid/core transit the nuclear pore and associate with nuclear speckles. LEDGF is involved in helping the preintegration complex (PIC) find an integration site after it has been released from the capsid/core, most commonly in the bodies of highly expressed genes. In the absence of an interaction of CPSF6 with the core, integration occurs primarily in the lamin-associated domains (LADs). Genes in LADs are usually not expressed or are expressed at low levels. Depending on the cell type, integration in the absence of CPSF6 can be less efficient than normal integration, but that could well be due to a lack of LEDGF (which is associated with expressed genes) in the LADs. In the absence of an interaction of IN with LEDGF (and in cells with low levels of HRP2) integration is less efficient and the obvious preference for integration in highly expressed genes is reduced. Importantly, LEDGF is known to bind histone marks, and will therefore be preferentially associated with nucleosomes, not R-loops. LEDGF fusions, in which the chromatin binding portion of the protein is replaced, can be used to redirect where HIV integrates, and that technique has been used to map the locations of proteins on chromatin. 
      Importantly, LEDGF fusions in which the chromatin-binding component of LEDGF is replaced with a module that recognizes specific histone marks direct integration to those marks, confirming that integration occurs efficiently on nucleosomes in cells. It is worth noting that it is possible to redirect integration to portions of the host genome that are poorly expressed, which, when taken with the data on integration into LADs (integration in the absence of a CPSF6 interaction), shows that there are circumstances in which there is reasonably efficient integration of HIV DNA in portions of the genome in which there are few if any R-loops.

      Although R-loops may not wrap around nucleosomes, long and stable R-loops likely cover stretches of DNA corresponding to multiple nucleosomes (10). For example, R-loops are associated with high levels of histone marks such as H3K36me3, which LEDGF recognizes (2, 11). R-loops dynamically regulate chromatin architecture: R-loop structures relieve superhelical stress, possibly by altering nucleosome occupancy, positioning, or turnover, and are often associated with open chromatin marks and active enhancers (2, 10). These features are also distributed over HIV-1 integration sites (12). In the Discussion section of the revised manuscript, we explore how R-loops shape the host genomic environment for HIV-1 integration site selection and their potential collaborative role with LEDGF/p75 and CPSF6 in governing HIV-1 integration site selection.

      In revising our original manuscript in response to the reviewer's comment, we recognized the need to tone down our statements. The title, abstract, and discussion of our original manuscript included phrases such as “HIV-1 targets R-loops for integration,” which may overstate the role of R-loops in HIV-1 integration site selection. We have replaced these phrases, for example with “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the lack of clarity and specificity in our original description of our findings.

      (2.3) Given that HIV DNA is known to preferentially integrate into expressed genes and that R-loops must necessarily involve expressed RNA, it is not surprising that there is a correlation between HIV integration and regions of the genome to which R loops have been mapped. However, it is important to remember that correlation does not necessarily imply causation.

      We understand the reviewer's concern regarding the possibility of a coincidental correlation between the R-loop regions and HIV-1 integration sites, particularly when the interpretation of this correlation is primarily based on a global analysis. 

      Therefore, we designed the pgR-poor and pgR-rich cell lines, which we believe are suitable models for distinguishing integration events driven by transcription alone from those driven by the presence of R-loops. Although the two cell lines showed comparable levels of transcription at the designated region upon DOX treatment via TRE promoter activation (Fig. 3B of the revised manuscript), only pgR-rich cells formed R-loops at the designated regions (Fig. 3C of the revised manuscript). When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity after DOX treatment (Fig. 3D of the revised manuscript). Moreover, we quantified site-specific integration events in the R-loop regions and found that more integration events occurred in the designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E of the revised manuscript). We therefore concluded that transcriptional activation without R-loop formation (in pgR-poor cells) may not be sufficient to drive HIV-1 integration. We believe that our work provides strong evidence for a causal relationship between R-loop formation/R-loop sites and HIV-1 integration. We hope that our explanation addresses your concerns. Thank you.

      If we consider some of the problems in the experiments that are described in the manuscript:

      (2.4) In an infected individual, cells are almost always infected by a single virion, and the infecting virion is not accompanied by large numbers of damaged or defective virions. This is a key consideration: the claim that infection by HIV affects R-loop formation in cells was based on experiments with a VSVg vector in which there appears to have been about 6000 virions per cell. Although most of the virions prepared in vitro are defective in some way, that does not mean that a large fraction of the defective virions cannot fuse with cells. In normal in vivo infections, HIV has evolved in ways that avoid signaling its presence to the infected cell. To cite an example, carrying out reverse transcription in the capsid/core prevents the host cell from detecting (free) viral DNA in the cytoplasm. The fact that the large effect on R-loop formation which the authors report still occurs in infections done in the absence of reverse transcription strengthens the probability that the effects are due to the massive amounts of virions present, and perhaps to the presence of VSVg, which is quite toxic. To have physiological relevance, the infections would need to be carried out with virions that contain HIV even under circumstances in which there is at most one virion per cell.

      Our virus production protocol and in vitro and ex vivo HIV-1 infection conditions, designed for infecting cell types such as HeLa cells and primary CD4+ T cells with VSV-G-pseudotyped HIV, were based on a comprehensive review of numerous references. At the outset of this study, we tested HIV-1-specific host genomic R-loop induction using empty virion particles (virus-like particles, VLPs) and other types of viruses (non-retrovirus, SeV; retroviruses, FMLV and FIV), all produced with a VSV-G donor. We could not include a control that omitted VSV-G or used the native HIV-1 envelope protein, because we aimed to prevent viral spread in culture. Although all virus stocks were prepared with VSV-G, only cells infected with HIV-1 showed R-loop signal enrichment (Author response image 3). We therefore omitted the VSV-G control in subsequent analyses such as DRIPc-seq. We have also revised our manuscript to describe the experimental conditions more clearly; in particular, we now state throughout the Abstract, Results, and Discussion sections that we used VSV-G-pseudotyped HIV-1 in this study. Thank you.

      Author response image 3.

      (A) Dot blot analysis of R-loops in gDNA extracts from U2OS cells infected with HIV-1 at an MOI of 0.6 and harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal). (B) Dot blot analysis of R-loops in gDNA extracts from HeLa cells infected with the indicated viruses at an MOI of 0.3. The infected cells were harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal).

      HIV-1 co-infection may also be expected in cell-free HIV-1 infections. However, a mathematical model that estimates the frequency of multiple infections with the same virus previously suggested that the average number of infection events ranges from 1.02 to 1.65 (Figure 4c of Ito et al., Sci. Rep., 2017; 6559) (13).
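      For comparison, a simple Poisson model (not the model used by Ito et al. (13)) gives the expected number of infection events per infected cell as MOI/(1 − e^(−MOI)). The Python sketch below evaluates that quantity and the share of infected cells carrying two or more infections; it illustrates the arithmetic under the Poisson assumption only.

```python
import math


def mean_infections_per_infected_cell(moi: float) -> float:
    """E[k | k >= 1] for k ~ Poisson(moi), i.e. moi / (1 - exp(-moi))."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    return moi / (1.0 - math.exp(-moi))


def fraction_multiply_infected(moi: float) -> float:
    """P(k >= 2 | k >= 1) for k ~ Poisson(moi)."""
    p0 = math.exp(-moi)
    p1 = moi * p0
    return (1.0 - p0 - p1) / (1.0 - p0)


print(round(mean_infections_per_infected_cell(0.6), 2))  # 1.33
print(round(fraction_multiply_infected(0.6), 2))         # 0.27
```

      At the MOI of 0.6 used for integration site sequencing, this model gives about 1.33 infection events per infected cell, within the 1.02–1.65 range quoted above, with roughly a quarter of infected cells carrying more than one provirus.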

      (2.5) Using the Sso7d version of HIV IN in the in vitro binding assays raises some questions, but that is not the real question/problem. The real problem is that the important question is not what/how HIV IN protein binds to, but where/how an intasome binds. An intasome is formed from a combination of IN bound to the ends of viral DNA. In the absence of viral DNA ends, IN does not have the same structure/organization as it has in an intasome. Moreover, HIV IN (even Sso7d, which was modified to improve its behavior) is notoriously sticky and hard to work with. If viral DNA had been included in the experiment, intasomes would need to be prepared and purified for a proper binding experiment. To make matters worse, there are multiple forms of multimeric HIV IN and it is not clear how many HIV INs are present in the PICs that actually carry out integration in an infected cell.

      As the reviewer has noted, HIV IN, even with Sso7d tagging, is difficult to work with. We attempted to purify viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S), obtained from the NIH HIV Reagent Program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we were unable to purify vDNA-bound IN protein complexes for in vitro assays. However, through multiple biochemical experiments, we believe that we have successfully demonstrated the interaction between cellular R-loops and HIV-1 integrase proteins both in cells and in vitro (Fig. 5A–F of the revised manuscript). We also observed a close association between integrase proteins and host cellular R-loops in HIV-1-infected cells using a fluorescent recombinant virus (HIV-IN-EGFP) with intact IN-EGFP PICs (Fig. 5G of the revised manuscript).

      (2.6) As an extension of comment 2, the proper association of an HIV intasome/PIC with the host genome requires LEDGF and the appropriate nucleic acid targets need to be chromatinized.

      The interaction between cellular R-loops and HIV-1 integrase proteins in HeLa cells endogenously expressing LEDGF/p75 was examined using reciprocal immunoprecipitation assays (Fig. 5C–F, S6B Fig., and S6D Fig. of the revised manuscript). In addition, as discussed in more detail in our response to comment (2.8), we observed a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected HeLa cells by proximity ligation assay (PLA).

      (2.7) Expressing any form of IN, by itself, in cells to look for what IN associates with is not a valid experiment. A major factor that helps to determine both where integration takes place and the sites chosen for integration is the transport of the viral DNA and IN into the nucleus in the capsid core. However, even if we ignore that important part of the problem, the IN that the authors expressed in HeLa cells won't be bound to the viral DNA ends (see comment 2), even if the fusion protein would be able to form an intasome. As such, the IN that is expressed free in cells will not form a proper intasome/PIC and cannot be expected to bind where/how an intasome/PIC would bind.

      As discussed in more detail in our response to comment (2.8), we believe that our PLA experiment using the HIV-IN-EGFP virus produced with pVpr-IN-EGFP, whose virion integrity and IN-EGFP PICs have previously been characterized (14), demonstrated a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected cells.

      (2.8) As in comment 1, for the PLA experiments presented in Figure 5 to work, the number of virions used per cell (which differs from the MOI measured by the number of cells that express a viral marker) must have been high, which is likely to have affected the cells and the results of the experiment. However, there is the additional question of whether the IN-GFP fusion is functional. The fact that the functional intasome is a complex multimer suggests that this could be a problem. There is an additional problem, even if IN-GFP is fully functional. During a normal infection, the capsid core will have delivered copies of IN (and, in the experiments reported here, the IN-GFP fusion) into the nucleus that are not part of the intasome. These "free" copies of IN (here IN-GFP) are not likely to go to the same sites as an intasome, making this experiment problematic (comment 4).

      The HIV-IN-EGFP virus stock was produced by polyethylenimine-mediated transfection of HEK293T cells with 6 µg of pVpr-IN-EGFP, 6 µg of the HIV-1 NL4-3 noninfectious molecular clone (pD64E; NIH AIDS Reagent Program 10180), and 1 µg of pVSV-G, as previously described (14) and as detailed in the Materials and Methods section of our manuscript. The pVpr-IN-EGFP vector used to produce the HIV-IN-EGFP virus stock was provided by Anna Cereseto's group (Albanese et al., PLOS ONE, 2008; 6(6); Ref 34 of the revised manuscript). HIV-IN-EGFP virions produced by trans-incorporation of IN-EGFP through Vpr were previously reported to be intact and infectious viral particles (Figure 1 of Albanese et al., PLOS ONE, 2008; 6(6)). We therefore believe that the HIV-IN-EGFP used in our PLA experiments was functional.

      Additionally, Albanese et al. showed that the EGFP signal of HIV-IN-EGFP virions colocalizes with the viral matrix (p17MA) and capsid (p24CA) proteins, as well as with newly synthesized cDNA produced by reverse transcription, which they labeled and visualized (14). The fluorescent recombinant virus (HIV-IN-EGFP) was also structurally intact at the nuclear level (Figure 6 of Albanese et al., PLOS ONE, 2008; 6(6)). Given the integrity of the HIV-IN-EGFP virions and IN-EGFP PICs, we believe that our PLA results are unlikely to be misleading in the way the reviewer is concerned about.

      Furthermore, the in vitro HIV-1 infection conditions for our PLA experiments were carefully determined based on multiple studies that performed image-based assays on HIV-1-infected cells. For instance, in a previous report, Albanese et al. infected 4 × 10⁴ cells with viral loads equivalent to 1.5 or 3 µg of HIV-1 p24 for immunofluorescence analysis (14). We titrated the fluorescent HIV-1 virus stocks both by determining the multiplicity of infection (MOI) and by quantifying the HIV-1 p24 antigen content (Author response image 4). For our PLA experiments, we infected 5 × 10⁴ HeLa cells with viral loads equivalent to 1.3 µg of HIV-1 p24, which corresponds to the MOI of 2 indicated in Fig. 5G of our manuscript.
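      The p24 inputs quoted above can be put on a common per-cell basis with simple arithmetic, as in the Python sketch below. Converting these amounts into physical particles per cell would require an assumed p24-per-virion conversion factor, which the source does not give, so the sketch stops at picograms of p24 per cell.

```python
def p24_per_cell_pg(p24_ug: float, cells: float) -> float:
    """p24 input per cell in picograms (1 µg = 1e6 pg)."""
    if cells <= 0:
        raise ValueError("cells must be positive")
    return p24_ug * 1e6 / cells


conditions = {
    "this study (PLA)": (1.3, 5e4),
    "Albanese et al., lower dose": (1.5, 4e4),
    "Albanese et al., higher dose": (3.0, 4e4),
}
for label, (p24_ug, cells) in conditions.items():
    print(f"{label}: {p24_per_cell_pg(p24_ug, cells):.1f} pg p24 per cell")
# this study (PLA): 26.0 pg p24 per cell
# Albanese et al., lower dose: 37.5 pg p24 per cell
# Albanese et al., higher dose: 75.0 pg p24 per cell
```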

      Image-based assays often require stronger signals for statistical robustness. For example, Achuthan et al. infected cells with VSV-G-pseudotyped HIV-1 at an approximate MOI of 350 for vDNA and PIC visualization (15). We therefore believe that our PLA conditions, which we carefully designed based on frequently cited previous studies, are reasonable. We hope that this discussion sufficiently addresses the reviewer’s concern.

      Author response image 4.

      Gating strategy used to determine HIV-1 infectivity in HeLa cells at 48 hpi. Cells were infected with a VSV-G-pseudotyped HIV-1-EGFP virus stock of known p24 antigen content. The percentages of the GFP-positive cell population are indicated.

      (2.9) In the Introduction, the authors state that the site of integration affects the probability that the resulting provirus will be expressed. Although this idea is widely believed in the field, the actual data supporting it are, at best, weak. See, for example, the data from the Bushman lab showing that the distribution of integration sites is the same in cells in which the integrated proviruses are, and are not, expressed. However, given what the authors claim in the introduction, they should be more careful in interpreting enzyme expression levels (luciferase) as a measure of integration efficiency in experiments in which they claim proviruses are integrated in different places.

      We thank the reviewer for the constructive comment. We have changed the statement in Lines 41–42 of the Introduction section of our original manuscript to “The chromosomal landscape of HIV-1 integration influences proviral gene expression, persistence of integrated proviruses, and prognosis of antiretroviral therapy.” (Lines 39–41 of the revised manuscript). We believe that this change tones down the implied link between the site of integration and the level of provirus expression.

      The piggyBac transposase inserts the “cargo (transposon)” randomly into TTAA chromosomal sites of the target genome, generating efficient insertions at different genomic loci (16, 17). We believe that this random, piggyBac-mediated insertion of the pgR-poor/rich vectors prevents locus bias in vector insertion from confounding our analysis of R-loop-mediated HIV-1 integration sites. Accordingly, Figure 3 of our manuscript does not claim any relationship between the site of integration and the resulting provirus expression level. Instead, as noted in Line 214 of the revised manuscript, we used the luciferase reporter HIV-1 virus to examine HIV-1 infection in cells with an “extra number of R-loops” in the host cellular genome. We observed that pgR-rich cells showed higher luciferase activity upon DOX treatment than pgR-poor cells (Fig. 3D of the revised manuscript). We believe that this is because more HIV-1 integration events may occur in pgR-rich cells, in which DOX-inducible de novo R-loop regions are introduced; this was further examined in Fig. 3E–G of the revised manuscript. We hope that this explanation clarifies Figure 3. Thank you.

      (2.10) Using restriction enzymes to create an integration site library introduces biases that derive from the uneven distribution of the recognition sites for the restriction enzymes.

      As described in the Materials and Methods section, we constructed the sequencing library using a previously established protocol (18, 19). Although we recognize the advantages of DNA fragmentation by sonication, in in vitro or ex vivo HIV-1 infection settings, where the multiplicity of infection is carefully determined based on multiple references, more copies of integrated viral sequences are expected than in samples from infected patients (18). In these settings, therefore, restriction enzyme-based DNA fragmentation and ligation-mediated PCR sequencing are well-established methods that provide substantial data for HIV-1 integration site sequencing (15, 20-22). Furthermore, our data showing the proportion of integration sites over R-loop regions (Fig. 4B of the revised manuscript) are presented alongside the respective random controls (i.e., the proportion of integration sites within the 30-kb windows centered on randomized DRIPc-seq peaks, gray dotted lines; randomized integration sites compared with DRIPc-seq peaks, black dotted lines; and randomized integration sites compared with randomized DRIPc-seq peaks, gray solid lines), none of which shows such a correlation between HIV-1 integration sites and the areas near R-loop regions. We therefore believe that our results from the integration site sequencing data analysis are unlikely to be biased.
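      The comparison against random controls described above can be illustrated with a minimal Python sketch: count the fraction of integration sites lying within a 30-kb window centered on any peak, then repeat with uniformly shuffled peak positions. This is not the authors' analysis; it assumes a single chromosome and uniform shuffling, and ignores mappability and the distribution of restriction sites. The coordinates and the 1-Mb chromosome length are made up.

```python
import bisect
import random


def fraction_near_peaks(sites: list[int], peak_centers: list[int], window: int = 30_000) -> float:
    """Fraction of integration sites within a window centered on any peak center.

    Single-chromosome simplification: all coordinates are on one sequence.
    """
    if not sites:
        return 0.0
    half = window // 2
    centers = sorted(peak_centers)
    hits = 0
    for s in sites:
        # First peak center >= s - half; it is a hit if it is also <= s + half.
        i = bisect.bisect_left(centers, s - half)
        if i < len(centers) and centers[i] <= s + half:
            hits += 1
    return hits / len(sites)


def shuffled_positions(n: int, chrom_length: int, rng: random.Random) -> list[int]:
    """Uniformly random positions on the same chromosome, for a random control."""
    return [rng.randrange(chrom_length) for _ in range(n)]


sites = [100_000, 500_000]
peaks = [110_000]
rng = random.Random(0)
observed = fraction_near_peaks(sites, peaks)
control = fraction_near_peaks(sites, shuffled_positions(len(peaks), 1_000_000, rng))
print(observed)  # 0.5
print(control)
```

      In practice, the shuffle would be repeated many times to build a null distribution, and either the sites, the peaks, or both can be randomized, mirroring the three control lines described for Fig. 4B.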

      Reviewer #3 (Public Review):

      In this manuscript, Park and colleagues describe a series of experiments that investigate the role of R-loops in HIV-1 genome integration. The authors show that during HIV-1 infection, R-loops accumulate on the host genome. Using a synthetic R-loop-prone gene construct, they show that HIV-1 integration targets sites with high R-loop levels. They further show that integration sites on the endogenous host genome are correlated with sites prone to R-loops. Using biochemical approaches, as well as in vivo co-IP and proximity ligation experiments, the authors show that HIV-1 integrase physically interacts with R-loop structures.

      My primary concern with the paper is with the interpretations the authors make about their genome-wide analyses. I think that including some additional analyses of the genome-wide data, as well as some textual changes can help make these interpretations more congruent with what the data demonstrate. Here are a few specific comments and questions:

      We are grateful for the time and effort you spent on our behalf and for the reviewer’s appreciation of the novelty of our work, in particular, R-loop induction by HIV-1 infection and the correlation between host R-loops and the genomic sites of HIV-1 integration. In the following sections, we provide our responses to your comments and suggestions. Your comments are in italics. We have carefully addressed the following issues.

      (3.1) I think Figure 1 makes a good case for the conclusion that R-loops are more easily detected in HIV-1-infected cells by multiple approaches (all using the S9.6 antibody). The authors show that their signals are RNase H sensitive, which is a critical control. For the DRIPc-Seq, I think including an analysis of biological replicates would greatly strengthen the manuscript. The authors state in the methods that the DRIPc pulldown experiments were done in biological replicates for each condition. Are the increases in DRIPc peaks similar across biological replicates? Are genomic locations of HIV-1-dependent peaks similar across biological replicates? Measuring and reporting the biological variation between replicate experiments is crucial for making conclusions about increases in R-loop peak frequency. This is partially alleviated by the locus-specific data in Figure S3A. However, a better understanding of how the genome-wide data varies across biological replicates will greatly enhance the quality of Figure 1.

      DRIPc-seq experiments were conducted with two biological replicates. To define consensus DRIPc-seq peaks using these two replicates, we used two methods applicable to ChIP-seq analysis: the irreproducible discovery rate (IDR) method and sequencing data pooling. We found that the sequencing data pooling method yielded significantly more DRIPc-seq peaks than consensus peak identification through IDR, and we decided to utilize R-loop peaks from pooled sequencing data for our downstream analyses, as described in the figure legends and Materials and Methods of the revised manuscript. 

      As noted by the reviewer, it is important to verify whether the increasing trend in the number of R-loop peaks and the genomic locations of HIV-1-dependent R-loops were consistently observed across the two biological replicates. Therefore, we independently performed R-loop calling on each replicate of the sequencing data of primary CD4+ T cells from two individual donors to verify that the increase in R-loop numbers was consistent (Author response image 5). Additionally, the overlap of the R-loop peaks between the two replicates was statistically significant across the genome (Author response table 1). Thank you.

      Author response image 5.

      Bar graph indicating DRIPc-seq peak counts for HIV-1-infected primary CD4+ T cells harvested at the indicated hours post infection (hpi). Pre-immunoprecipitated samples were untreated (−) or treated (+) with RNase H, as indicated. Each dot corresponds to an individual data set from two biologically independent experiments.

      Author response table 1.

      DRIPc-seq peak length and chi-square p-value in CD4+ T cells from individual donors 1 and 2 
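
Author response table 1 reports a chi-square p-value for the overlap between donors. One common way to frame such a test is a 2x2 table counting genomic bins covered by peaks in both replicates, in one only, or in neither. The Python sketch below illustrates that framing under stated assumptions (a single chromosome, fixed-size bins, peaks given as half-open `(start, end)` intervals, Pearson's statistic without continuity correction); the function names and bin size are hypothetical, and the procedure actually used for the table may differ.

```python
import math


def binarize(peaks, chrom_len, bin_size):
    """True for every fixed-size bin overlapped by at least one half-open (start, end) peak."""
    covered = [False] * math.ceil(chrom_len / bin_size)
    for start, end in peaks:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            covered[b] = True
    return covered


def chi_square_overlap(rep1, rep2):
    """Pearson chi-square (1 df, no continuity correction) on the 2x2 table of bin states.

    rep1 and rep2 are equal-length boolean lists.
    Returns (chi2, p_value, positively_associated)."""
    a = sum(x and y for x, y in zip(rep1, rep2))      # peak in both replicates
    b = sum(x and not y for x, y in zip(rep1, rep2))  # replicate 1 only
    c = sum(y and not x for x, y in zip(rep1, rep2))  # replicate 2 only
    d = len(rep1) - a - b - c                         # neither replicate
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return chi2, p_value, a * d > b * c
```

Because the statistic is symmetric, perfectly anti-correlated replicates yield the same chi-square value as perfectly concordant ones; the third return value checks that the association is in the expected, positive direction.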

      (3.2) I think that the conclusion that R-loops "accumulate" in infected cells is acceptable, given the data presented. However, in line 134 the authors state that "HIV-1 infection induced host genomic R-loop formation". I suggest being very specific about the observation. Accumulation can happen by (a) inducing a higher frequency of the occurrence of individual R-loops and/or (b) stabilizing existing R-loops. I'm not convinced the authors present enough evidence to claim one over the other. It is altogether possible that HIV-1 infection stabilizes R-loops such that they are more persistent (perhaps by interactions with integrase?), and therefore more easily detected. I think rephrasing the conclusions to include this possibility would alleviate my concerns.

      We thank the reviewer for the thoughtful discussion of our manuscript. We have now changed Line 134 to “HIV-1 infection induces host genomic R-loop enrichment” (Lines 132-133 of the revised manuscript) and, following the reviewer's suggestion, added a new concluding sentence proposing a possible explanation for the R-loop signal enrichment upon HIV-1 infection (Lines 133–135 of the revised manuscript).    

      (3.3) A technical problem with using the S9.6 antibody for the detection of R-loops via microscopy is that it cross-reacts with double-stranded RNA. This has been addressed by the work of Chedin and colleagues (as well as others). It is absolutely essential to treat these samples with an RNA:RNA hybrid-specific RNase, which the authors did not include, as far as their methods section states. Therefore, it is difficult to interpret all of the immunofluorescence experiments that depend on S9.6 binding.

      We understand the reviewer's concern regarding the cross-reactivity of the S9.6 antibody with more abundant dsRNA, particularly in imaging applications. We carefully designed the experimental and analytical methods for R-loop detection using microscopy. For example, we pre-extracted the cytoplasmic fraction before staining with the S9.6 antibody and quantified the R-loop signal by subtracting the nucleolar signal. Both of these steps were taken to eliminate the possibility of misdetecting R-loops via microscopy because of the prominent cytoplasmic and nucleolar S9.6 signals, which primarily originate from ribosomal RNA. In addition, we included R-loop negative control samples in our microscopy analysis that were subjected to intensive RNase H treatment (60 U/mL RNase H for 36 h) and observed a significant reduction in the S9.6 signal (Figure 1E of the revised manuscript). RNase H-treated samples serve as essential and widely accepted negative controls for R-loop detection. 

      We would like to point out that recent studies have reported strong intrinsic specificity of the S9.6 antibody for DNA:RNA hybrid duplexes over dsDNA and dsRNA, along with structural elucidation of how the S9.6 antibody recognizes hybrids (23, 24). Therefore, our interpretation of host cellular R-loop enrichment after HIV-1 infection using S9.6 antibodies in multiple biochemical approaches is well supported. Nevertheless, we agree with the reviewer that additional negative controls for the detection of R-loops via microscopy, such as RNase T1- and RNase III-treated samples, could improve the robustness and accuracy of R-loop imaging data (25).  

      (3.4) Given that there is no clear correlation between expression levels and R-loop peak detection, combined with the data that show increased detection of R-loop frequency in non-genic regions, I think it will be important to show that the R-loop forming regions are indeed transcribed above background levels. This will help alleviate possible concerns that there are technical errors in R-loop peak detection.

      Figures S5D and S5E in the revised manuscript show the relative gene expression levels of the R-loop-forming positive regions (P1-3) and the reference R-loop-positive loci (RPL13A and CALM3). The expression levels of these R-loop-forming regions were significantly higher than those of the ECFP or mAIRN genes without DOX treatment, which can be considered background levels of transcription in cells. Thank you. 

      (3.5) In Figures 4C and D the hashed lines are not defined. It is also interesting that the integration sites do not line up with R-loop peaks. This does not necessarily directly refute the conclusions (especially given the scale of the genomic region displayed), but should be addressed in the manuscript. Additionally, it would greatly improve Figure 4 to have some idea about the biological variation across replicates of the data presented in 4A.

      We thank the reviewer for the thoughtful comment on our study. First, we added an annotation for the dashed lines to the figure legends of Figures 4C and 4D in the revised manuscript.

      We agree with the reviewer's interpretation of the relationship between the integration sites and R-loop peaks. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and direct HIV-1 viral genome integration into the “vicinity” of host genomic R-loops. We displayed a large genomic region (30-kb windows) to present integration sites surrounding R-loop centers because an R-loop can span multiple kilobases (1, 2). Depending on the immunoprecipitation and library construction methods, R-loop peaks vary in size, and peak length shows a wide distribution (Figure 3B of Malig et al., 2020, Figure 1B of Sanz et al., 2016, and Figure 2A of the revised manuscript). Therefore, presenting integration site events within a wide window around R-loop peaks could be more informative and better reflect the current understanding of R-loop biology.

      R-loops co-localize with diverse chromatin features and factors, such as the H3K4me1 histone mark, p300, CTCF, RAD21, and ZNF143 (Figure 6A and 6B of Sanz et al., 2016) (26), which allow R-loop regions to exhibit enhancer and insulator chromatin states and to act as distal regulatory elements (26, 27). We have demonstrated physical interactions between host cellular R-loops and HIV-1 integrase proteins (Figure 5 of the revised manuscript); therefore, we believe that this ‘distal regulatory element-like feature’ of R-loops may explain how R-loops drive integration over long-range genomic regions.

      Following your suggestion, we added this explanation, together with the relevant literature, to the Discussion section of the revised manuscript.

      Author response image 6 shows the biological variation across replicates of the data shown in Figure 4A. The integration site sequencing data for Jurkat cells were adopted from SRR12322252 (4), which contains integration site sequencing data from HIV-1-infected wild-type Jurkat cells with one biological replicate. We hope that our explanations and discussion have successfully addressed your concerns. Thank you. 

      Author response image 6.

      Bar graphs showing the number of HIV-1 integration sites per Mb in the total region covered by 30-kb windows centered on DRIPc-seq peaks from HIV-1-infected HeLa cells and primary CD4+ T cells (magenta) or in non-R-loop regions of the cellular genome (gray). Each dot corresponds to an individual data set from two biologically independent experiments.
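
The density plotted in Author response image 6 is a site count divided by the genomic length it is counted over. The minimal Python sketch below illustrates one way to compute it, assuming a single chromosome, half-open +/-15-kb windows around peak centers, and "non-R-loop region" taken to mean everything outside those windows (our assumption; the legend does not define it further). It merges overlapping windows before summing their length so that nearby peaks are not double-counted in the denominator. The function names are hypothetical.

```python
def merge_windows(peak_centers, half_width, chrom_len):
    """Half-open windows [c - half_width, c + half_width) clipped to the chromosome, overlaps merged."""
    intervals = sorted((max(0, c - half_width), min(chrom_len, c + half_width)) for c in peak_centers)
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous window
        else:
            merged.append([start, end])
    return merged


def sites_per_mb(sites, peak_centers, chrom_len, half_width=15_000):
    """Integration sites per Mb inside the merged windows and in the remainder of the chromosome."""
    windows = merge_windows(peak_centers, half_width, chrom_len)
    window_bp = sum(end - start for start, end in windows)
    outside_bp = chrom_len - window_bp
    if window_bp == 0 or outside_bp == 0:
        raise ValueError("windows must cover part, but not all, of the chromosome")
    inside = sum(any(start <= s < end for start, end in windows) for s in sites)
    outside = len(sites) - inside
    return inside * 1e6 / window_bp, outside * 1e6 / outside_bp
```

For example, two peaks 10 kb apart produce one merged 40-kb window rather than two 30-kb windows, which would otherwise inflate the window length to 60 kb and understate the density.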

      (3.6) The authors do not adequately describe the Integrase mutant that they use in their biochemical experiments in Figure 5A. Could this impact the activity of the protein in such a way that interferes with the interpretation of the experiment? The mutant is not used in subsequent experiments for Figure 5 and so even though the data are consistent with each other (and the conclusion that Integrase interacts with R-loops) a more thorough explanation of why that mutant was used and how it impacts the biochemical activity of the protein will help the interpretation of the data presented in Figure 5.

      We appreciate the reviewer’s suggestions. In our EMSA analysis, we purified and used Sso7d-tagged HIV-1 integrase proteins with an active-site amino acid substitution, E152Q. We used the Sso7d-tagged HIV-1 integrase protein because previous studies have suggested that fusion of small domains, such as the Sso7d DNA-binding domain, can significantly improve the solubility of HIV integrase proteins without affecting their ability to assemble with substrate nucleic acids or their enzymatic activity (Figure 1B of Li et al., PLOS ONE 9(8), 2014) (28, 29). We used an integrase protein with the active-site substitution E152Q in our mobility shift assay because the primary goal of this experiment was to examine the ability of the protein to bind or form a complex with different nucleic acid substrates. We reasoned that abolishing the enzymatic activity of the integrase protein, such as 3'-processing, which cleaves DNA substrates, would be more appropriate for our experimental objective. This Sso7d-tagged HIV-1 integrase with the E152Q mutation has also been used to elucidate the structural model of the integrase complex with a nucleic acid substrate by cryo-EM (3), and the mutation has been shown not to disturb substrate binding. Based on the reviewer’s comments, we have added a description of the E152Q mutant integrase protein in Lines 268–270 of the revised manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      The paper suffers from many grammatical errors, which sometimes interfere with the interpretations of the experiments. In the view of this reviewer, the manuscript must be carefully revised prior to publication. For example, lines 247-248 "Intasomes consist of HIV-1 viral cDNA and HIV-1 coding protein, integrases." It is unclear from this sentence whether there are multiple integrases or multiple proteins that interact with the viral genome to facilitate integration. This makes the subsequent experiments in Figure 5 difficult to interpret. There are many other examples, too numerous to point out individually.

      We thoughtfully revised the original manuscript, making our best efforts to provide clearer details of our findings. We believe that we have made substantial changes to the manuscript, including Lines 247–248 of the original manuscript that the reviewer noted. Furthermore, the revised manuscript was edited by a professional editing service. Thank you.

      (1) M. Malig, S. R. Hartono, J. M. Giafaglione, L. A. Sanz, F. Chedin, Ultra-deep Coverage Single-molecule R-loop Footprinting Reveals Principles of R-loop Formation. J Mol Biol 432, 2271-2288 (2020).

      (2) L. A. Sanz et al., Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol Cell 63, 167-178 (2016).

      (3) D. O. Passos et al., Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science 355, 89-92 (2017).

      (4) W. Li et al., CPSF6-Dependent Targeting of Speckle-Associated Domains Distinguishes Primate from Nonprimate Lentiviral Integration. mBio 11,  (2020).

      (5) P. A. Ginno, Y. W. Lim, P. L. Lott, I. Korf, F. Chedin, GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res 23, 1590-1600 (2013).

      (6) S. Hamperl, M. J. Bocek, J. C. Saldivar, T. Swigut, K. A. Cimprich, Transcription-Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA Damage Responses. Cell 170, 774-786.e19 (2017).

      (7) H. O. Ajoge et al., G-Quadruplex DNA and Other Non-Canonical B-Form DNA Motifs Influence Productive and Latent HIV-1 Integration and Reactivation Potential. Viruses 14,  (2022).

      (8) I. K. Jozwik et al., B-to-A transition in target DNA during retroviral integration. Nucleic Acids Res 50, 8898-8918 (2022).

      (9) F. Chedin, C. J. Benham, Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 295, 4684-4695 (2020).

      (10) F. Chedin, Nascent Connections: R-Loops and Chromatin Patterning. Trends Genet 32, 828-838 (2016).

      (11) P. B. Chen, H. V. Chen, D. Acharya, O. J. Rando, T. G. Fazzio, R loops regulate promoter-proximal chromatin architecture and cellular differentiation. Nat Struct Mol Biol 22, 999-1007 (2015).

      (12) A. R. Schroder et al., HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521-529 (2002).

      (13) Y. Ito et al., Number of infection events per cell during HIV-1 cell-free infection. Sci Rep 7, 6559 (2017).

      (14) A. Albanese, D. Arosio, M. Terreni, A. Cereseto, HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3, e2413 (2008).

      (15) V. Achuthan et al., Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration. Cell Host Microbe 24, 392-404.e8 (2018).

      (16) X. Li et al., piggyBac transposase tools for genome engineering. Proc Natl Acad Sci U S A 110, E2279-2287 (2013).

      (17) Y. Cao et al., Identification of piggyBac-mediated insertions in Plasmodium berghei by next generation sequencing. Malar J 12, 287 (2013).

      (18) E. Serrao, P. Cherepanov, A. N. Engelman, Amplification, Next-generation Sequencing, and Genomic DNA Mapping of Retroviral Integration Sites. J Vis Exp,  (2016).

      (19) K. A. Matreyek et al., Host and viral determinants for MxB restriction of HIV-1 infection. Retrovirology 11, 90 (2014).

      (20) G. A. Sowd et al., A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc Natl Acad Sci U S A 113, E1054-E1063 (2016).

      (21) B. Lucic et al., Spatially clustered loci with multiple enhancers are frequent targets of HIV-1 integration. Nat Commun 10, 4059 (2019).

      (22) P. K. Singh, G. J. Bedwell, A. N. Engelman, Spatial and Genomic Correlates of HIV-1 Integration Site Targeting. Cells 11,  (2022).

      (23) C. Bou-Nader, A. Bothra, D. N. Garboczi, S. H. Leppla, J. Zhang, Structural basis of R-loop recognition by the S9.6 monoclonal antibody. Nat Commun 13, 1641 (2022).

      (24) Q. Li et al., Cryo-EM structure of R-loop monoclonal antibody S9.6 in recognizing RNA:DNA hybrids. J Genet Genomics 49, 677-680 (2022).

      (25) J. A. Smolka, L. A. Sanz, S. R. Hartono, F. Chedin, Recognition of RNA by the S9.6 antibody creates pervasive artifacts when imaging RNA:DNA hybrids. J Cell Biol 220,  (2021).

      (26) L. A. Sanz, F. Chedin, High-resolution, strand-specific R-loop mapping via S9.6-based DNA:RNA immunoprecipitation and high-throughput sequencing. Nat Protoc 14, 1734-1755 (2019).

      (27) M. Merkenschlager, D. T. Odom, CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285-1297 (2013).

      (28) M. Li, K. A. Jurado, S. Lin, A. Engelman, R. Craigie, Engineered hyperactive integrase for concerted HIV-1 DNA integration. PLoS One 9, e105078 (2014).

      (29) M. Li et al., A Peptide Derived from Lens Epithelium-Derived Growth Factor Stimulates HIV-1 DNA Integration and Facilitates Intasome Structural Studies. J Mol Biol 432, 2055-2066 (2020).

  2. Oct 2024
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      […] Strengths:

      The study has several important strengths: (i) the work on GDA stability and competition of GDA with point mutations is a very promising area of research and the authors contribute new aspects to it, (ii) rigorous experimentation, (iii) very clearly written introduction and discussion sections. To me, the best part of the data is that deletion of lon stimulates GDA, which has not been shown with such clarity until now.

      Weaknesses:

      The minor weaknesses of the manuscript are a lack of clarity in parts of the results section (Point 1) and the methods (Point 2).

      We thank the reviewer for their comments and suggestions on our manuscript. We also appreciate the succinct summary of key findings that the Reviewer has taken cognisance of in their assessment, in particular the association of the Lon protease with the propensity for GDAs as well as its impact on their eventual fate. Going ahead, we plan to revise the manuscript for greater clarity as suggested by Reviewer #1.

      Reviewer #2 (Public review):

      […] The study does what any bold and ambitious study should: it contains large claims and uses multiple sorts of evidence to test those claims.

      Weaknesses:

      While the general argument and conclusion are clear, this paper is written for a bacterial genetics audience that is familiar with the manner of bacterial experimental evolution. From the language to the visuals, the paper is written in a boutique fashion. The figures are even difficult for me - someone very familiar with proteostasis - to understand. I don't know if this is the fault of the authors or the modern culture of publishing (where figures are increasingly packed with information and hard to decipher), but I found the figures hard to follow with the captions. But let me also consider that the problem might be mine, and so I do not want to unfairly criticize the authors.

      For a generalist journal, more could be done to make this study clear, and in particular, to connect to the greater community of proteostasis researchers. I think this study needs a schematic diagram that outlines exactly what was accomplished here, at the beginning. Diagrams like this are especially important for studies like this one that offer a clear and direct set of findings, but conduct many different sorts of tests to get there. I recommend developing a visual abstract that would orient the readers to the work that has been done.

      Next, I will make some more specific suggestions. In general, this study is well done and rigorous, but doesn't adequately address a growing literature that examines how proteostasis machinery influences molecular evolution in bacteria.

      While this paper might properly test the authors' claims about protein quality control and evolution, the paper does not engage a growing literature in this arena and is generally not very strong on the use of evolutionary theory. I recognize that this is not the aim of the paper, however, and I do not question the authors' authority on the topic. My thoughts here are less about the invocation of theory in evolution (which can be verbose and not relevant), and more about engagement with a growing literature in this very area.

      The authors mention Rodrigues 2016, but there are many other studies that should be engaged when discussing the interaction between protein quality control and evolution.

      A 2015 study demonstrated how proteostasis machinery can act as a barrier to the usage of novel genes: Bershtein, S., Serohijos, A. W., Bhattacharyya, S., Manhart, M., Choi, J. M., Mu, W., ... & Shakhnovich, E. I. (2015). Protein homeostasis imposes a barrier to functional integration of horizontally transferred genes in bacteria. PLoS genetics, 11(10), e1005612

      A 2019 study examined how Lon deletion influenced resistance mutations in DHFR specifically: Guerrero RF, Scarpino SV, Rodrigues JV, Hartl DL, Ogbunugafor CB. The proteostasis environment shapes higher-order epistasis operating on antibiotic resistance. Genetics. 2019 Jun 1;212(2):565-75.

      A 2020 study did something similar: Thompson, Samuel, et al. "Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme." Elife 9 (2020): e53476.

      And there's a new review (preprint) on this very topic that speaks directly to the various ways proteostasis shapes molecular evolution:

      Arenas, Carolina Diaz, Maristella Alvarez, Robert H. Wilson, Eugene I. Shakhnovich, and C. Brandon Ogbunugafor. "Proteostasis is a master modulator of molecular evolution in bacteria."

      I am not simply attempting to list studies that should be cited, but rather, this study needs to be better situated in the contemporary discussion on how protein quality control is shaping evolution. This study adds to this list and is a unique and important contribution. However, the findings can be better summarized within the context of the current state of the field. This should be relatively easy to implement.

      We thank the reviewer for their encouraging assessment of our manuscript. We appreciate that the manuscript may not be accessible for a general readership in its present form. We plan to revise the manuscript, in part by modifying figures and adding schematics, to afford greater clarity. We also appreciate the concern regarding situating this study in the context of other published work that relates proteostasis and molecular evolution. Indeed, this was a particularly difficult aspect for us given the different kinds of literature that were needed to make sense of our study. We plan on revising the manuscript by incorporating the references that the Reviewer has pointed out.

      Reviewer #3 (Public review):

      […] Strengths:

      The major strength of this paper is identifying an example of antibiotic resistance evolution that illustrates the interplay between the proteolytic stability and copy number of an antibiotic target in the setting of antibiotic selection. If the weaknesses are addressed, then this paper will be of interest to microbiologists who study the evolution of antibiotic resistance.

      Weaknesses:

      Although the proposed mechanism is highly plausible and consistent with the data presented, the analysis of the experiments supporting the claim is incomplete and requires more rigor and reproducibility. The impact of this finding is somewhat limited given that it is a single example that occurred in a lon strain and compensatory mutations for evolved antibiotic resistance mechanisms are described. In this case, it is not clear that there is a functional difference between the evolution of copy number versus any other mechanism that meets a requirement for increased "expression demand" (e.g. promoter mutations that increase expression and protein stabilizing mutations).

      We thank the reviewer for their in-depth assessment of our work and appreciate their concerns regarding reproducibility and rigor in analysis of our data. We will incorporate this feedback and provide the necessary clarifications in the revised version of our manuscript.

    1. Author Response:

      We would like to thank the reviewers for their comprehensive and insightful reviews of our manuscript. We highly value the constructive comments and suggestions and are preparing revisions that will enhance both the clarity and robustness of our study. Below is an outline of the changes we will implement in response to the points raised.

      All three reviewers expressed concerns regarding the robustness of our conclusions about the relationship between task-related theta activity and aperiodic changes. We will revise the manuscript to present these conclusions more cautiously, stating that the findings indicate a potential contribution of aperiodic activity to what is traditionally interpreted as theta activity. While our results emphasize the importance of distinguishing between periodic and aperiodic components, further research is necessary to fully understand this relationship. We will conduct additional control analyses, including a comparison of the scalp topographies of theta and aperiodic components, to better understand the relationship between aperiodic and periodic (theta) activity.

      In response to Reviewer #1's request for greater transparency in our reporting of methodological details, we will provide key clarifications. We will add a clear statement noting that the primary results are based on data from middle-aged to older adults, some of whom had subjective cognitive complaints (SCC). However, it is important to note that no differences were observed between the SCC group and the control group regarding periodic or aperiodic changes in power. Additionally, the main findings were replicated in a sample of middle-aged adults.

      To address potential confounding factors, we will include an analysis contrasting response-related ERPs with the identified aperiodic components. However, we do not entirely agree with the assertion that this will necessarily clarify the results. ERPs are not inherently distinct from aperiodic (or periodic) activity; they may reflect changes in aperiodic (or periodic) power. In our view, examining aperiodic and periodic power, ERPs, or time-frequency decomposition with baseline correction provides different perspectives on the same data. Nonetheless, the combined analyses and their results are intended to guide future researchers toward the most suitable approach for interpreting this data.

      Reviewer #3 raised concerns regarding the task's effectiveness in evoking theta power and the ability of the spectral parameterization method (specparam) to adequately quantify background activity around theta bursts. To address these concerns, we will include additional visualizations demonstrating that the task reliably elicited theta (and delta) activity. Regarding the reviewer's concerns about specparam and theta bursts, it is important to clarify that specparam, in the form we used, does not incorporate time information; rather, it can be applied to any power spectral density (PSD), independent of how the PSD is derived. Specparam’s performance therefore depends on the method used to estimate frequency content. For time-frequency decomposition, we employed superlets (https://doi.org/10.1038/s41467-020-20539-9), which have been shown to resolve short bursts of activity more effectively than other methods. To our knowledge, superlets provide the highest resolution in terms of both time and frequency. Moreover, to improve stability, we performed spectral parameterization on trial-averaged power (in contrast to the approach in https://doi.org/10.7554/eLife.77348). Nonetheless, we will conduct a simulation to test whether specparam can reliably resolve low-frequency peaks over the 1/f activity.
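
The planned simulation can be illustrated with a simplified stand-in. The Python sketch below does not call specparam. It generates a log10 power spectrum from the same kind of model that specparam fits in its "fixed" aperiodic mode (an offset minus an exponent times log10 frequency, plus a Gaussian peak), then recovers the aperiodic component by ordinary least squares in log-log space with an assumed 3-9 Hz theta band excluded. Specparam itself fits peaks and the aperiodic component iteratively rather than with a fixed exclusion band, so this only shows the ground-truth logic: known parameters in, estimated parameters out. All parameter values and function names are hypothetical.

```python
import math
import random


def simulate_log_psd(freqs, offset, exponent, peak_cf, peak_pw, peak_sd, noise_sd=0.0, seed=0):
    """log10 power = offset - exponent * log10(f) + Gaussian peak (+ optional Gaussian noise)."""
    rng = random.Random(seed)
    log_power = []
    for f in freqs:
        aperiodic = offset - exponent * math.log10(f)
        peak = peak_pw * math.exp(-((f - peak_cf) ** 2) / (2 * peak_sd ** 2))
        log_power.append(aperiodic + peak + rng.gauss(0.0, noise_sd))
    return log_power


def fit_aperiodic(freqs, log_power, exclude=(3.0, 9.0)):
    """OLS line through log10(power) vs log10(f), skipping the excluded band; returns (offset, exponent)."""
    pts = [(math.log10(f), p) for f, p in zip(freqs, log_power)
           if not exclude[0] <= f <= exclude[1]]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    slope = sxy / sxx
    return my - slope * mx, -slope


def strongest_residual_peak(freqs, log_power, offset, exponent):
    """Frequency and height of the largest residual once the fitted aperiodic line is removed."""
    resid = [p - (offset - exponent * math.log10(f)) for f, p in zip(freqs, log_power)]
    i = max(range(len(resid)), key=resid.__getitem__)
    return freqs[i], resid[i]


# Hypothetical ground truth: exponent 1.5 and a 6-Hz theta peak 0.4 log10 units high.
freqs = [1.0 + 0.5 * i for i in range(79)]  # 1-40 Hz in 0.5-Hz steps
log_psd = simulate_log_psd(freqs, offset=1.0, exponent=1.5, peak_cf=6.0, peak_pw=0.4, peak_sd=1.0)
offset_hat, exponent_hat = fit_aperiodic(freqs, log_psd)
cf_hat, pw_hat = strongest_residual_peak(freqs, log_psd, offset_hat, exponent_hat)
```

Repeating this over a grid of peak heights, widths, exponents, and noise levels (`noise_sd > 0`) and recording how often the recovered peak matches the simulated one would give a simple detection-rate curve.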

      Reviewer #2 suggested that the manuscript would benefit from a more detailed account of the effects. In response, we will include more detailed quantifications of the analyzed effects, such as model error and R² values.

      We believe that the planned revisions will strengthen the manuscript and address the primary concerns raised by the reviewers. We sincerely appreciate your thoughtful feedback and look forward to submitting an improved version of the manuscript soon.

      Once again, thank you for your time and expertise in reviewing our work.

      Sincerely,

      Andraž Matkovič & Tisa Frelih

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate Reviewer 2's insightful and clearly reasoned assessment of this study, including the much-appreciated reframing and evaluation of the study’s advances in the sleep field. It is a constructive review that adds considerable value to this study by better defining the biological significance of the findings, including both advances and limitations.  

      Reviewer 2 nicely summarized the work as “…highlight(ing) the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons.” The reviewer succinctly placed one of the main electrophysiological findings in the context of one of the sleep field’s most prevalent views, “that LTP associated with wake, leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity.” It has been speculated that “This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon (and its restoration by recovery sleep) by introducing the concept of silent synapses.” We want to emphasize that sleep need and its resolution involve more than homeostasis of excitatory synaptic strength; they may also include homeostasis of the capacity of excitatory synapses to undergo LTP (a homeostasis of meta-plasticity), with implications for learning and memory.   

      Reviewer 2 also identified another advance made by this study, summarized as, “The new snRNAseq dataset indicates the sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies.” References for these studies are helpfully provided by the reviewer. Our analysis of these data extends the evidence for sleep-need-driven transcriptional changes in excitatory neurons, observed by us and others, by localizing them more specifically to excitatory intra-telencephalic neurons in layers 2-5.

      Reviewer 2 importantly noted, “New snRNAseq analysis indicates that SD drives the expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function”, and that “SD-induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes”. These comments are much appreciated, as they emphasize that, beyond identifying the major target cell type of sleep function, we are beginning to address the gene-ontological characteristics of the major sleep targets.

      Reviewer 2 commented on the molecular sleep model, making a key observation that “SD-induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Figure 4),” and accurately discusses its significance with respect to the proposed model.

      We are in complete agreement with the observation that the molecular sleep model presented is not “definitively supported by the new data and in this regard should be viewed as a perspective…”. One of the more glaring gaps in the supporting evidence is our lack of understanding of the role of HDAC4/5 (part of the SIK3-HDAC4/5 pathway) in the sleep-need modulation of excitatory synapses. This issue might be resolved by assessing the synaptic effects of constitutively nuclear HDAC4/5. The current study provides a first step by showing a correlation between HDAC4/5 and MEF2c target genes and a subset of differentially expressed synaptic shaping component (SSC) genes that modulate excitatory synapse strength and phenotype. However, the functional studies have yet to be completed. Complementary studies of the SD-induced SSC-DEGs identified in this study are also needed to characterize their sleep-need-induced functional impact (modulation of both strength and metaplasticity) on the most relevant excitatory synapses, as identified in the current study.

      We agree with both reviewers 1 and 2 that, “Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity”. Reviewer 2 clarifies the key unresolved issue as, “cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022). Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020)”. One may conclude with reviewer 2, “These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes.”

      An understanding of the mechanism(s) underlying the relationship between sleep need and SWA is critical to evaluating how sleep need correlates with sleep DEGs and synaptic transmission, including the “additional factors” suggested by reviewer 2. SWA might result from a decrease of cortical glutamatergic neurotransmission below some threshold, which might occur in response to prolonged waking (possibly driven by waking-activity-induced local increases of adenosine), rather than being a cause of, or intimately involved in, resolving sleep need.

      An increase in SWA associated with SD can result directly from an acute SD-induced increase in local adenosine concentration. This elicits an ADORA1-mediated down-regulation of glutamatergic excitatory neurotransmission in the cortex (Bjorness et al., 2016) and in cholinergic arousal centers (Rainnie et al., 1994; Porkka-Heiskanen et al., 1997; Portas et al., 1997; Li et al., 2023). When MEF2c is derepressed by chronic loss of HDAC4 function, SWA is facilitated (Kim et al., 2022). It is plausible that loss of HDAC4 function contributes to the increased SWA by downscaling glutamatergic excitatory transmission (independent of sleep need), as expected from derepressed, MEF2c-mediated sleep-gene expression.

      Similarly, overexpression of constitutively nuclear HDAC4 (cnHD4) can contribute to chronic upscaling of cortical glutamatergic synaptic strength that depresses SWA (again, independent of sleep need). Thus, facilitation or depression of SWA correlates with downscaling or upscaling of cortical glutamatergic neurotransmission, respectively, even in the absence of a direct effect on sleep need (Figure 4D). Many agents that reduce the excitability of glutamatergic pyramidal cells by various mechanisms, including ADORA1 agonists and anesthetics or sedatives such as isoflurane, barbiturates and benzodiazepines, increase SWA. Finally, it is important to acknowledge that this proposed link between SWA and cortical glutamatergic transmission still requires direct investigation. Thus, SWA may reflect generalized cortical glutamatergic synaptic activity, whether modulated by sleep function or by other agents.

      Other factors may also contribute to some of the mismatch between cnHD4/5 DEGs and Mef2c-cKO DEGs, including the broader overexpression of AAV-cnHD4 compared with the CaMKII-driven Cre knockout of Mef2c. cnHD4 overexpression could increase activity in the hypothalamus and other arousal centers, interfering with SWA, without precluding the repression of SD-DEGs that results from repressed MEF2c-mediated sleep-gene expression.

      The critique by reviewer 1 raises a number of important technical issues with this study. A key, potentially critical issue raised by reviewer 1, is that of our method of experimental sleep deprivation (ESD). The reviewer suggests that “…neuronal activity/induction of plasticity”, peculiar to the ESD methodology employed in this study, “…rather than sleep/wake states are responsible for the observed results…”.  

      In this study, ESD was induced with a slow-moving treadmill (SMTM; 0.1 km/hour, as stated in the Methods) that requires the mouse to walk to avoid bumping into the back wall of a false-bottomed plexiglass cage. A mouse in its home cage typically moves much faster than 0.1 km/hour, and the mouse is able to eat and drink freely while in the cage (see file: video 1). Furthermore, our observations using a beam-break cage indicate that, over 6 hours, mice spontaneously travel distances comparable to or longer than the distance the treadmill moves during the 6-hour ESD. Finally, our EEG recordings of mice on the active treadmill show 100% waking while it is on (Bjorness et al., 2009), whereas how completely the “gentle handling” (GH) technique prevents NREM sleep (including transitions) depends on the diligence of the experimenter.

      Accommodation (one week before ESD) included 30-minute exposures to the running treadmill at approximately ZT2 and ZT14 (now spelled out in the “Materials & Methods” section). Thus, the likelihood of motor learning seems vanishingly small.

      As with all ESD methods, some increase in sensory and motor neuronal activity is needed to drive arousal and prevent the transition to sleep. For example, the more widely employed GH method of ESD involves sensory stimulation (tactile and/or auditory) of sufficient intensity to induce a postural change from that associated with sleep to that associated with wake (often involving some locomotion). As with the SMTM, both sensory and motor systems are likely to be engaged. Unlike the SMTM method, however, the stimulation used in GH is intermittent and varies from mouse to mouse and from experimenter to experimenter, as it is applied only when the experimenter judges the mouse to be falling asleep. It can even be argued that the varied and unpredictable nature of these interactions is more likely to cause plastic changes than the constant slow motion of a treadmill; the mice know how to walk, after all. Other protocols introduce novel objects to the animals, which will certainly trigger plastic processes; this is avoided by using, for sleep deprivation, a slow-running treadmill to which the mouse has been accommodated.

      The changes induced by the SMTM technique are reproducible, and the technique induces arousal through somatic stimulation of sufficient intensity to induce natural motor activity, as with GH. All ESD methods induce motor activity, and it is reasonable to speculate that induced motor activity is essential for effective ESD over the prolonged durations (>4 hours in mice) that elicit high sleep need. Electrophysiological assessments of SD-evoked increases in mEPSC amplitude and frequency using GH-ESD (Liu et al., 2010) are similar in all respects to our observations of the response to SMTM-ESD (Bjorness et al., 2020). Further studies might directly compare SMTM-ESD with GH-ESD, as suggested by reviewer 1, but these are regrettably outside the scope and resources of our study.

      The model presented in Figure 4C is consistent with the experimental findings with respect to the observed electrophysiological changes (including loss of silent synapses and increased AMPA/NMDA ratio after 6 hours of ESD) and the altered gene expression, which includes enrichment of SSC genes, many of which (7 candidates are listed) can affect both the AMPA/NMDA ratio and silent synapses. No mechanism linking the changed expression to altered AMPAR or NMDAR activity can be claimed at this point, not even the polarity of the expression change relative to the electrophysiological outcome. Furthermore, some transcripts may involve receptor trafficking, while others more directly affect activated receptor function. To illustrate the complexity of interpreting gene upregulation, consider the following hypothetical scenario. If an upregulated gene like Grin3a acts rapidly, it may facilitate a reduction of NMDAR function (decreasing plasticity) during ESD, whereas an upregulated gene like Kif17, if acting in a more delayed manner, might enhance NMDAR surface expression and activity (increasing silent synapses) in response to ESD, during recovery sleep. Relevant references consistent with these various outcomes are supplied in the manuscript, but further investigation is clearly needed; as reviewer 2 aptly commented, this work “…provides a framework to stimulate further research and advances on the molecular basis of sleep function”.

      Reviewer 1 raises several issues concerning the electrophysiological methodology and statistical assessment. Regarding the former, we closely followed established protocols employed in the frontal neocortex (Myme et al., 2003). We had not included the details of series resistance monitoring: series resistance values ranged between 8 and 15 MOhm, and experiments in which it changed by more than 25% were not used for further analyses. We thank the reviewer for bringing this oversight to our attention. This essential information, which is gathered without fail for all our whole-cell recordings, has now been added to the version of record.

      The -90 mV holding potential was chosen according to precedent (Myme et al., 2003). It increases the driving force and permits a lower stimulus strength for the same response size, reducing the likelihood of polysynaptic responses. Experiments with multiple response peaks at -90 mV were not included in the analysis. The -90 mV holding potential also increases the Mg²⁺ block of NMDA receptors, yielding an AMPA response with minimal NMDA contamination. This information has now been added to our submitted version of record.

      The statistical assessments shown in Table 1 refer to two sets of data measured from 3 × 2 = 6 different cohorts for each sleep condition (CS, SD, RS): 1) AMPA and NMDA EPSCs and 2) AMPA/NMDA FR ratios (FRR; now bolded in row 1, second tab, Table S1). As stated in the Results section, “A two-way ANOVA analysis showed a significant interaction between AMPA matched to NMDA EPSC response for each neuron, and sleep condition (F (2, 21) = 7.268, p<0.004; Figure 1 A, C, E). When considered independently, neither the effect of sleep condition nor of EPSC subtype reached significance at p<0.05 (Figure 1 C)”.

      As noted by reviewer 1, we inadvertently dropped one of the data points from the RS FR and FR ratio (FRR) statistical analysis (raw data in the third tab of Table S1, statistical data in fourth and fifth tab and illustrated in figure 1 F). Thanks to this appreciated, rigorous review, we can correct the oversight (using raw data unchanged in Table S1, third tab). The Table S1 and figure 1 F are now corrected for the version of record. For better clarity, we now use two tabs, the fourth and fifth tabs, respectively of Table S1, for separate stat analyses of FR and FRR data.

      The significance of the AMPA/NMDA FRR across sleep conditions was assessed with the Kruskal-Wallis test, a non-parametric method, and the two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli (BKY) was used to control the FDR across the multiple sleep-condition comparisons. The non-parametric Kruskal-Wallis test, however, is usually less powerful than tests that assume normal distributions, such as the one-way ANOVA with Holm-Sidak’s multiple comparisons test. We have therefore re-analyzed FRR across CS, SD and RS conditions using a parametric one-way ANOVA (Table S1, tab 5). The results now read, “The difference between sleep conditions and FRR is significant (F (2, 19) = 11.3, Table S1, tab5). Multiple comparisons (Holm-Sidak, Table S1, tab5) indicate the near absence of silent synapses was reversed by either CS or RS (SD/CS; p<0.0011 and SD/RS: p<0.0006; Table S1, tab 5; Figure 1 F).” These analyses compare well with the non-parametric assessment using the Kruskal-Wallis test (significant at p = 0.0006) with BKY correction for multiple comparisons, which gives p ≤ 0.0262 for CS-SD and p ≤ 0.0006 for RS-SD (statistics also shown in Table S1, tab 5). [Also shown in tab 5 is the “standard approach of correcting for family wise error rate”, namely, Dunn’s test. It is more conservative but less powerful than the BKY correction. In general, the trade-off of greater power for less conservatism is better tolerated when many comparisons are made; however, it can be argued that in the present analysis type 2 errors are also potentially misleading and thus not well tolerated.] The modifications of our statistical analyses, inspired by reviewer 1, did not affect the interpretation of the data or the conclusions.
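      For readers unfamiliar with the BKY procedure, a minimal sketch of its two-stage step-up logic follows. It is a plain-Python illustration, not the implementation used for the analyses in Table S1, and it returns reject/accept decisions rather than the adjusted p-values reported above; the function names and the p-values in the example are hypothetical.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg linear step-up at FDR level q.

    Returns a list of booleans in the input order (True = hypothesis rejected).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0  # largest 1-based rank whose p-value passes rank * q / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:  # reject every hypothesis up to rank k
        reject[i] = True
    return reject


def bky_two_stage(pvals, q=0.05):
    """Two-stage step-up procedure of Benjamini, Krieger and Yekutieli."""
    m = len(pvals)
    q1 = q / (1 + q)                  # stage 1: BH at the reduced level q/(1+q)
    r1 = sum(bh_reject(pvals, q1))
    if r1 == 0:
        return [False] * m            # nothing rejected at stage 1
    if r1 == m:
        return [True] * m             # everything rejected at stage 1
    m0_hat = m - r1                   # estimated number of true null hypotheses
    return bh_reject(pvals, q1 * m / m0_hat)  # stage 2: BH at the adapted level


# Hypothetical p-values, for illustration only
p = [0.001, 0.02, 0.04, 0.3]
print(bh_reject(p, 0.05))   # [True, True, False, False]
print(bky_two_stage(p))     # [True, True, True, False]
```

      Because stage 1 estimates how many null hypotheses are true, stage 2 can apply a less stringent threshold than plain BH; this is the source of the extra power, relative to more conservative corrections, discussed above.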

      Bjorness TE, Kelly CL, Gao T, Poffenberger V, Greene RW (2009) Control and function of the homeostatic sleep response by adenosine A1 receptors. The Journal of neuroscience : the official journal of the Society for Neuroscience 29:1267-1276.

      Bjorness TE, Dale N, Mettlach G, Sonneborn A, Sahin B, Fienberg AA, Yanagisawa M, Bibb JA, Greene RW (2016) An Adenosine-Mediated Glial-Neuronal Circuit for Homeostatic Sleep. The Journal of neuroscience : the official journal of the Society for Neuroscience 36:3709-3721.

      Bjorness TE, Kulkarni A, Rybalchenko V, Suzuki A, Bridges C, Harrington AJ, Cowan CW, Takahashi JS, Konopka G, Greene RW (2020) An essential role for MEF2C in the cortical response to loss of sleep in mice. Elife 9.

      Kim SJ et al. (2022) Kinase signalling in excitatory neurons regulates sleep quantity and depth. Nature 612:512-518.

      Li B, Ma C, Huang YA, Ding X, Silverman D, Chen C, Darmohray D, Lu L, Liu S, Montaldo G, Urban A, Dan Y (2023) Circuit mechanism for suppression of frontal cortical ignition during NREM sleep. Cell 186:5739-5750 e5717.

      Liu ZW, Faraguna U, Cirelli C, Tononi G, Gao XB (2010) Direct evidence for wake-related increases and sleep-related decreases in synaptic strength in rodent cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience 30:8671-8675.

      Myme CI, Sugino K, Turrigiano GG, Nelson SB (2003) The NMDA-to-AMPA ratio at synapses onto layer 2/3 pyramidal neurons is conserved across prefrontal and visual cortices. Journal of neurophysiology 90:771-779.

      Porkka-Heiskanen T, Strecker RE, Thakkar M, Bjorkum AA, Greene RW, McCarley RW (1997) Adenosine: a mediator of the sleep-inducing effects of prolonged wakefulness. Science 276:1265-1268.

      Portas CM, Thakkar M, Rainnie DG, Greene RW, McCarley RW (1997) Role of adenosine in behavioral state modulation: a microdialysis study in the freely moving cat. Neuroscience 79:225-235.

      Rainnie DG, Grunze HC, McCarley RW, Greene RW (1994) Adenosine inhibition of mesopontine cholinergic neurons: implications for EEG arousal. Science 263:689-692.

    1. Author response

      We appreciate the positive comments and constructive suggestions from the editors and reviewers, which will help us improve our manuscript. We will implement the changes as requested by the reviewers, focusing primarily on revising and clarifying the following aspects:

      First, we will clarify the use of biological and technical replicates in each experiment and provide more details about the statistical analyses conducted. Additionally, we plan to include a schematic representation of the experimental design.

      Second, we will explain the experiment conducted to rule out hormonal effects or differences in the oocyte maturation method used. We will also indicate the concentration of OVGP1 in the oviduct and explain why we selected OVGP1 as the probable cause of species specificity.

      Third, by addressing all of the reviewers' suggestions, we aim to resolve any concerns, inconsistencies, or minor errors identified by the reviewers.

      We are committed to addressing all the issues raised by the reviewers and believe that the manuscript will greatly benefit from the insightful suggestions and invaluable contributions of the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects.

      Strengths and Weaknesses:

      Overall I find the experiments rigorously done and interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. A minor point is I cannot find how many DGRP lines are used.

      Thank you for the suggestions. We screened 193 lines and we will add that information to the methods. Additionally, we will add the heritability estimate of the post-diapause fecundity trait.
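      One common way to obtain such an estimate in the DGRP is a one-way random-effects ANOVA with line as the random factor, giving broad-sense heritability H² = σ²L / (σ²L + σ²E), where σ²L is the among-line and σ²E the within-line variance component. The sketch below assumes that simple model, with individual females as within-line replicates and no covariates; the function name and example values are hypothetical, and it is not the analysis that will appear in the revised manuscript.

```python
from statistics import fmean


def broad_sense_h2(lines):
    """Broad-sense heritability from a one-way random-effects ANOVA.

    lines: list of lists, one list of per-female phenotype values per line.
    Uses the n0 coefficient so that unequal replicate numbers are allowed.
    """
    a = len(lines)                                  # number of lines
    n_total = sum(len(g) for g in lines)            # total individuals
    grand = fmean([x for g in lines for x in g])    # grand mean
    ss_line = sum(len(g) * (fmean(g) - grand) ** 2 for g in lines)
    ss_err = sum((x - fmean(g)) ** 2 for g in lines for x in g)
    ms_line = ss_line / (a - 1)
    ms_err = ss_err / (n_total - a)
    n0 = (n_total - sum(len(g) ** 2 for g in lines) / n_total) / (a - 1)
    var_line = max((ms_line - ms_err) / n0, 0.0)    # sigma^2_L, truncated at 0
    return var_line / (var_line + ms_err)           # sigma^2_E = MS_error


# Two hypothetical lines with three females each
print(round(broad_sense_h2([[1, 2, 3], [4, 5, 6]]), 3))  # 0.806
```

      Truncating a negative among-line variance at zero is one common convention; a mixed-model (REML) fit is the usual alternative when replicate numbers are very unbalanced.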

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP, and they performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used whole-genome sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variations across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Weaknesses:

      (1) I suspect that there may be variation in survivorship after long-term exposure to cold conditions (10ºC, 35 days), which could also be quantified and mapped using genome-wide association studies (GWAS). Since blocking Ir21a neuronal transmission prevented flies from exiting diapause, it is possible that natural genetic variation could have a similar effect, influencing the success rate of exiting diapause and post-diapause mortality. If there is variation in this trait, could it affect post-diapause fecundity? I am concerned that this could be a confounding factor in the analysis of post-diapause fecundity. However, I also believe that understanding phenotypic variation in this trait itself could be significant in regulating adult diapause.

      We agree that it is possible that the ability to endure cool temperatures per se may influence post-diapause fecundity. However, cool temperature is the essential diapause-inducing condition in Drosophila, so it is not obvious how to separate those effects experimentally, and we agree that phenotypic variation in the cool-sensitivity trait itself could be significant in regulating diapause.

      (2) On p.10, the authors conclude that "Dip-𝛾 and sbb are required in neurons for successful diapause, consistent with the enrichment of this gene class in the diapause GWAS." While I acknowledge that the results support their neuronal functions, I remain unconvinced that these genes are required for "successful diapause". According to the RNAi scheme (Figure 4I), Dip-γ and sbb are downregulated only during the post-diapause period, but still show a significant effect, comparable to that seen in the nSyb Gal4 RNAi lines (Figure 4K).

      Our definition of successful diapause is the ability to produce viable adult progeny post-diapause, which requires that the flies enter, maintain, and exit diapause, alive and fertile. We will restate our conclusion to say that Dip-γ and sbb are required for post-diapause fecundity.

      In addition, two other RNAi lines (SH330386, 80461) that did not show lethality did not affect post-diapause fecundity.

      We interpret those results to mean that those RNAi lines were not effective since Dip-γ and sbb are known to be essential.

      Notably, RNAi (27049, KK104056) substantially reduced non-diapause fecundity, suggesting impairment of these genes affects fecundity in general regardless of diapause experience. Therefore, the reduced post-diapause fecundity observed may be a result of this broader effect on fecundity, particularly in a more "sensitized" state during the post-diapause period, rather than a direct regulation of adult diapause by these genes.

      Ubiquitous expression of RNAi lines #27049 or #KK104056 was lethal, so we included the tubGAL80ts repressor to prevent RNAi from taking effect during development. Flies had to be shifted to 30 °C to inactivate the repressor and thereby activate the RNAi. At 30 °C, fecundity of the controls (GFP RNAi lines #9331 and KK60102) was also lower (average non-diapause fecundity = 12 and 19, respectively) and similar to that of #27049 or #KK104056. We also assessed the knockdown using Repo GAL4 and nSyb GAL4 and did not find a significant decline in non-diapause fecundity for #27049 and #KK104056 compared with a nonspecific RNAi control (#54037).

      (3) The authors characterized 546 genetic variants and 291 genes associated with phenotypic variation across DGRP lines but did not prioritize them by significance. They did prioritize candidate genes with multiple associated variants (p.9 "Genes with multiple SNPs are good candidates for influencing diapause traits."), but this is not a valid argument, likely due to a misunderstanding of LD among variants in the same gene. A gene with one highly significantly associated variant may be more likely to be the causal gene in a QTL than a gene with many weakly associated variants in LD. I recommend taking significance into account in the analysis.

      We agree with the reviewer; Supplemental Table S3 lists the top-associated SNPs in order of increasing p-value (most significant first). Most of the top-associated genes from this analysis were uncharacterized CG numbers for which insufficient tools were available for validation. Nevertheless, the most significant genes by p-value overlap with those with multiple SNPs. Among the top 15 genes with multiple associated SNPs, CG18636 and CR15280 ranked 3rd by p-value, CG7759 ranked 4th, CG42732 ranked 10th, and Drip ranked 30th (all passing the conservative Bonferroni threshold of 4.8e-8), while three Sbb-associated SNPs also appear in Table 3, passing the standard 1e-5 threshold.

      Reviewer #3 (Public Review):

      Summary:

      Drosophila melanogaster of North America overwinters in a state of reproductive diapause. The authors aimed to measure 'successful' D. melanogaster reproductive diapause and reveal loci that impact this quantitative trait. In practice, the authors quantified the number of eggs produced by a female after she exited 35 days of diapause. The authors claim that genes involved with olfaction in part contribute to some of the variation in this trait.

      Strengths:

      The work used the powerful platform of the fly DGRP/GWAS. The work tried to verify some of the candidate loci with targeted gene manipulations.

      Weaknesses:

      Some context is needed. Previous work from 2001 established that D. melanogaster reproductive diapause in the laboratory suspends adult aging but reduces post-diapause fecundity. The work from 2001 showed the extent fecundity is reduced is proportional to diapause duration. As well, the 2001 data showed short diapause periods used in the current submission reduce fecundity only in the first days following diapause termination; after this time fecundity is greater in the post-diapause females than in the non-diapause controls.

      The 2001 paper by Tatar et al. reports the number of eggs laid after 3, 6, or 9 weeks in diapause conditions. Thus the diapause conditions used in this study (35 days or 5 weeks) are neither short nor long, rather intermediate. Does the reviewer have a specific concern?

      In this context, the submission fails to offer a meaningful concept for what constitutes 'successful diapause'. There is no biological rationale or relationship to the known patterns of post-diapause fecundity. The phenotype is biologically ambiguous.

      We have unambiguously defined successful diapause as the ability to produce viable adult progeny post-diapause. Other groups have measured % of flies that arrest ovarian development or % of post-diapause flies with mature eggs in the ovary, or # eggs laid post-diapause; however we suggest that # of viable adult progeny produced post-diapause is more meaningful than the other measurements from the point of view of perpetuating the species.

      I have a serious concern about the antenna-removal design. These flies were placed on cool/short days two weeks after surgery. Adults at this time will not enter diapause, which must be induced soon after eclosion. Two-week-old adults will respond to cool temperatures by 'slowing down', but they will continue to age on a time scale of day-degrees. This is why the control group shows age-dependent mortality, which would not be seen in truly diapaused adults. Loss of antennae increases the age-dependent mortality of these cold adults, but this result does not reflect an impact on diapause.

      We carried out the lifespan study under two different conditions. We either removed the antenna and moved the flies directly to 10 °C or we removed the antenna and allowed a “wound healing” period prior to moving the flies to 10 °C (out of concern that the flies might die quickly because wound healing may be impaired at 10 °C). In both cases, antenna removal shortened lifespan. Furthermore the lifespan extension at 10 °C was similar regardless of whether flies had experienced two weeks at 25 °C or not.

      • Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The work falls well short of its aim because the concept of 'successful diapause' is not biologically established. The paper studies post-diapause fecundity, and we don't know what that means. The loci identified in this analysis segregate for a minimally constructed phenotype. The results and conclusions are orthogonal.

      It is unclear to us why the reviewer has such a negative opinion of measuring post-diapause fecundity, specifically the ability to produce viable progeny post-diapause. The value of this measurement seems obvious from the point of view of perpetuating the species.

      • The likely impact of the work on the field, and the utility of the methods and data to the community.

      The work will have little likely impact. Its phenotype and operational methods are weakly developed. It lacks insight based on the primary literature on post-diapause. The community of insect diapause investigators are not likely to use the data or conclusions to understand beneficial or pest insects, or the impact of a changing climate on how they over-winter.

      The reviewer has not explained why his/her opinion is so negative.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Perform an ANOVA to estimate heritability.

      We will do this.

      (2) List the number of DGRP lines tested.

      193

      Reviewer #2 (Recommendations For The Authors):

      [Minor suggestions]

      (1) Check Drosophila italics

      We will do this.

      (2) It would be informative to include the number of DGRP lines used in this study in the Results and Methods section.

      We will include the information that we assessed 193 DGRP lines.

      (3) Figure 1C - several dots are missing at the top of the line.

      We will correct.

      (4) Figures 1E, F - Why use a discontinuous histogram for continuous distribution? Consider using a continuous histogram (e.g. Lafuente et al. (2018) Figure 1C).

      We will do this.

      (5) Figure 1F - Why have fewer bins than panel E?

      Figure 1F shows normalized post-diapause fecundity. Each individual’s post-diapause fecundity was normalized to the mean non-diapause fecundity, and the normalized individual values were then averaged to obtain the mean normalized post-diapause fecundity for each DGRP line. This is why the binning differs from that in panel E. Please refer to Supplemental Table S1.
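      The two-step normalization described above can be sketched as follows. The sketch assumes that each female’s post-diapause value is divided by the mean non-diapause fecundity of the same DGRP line; the function name and the example counts are hypothetical.

```python
from statistics import fmean


def mean_normalized_fecundity(post_diapause, non_diapause):
    """Mean normalized post-diapause fecundity for one DGRP line.

    post_diapause: per-female post-diapause fecundity values for the line.
    non_diapause:  per-female non-diapause fecundity values for the same line
                   (assumption: the reference mean is taken per line).
    """
    reference = fmean(non_diapause)                      # line mean, non-diapause
    normalized = [x / reference for x in post_diapause]  # normalize each female
    return fmean(normalized)                             # average within the line


# Hypothetical values for a single line
print(mean_normalized_fecundity([10, 20, 30], [40, 40]))  # 0.5
```

      The result is a unitless ratio, where 1.0 would mean post-diapause fecundity equal to the line’s non-diapause mean, so its range differs from that of the raw counts.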

      (6) Figure 2D - It would be informative to have fold enrichment stats.

      The following will be added in the methods section: The Gene Ontology (GO) categories and Q-values from the false discovery rate (FDR)-corrected hypergeometric test for enrichment are reported. Additionally, coverage ratios for the number of annotated genes in the displayed network versus the number of genes with that annotation in the genome are provided. GeneMANIA estimates Q-values using the Benjamini-Hochberg procedure.
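      The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines: each p-value is multiplied by m/rank, and a running minimum is taken from the largest p-value downward so that the Q-values stay monotone. This is a generic illustration, not GeneMANIA’s code; the function name and input p-values are hypothetical.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (Q-values), returned in input order."""
    m = len(pvals)
    # Visit p-values from largest to smallest so a running minimum enforces monotonicity.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    qvals = [0.0] * m
    running_min = 1.0                          # Q-values are capped at 1
    for offset, i in enumerate(order):
        rank = m - offset                      # 1-based rank in ascending order
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical enrichment p-values for four GO categories
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

      A category is then reported as enriched when its Q-value falls below the chosen FDR level.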

      (7) Supplementary table (Table S5) or supplemental table (other supplementary tables)? Need consistency (to Supplementary?)

      We will change ‘Supplementary Table S5’ to ‘Supplemental Table S5’.

      (8) Figure 5D,E - unused ticks on the x-axis.

      The unused ticks on the x-axis will be removed from Figures 5D and E.

      Reviewer #3 (Recommendations For The Authors):

      • Suggestions for improved or additional experiments, data or analyses.

      The authors cannot redo the GWAS with an alternative trait that might better reflect 'successful diapause', and I am not even sure what such a trait would involve or mean. Given this limitation, the authors should consider how they can conduct additional experiments to better define, justify, and elaborate how post-diapause reproduction relates to the mechanisms, processes, depth, and 'success' of diapause.

      We agree that it is entirely unclear what trait would be a better measure of successful diapause. Other investigators might have chosen to measure something different, but there is no reason why a different choice would be a better one. We do not believe that this is a “limitation.” We believe that we have unambiguously defined and justified post-diapause reproduction as a measurement of successful diapause with respect to perpetuating the species through a stressful period.

      • Recommendations for improving the writing and presentation.

      The mechanics of the writing are fine, aside from some typos/grammar issues. But, the paper is conceptually superficial and tautological. It claims to provide a 'stringent criterion' for 'successful diapause', then measures an unjustified trait, then claims this demonstrates variation for 'successful diapause'.

      We respectfully disagree with this opinion.

      This story is conducted without reference to the prior primary literature on the mechanisms of reproductive diapause. The presentation may be improved by considering the literature and precedents for how, and by what, reproductive diapause is induced, maintained, and terminated in many insects as well as in Drosophila.

      We will revisit our citations of the literature and apologize for any inadvertent omissions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In our initial submission, reviewers highlighted that the major limitations of our study were related to both the number of minibinders tested and the number of optimizations we evaluated for improving minibinder function. In this revision, we have focused on expanding the set of minibinders tested. To do so, we selected two previously published minibinders against the epidermal growth factor receptor (EGFR). Selecting EGFR as a target enabled us to evaluate two minibinders that bind at different sites, unlike the previously evaluated binders LCB1 and LCB3, which both bind the same interface on SARS-CoV-2 Spike. Further, using EGFR as a target enabled us to qualitatively compare the efficacy of minibinder-coupled chimeric antigen receptors with that of an existing anti-EGFR CAR. We believe the results here demonstrate broader generalizability of our approach across binding sites, targets, and minibinders. We hope this addition is sufficient to convince future would-be users of these tools to attempt synthetic receptor engineering using minibinders against their protein of choice.

      Reviewers commented on the presentation of flow data and the use of statistics throughout the manuscript. We did not modify how flow data are presented, as the density plots we used are common throughout the field. We have opted not to include statistics because we believe that, for most of the experiments we show, our findings are obvious. In cases where statistics would help discern whether subtle effects are real (for example, comparing the linker-based optimizations or comparing the anti-EGFR CARs), we believe that other experimental factors, such as construct expression, are sufficiently confounding that, even in the presence of statistically significant effects, we would be leading readers astray by making such claims about our data. As such, we have sought to limit the claims we make and hope that reviewers and readers agree that we do not overinterpret our data without statistical support.

      On more minor points, both reviewers raised the differences between Figures 5A and 5C. As we explained in the figure legend and in our previous response to reviews, these differences arise because the data originate from different time points of the same assay. Reviewer #2 believed we should be more measured in our comments about linker optimality, which we have addressed by changing the referenced line in the Discussion. Otherwise, we have made no modifications to figures or text beyond the addition of new data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We addressed the issue of “tolerability” in our answers to Reviewer 2 and in the revised manuscript, where we added data concerning tolerability; see the paragraph in the Results section, page 11:

      "Finally, tolerability studies were performed with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We have slightly modified the paragraph above to emphasize that the tolerability studies were performed in “naïve mice”. 

      "Finally, tolerability studies were performed in naïve mice with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We propose to add a sentence in the Results section, page 11, noting that we can also induce severe hypothermia in rats using conjugates similar to VH-N412.

      We also added in the Discussion section (page 38) that we could induce hypothermia with different conjugates in mice, rats and pigs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Some of the figures are of rather poor quality. For example, the H&E and Sirius Red stainings in Figures 3 and 4 are quite poor so it is difficult to see what is going on in the muscles. The authors should take note of another publication on dy3K/dy3K mice of similar age (PMID: 31586140) where such images are of much higher quality. Similarly, the Western blot for laminin-alpha2 (Figure 4B) of the wild-type mouse needs improvement. If the single laminin-alpha2 protein is not detected, there is an issue with the denaturation buffer used to load the protein.

      Thank you for the valuable suggestions. We have read the study on dy3K/dy3K mice of similar age (PMID: 31586140), which showed dystrophic changes in dy3K/dy3K muscle throughout the disease course in both the whole muscle and representative muscle areas. We have generated new, higher-quality H&E and Sirius Red images that include the whole muscle and a representative muscle area. Because of the large image size, we have added them as the new Figure supplement 2 and Figure supplement 3. We have also changed the denaturation buffer used to load the protein and repeated the Western blot for laminin α2; the laminin α2 protein detected by Western blot in wild-type mice (n = 3) and dyH/dyH mice (n = 3) is now shown in Figure 4B.

      (2) My biggest concern is, however, the many overstatements in the manuscript and the over-interpretation of the data. This already starts with the first sentence in the abstract where the authors write: "Understanding the underlying pathogenesis of LAMA2- related muscular dystrophy (LAMA2-MD) have been hampered by lack of genuine mouse model." This is not correct as the dy3K/dy3K, generated in 1997 (PMID: 9326364), are also Lama2 knockout mice; there are also other strains (dyW/dyW mice) that are severely affected and there are the dy2J/dy2J mice that represent a milder form of LAMA2-MD. Similarly, the last two sentences of the abstract "This is the first reported genuine model simulating human LAMA2-MD. We can use it to study the molecular pathogenesis and develop effective therapies." are a clear overstatement. The mechanisms of the disease are well studied and the above-listed mouse models have been amply used to develop possible treatment options. The overinterpretation concerns the results from transcriptomics. The fact that Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier as the authors state. If there are no functional data, this cannot be stated. Indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494) and this needs to be written like this.

      Thank you for your comment, and we apologize for the overstatements in the manuscript. We have carefully reconsidered our previous statements and corrected them accordingly. We have changed the first sentence of the abstract to "Our understanding of the molecular pathogenesis of LAMA2-related muscular dystrophy (LAMA2-MD) requires improving". We have also replaced the last two sentences of the abstract with "In summary, this study provided useful information for understanding the molecular pathogenesis of LAMA2-MD".

      We also agree that "Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier", and that the indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494). Therefore, we have corrected the overstatement as suggested, replacing it with "It was reported that the deficiency of laminin α2 in astrocytes and pericytes was associated with a defective blood-brain barrier (BBB) in the dy3K/dy3K mice (Menezes et al., 2014). The defective BBB presented with altered integrity and composition of the endothelial basal lamina, reduced pericyte coverage, and hypertrophic astrocytic endfeet lacking appropriately polarized aquaporin4 channels."

      (3) Finally, the bulk RNA-seq data also needs to be presented in a disease context. The authors, again, mix up changes in expression with functional impairment. All gene expression changes are interpreted as direct evidence of an involvement of the cytoskeleton. In fact, changes in the cytoskeleton are more likely a consequence of the severe muscle phenotype and the delay in muscle development. This is particularly possible as muscle samples from 14-day-old mice are compared; a stage at which muscle still develops and grows tremendously. Thus, all the data need to be interpreted with caution.

      Thank you for your comment. We have toned down the interpretation of the bulk RNA-seq data and have revised the last sentence of the Results section to "These observations provided important data for the impaired muscle cytoskeleton and abnormal muscle development which were associated with the muscle pathology consequence of severe dystrophic changes in the dyH/dyH mice."

      (4) In summary, the authors need to improve data presentation and, most importantly, they need to tone down the interpretation and they must be fully aware that their work is not as novel as they present it.

      Thank you for your comments and valuable suggestions. We have revised the previous overstatements and the interpretation of the results. We are sorry that we failed to clearly present our rationale for making this mouse model. Indeed, there are many existing mouse models, all of which have been important to research in the field. One of the reasons we wished to create dyH/dyH was to make a mouse model without any trace of engineering (e.g., inserted bacterial elements for knockout). By doing so, we hoped to provide a novel model suited to the development of gene-editing-based gene therapies. To this end, dyH/dyH was created to reflect the mutation hotspot region in the Chinese population. We hope you will agree with our points and see that we were not trying to belittle previous models but simply trying to provide a different option. The overstatements largely stemmed from language barriers, and we have tried to make our statements more cautious and acceptable to readers.

      Reviewer #2 (Public Review):

      (1) The major weakness is the manuscript reads like this was the first-ever knockout mouse model generated for LAMA2-CMD. There are in fact many Lama2 knockout mice (dy, dy2J, dy3k, dyW, and more) which have all been extensively studied with publications. It is important for the authors to comment on these other published studies that have generated these well-studied mouse lines. Therefore, there is a lack of background information on these other Lama2 null mice.

      Thank you for your comment. We have added background information on these other Lama2 null mice with the sentences "The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995). Among them, the dy/dy, dy3k/dy3k, dyw/dyw mice present severe muscular dystrophy, and dy2J/dy2J mice show mild muscular dystrophy and peripheral neuropathy (Gawlik and Durbeej, 2020). The mutation of the dy/dy mice has been still unclear (Xu et al., 1994; Michelson et al., 1995). The dy3k/dy3k mice were generated by inserting a reverse Neo element in the 3' end of exon 4 of Lama2 gene in 1997 (Miyagoe et al., 1997), and the dyw/dyw mice were created with an insertion of lacZ-neo in the exon 1 of Lama2 gene in 1998 (Kuang et al., 1998). The dy2J/dy2J mice were generated in 1970 by a spontaneous splice donor site mutation which resulted in a predominant transcript with a 171 base in-frame deletion, leading to the expression of a truncated laminin α2 with a 57 amino acid deletion (residues 34-90) and a substitution of Gln91Glu (Sunada et al., 1995). They were established in the pre-gene therapy era, leaving trace of engineering, such as bacterial elements in the Lama2 gene locus, thus unsuitable for testing various gene therapy strategies. Moreover, insufficient transcriptomic data of the muscle and brain of LAMA2-CMD mouse models limits the understanding of disease hallmarks. Therefore, there is a need to create new appropriate mouse models for LAMA2-CMD based on human high frequently mutated region using the latest gene editing technology such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9."

      (2) The phenotypes of dyH/dyH are similar to, if not identical to dy/dy, dy2J/dy2J, dy3k/dy3k, dyW/dyW including muscle wasting, muscle weakness, compromised blood-brain barrier, and reduced life expectancy. This should be addressed, and a comparison made with Lama2 deficient mice in published literature.

      Thank you for your comment. We have added Table supplement 3 to compare dyH/dyH mice with other Lama2-deficient mice. We have also added the following statement to the Discussion: "Compared with other Lama2 deficient mice including dy/dy, dy2J/dy2J, dy3k/dy3k and dyW/dyW, the phenotype of the dyH/dyH mice presented with a very severe muscular dystrophy, which was similar to that of the dy3k/dy3k mice (Table supplement 3)."

      (3) Recent published studies (Chen et al., Development (2023), PMID 36960827) show loss of Itga7 causes disruption of the brain-vascular basal lamina leading to defects in the blood-brain barrier. This should be referenced in the manuscript since this integrin is a major Laminin-211/221 receptor in the brain and the mouse model appears to phenocopy the dyH/dyH mouse model.

      Thank you for your great suggestion. We have cited the published study (Chen et al., Development (2023), PMID 36960827) and added the following statements to the Discussion: "As reported, the aberrant BBB function was also associated with the adhesion defect of alpha7 integrin subunit in astrocytes to laminins in the Itga7-/- mice (Chen et al., 2023). In this study, loss of communications involving the laminins’ pathway between laminin α2 and integrins were predicted between vascular and leptomeningeal fibroblasts and astrocytes in the dyH/dyH brain, providing more evidence for the impaired BBB due to laminin α2 deficiency."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Improve the data presentation (as mentioned above). Make a new picture of the histology; repeat the Western blots. Discuss the RNA-seq data with more caution and present it in a more attractive way. Tone down the wording.

      Thank you for your recommendations. We have revised the overstatements and improved the interpretation of the RNA-seq data as suggested. We have also made new histology images and repeated the Western blots.

      Reviewer #2 (Recommendations For The Authors)

      (1) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

      (2) Figure 2: The animal numbers used in this analysis were not indicated. Please include this number in the figure legend.

      Thank you for your recommendations. We have added animal numbers in the figure legends wherever applicable.

      (3) Figure 2: The forelimb grip strength is informative but has limitations. Ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength.

      Thank you for your recommendations. We agree that ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength, and we would have liked to perform this experiment. However, we have not completed it, for the following reasons: (1) Forelimb grip strength is a classic and still commonly used method for measuring mouse muscle strength in studies of different muscular dystrophies, such as LAMA2-MD (Amelioration of muscle and nerve pathology of Lama2-related dystrophy by AAV9-laminin-αLN linker protein. JCI Insight. 2022;7(13):e158397. PMID: 35639486), Duchenne muscular dystrophy (Investigating the role of dystrophin isoform deficiency in motor function in Duchenne muscular dystrophy. J Cachexia Sarcopenia Muscle. 2022;13(2):1360-1372. PMID: 35083887) and facioscapulohumeral muscular dystrophy (Systemic delivery of a DUX4-targeting antisense oligonucleotide to treat facioscapulohumeral muscular dystrophy. Mol Ther Nucleic Acids. 2021;26:813-827. PMID: 34729250), among others. (2) Grip strength is also used to measure muscle strength in human studies (PMID: 32366821; PMID: 29313844; PMID: 34499663, among others). For these reasons, we measured muscle strength using forelimb grip strength and have not performed the supplementary ex vivo or in vivo muscle contractility experiment.

      (4) Figure 3: Muscle fibrosis should be measured with a hydroxyproline assay.

      Thank you for your recommendations. We agree that the hydroxyproline assay is one of the most classic methods for evaluating collagen content as a measure of muscle fibrosis. However, we performed Sirius Red staining to measure muscle fibrosis for the following reasons: (1) Sirius Red staining allows muscle fibrosis to be observed more directly, and other pathological features can also be observed and compared in the same muscle sections. (2) Sirius Red staining is also a classic and still commonly used method for measuring muscle fibrosis, as previously reported in mouse studies of muscle disorders, such as PMID: 22522482 (Losartan, a therapeutic candidate in congenital muscular dystrophy: studies in the dy(2J) /dy(2J) mouse. Ann Neurol. 2012;71(5):699-708.), PMID: 34337906 (Aging-related hyperphosphatemia impairs myogenic differentiation and enhances fibrosis in skeletal muscle. J Cachexia Sarcopenia Muscle. 2021;12(5):1266-1279.) and PMID: 28798156 (Phosphodiesterase 4 inhibitor and phosphodiesterase 5 inhibitor combination therapy has antifibrotic and anti-inflammatory effects in mdx mice with Duchenne muscular dystrophy. FASEB J. 2017;31(12):5307-5320.), among others. Therefore, we used Sirius Red staining to measure muscle fibrosis in this study.

      (5) Figure 8: The N=3 is very low which could result in type I or II statistical errors. A larger sample size will reduce the chance of statistical errors.

      Thank you for your recommendations. We have performed the supplementary experiment, increasing the number of animals in each group to 6 (3 males and 3 females) to reduce the chance of statistical errors. The results were consistent with the previous data in Figure 8.

      (6) Power analysis to estimate experimental animal numbers should be reported in the manuscript.

      Thank you for your recommendations. Based on a previous study (Power and sample size. Nature Methods. 2013;10:1139–1140), “The distributions show effect sizes d = 1, 1.5 and 2 for n = 3 and α = 0.05. Right, power as function of d at four different α values for n = 3”, and “If we average seven measurements (n = 7), we are able to detect a 10% increase in expression levels (μA = 11, d = 1) 84% of the time with α = 0.05.”, the estimated number of experimental animals was 3 to 7. Moreover, when additional experimental animals were available, we included their data.
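      The quoted 84% figure can be reproduced under the assumption of a one-sided, one-sample z-test with known standard deviation, for which power = Φ(d√n − z₁₋α). The sketch below illustrates that assumption only; the function names are hypothetical, and a t-test (unknown σ) would give somewhat lower power for the same n.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_power(d, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with effect size d and n samples."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return std_normal.cdf(d * sqrt(n) - z_crit)


def min_sample_size(d, target_power=0.8, alpha=0.05, n_max=1000):
    """Smallest n reaching target_power under the same z-test assumption."""
    for n in range(1, n_max + 1):
        if one_sided_z_power(d, n, alpha) >= target_power:
            return n
    return None  # target not reached within n_max
```

      Under these assumptions, `one_sided_z_power(1, 7)` is about 0.84, and 80% power requires 7, 3 and 2 samples for d = 1, 1.5 and 2 respectively, which falls within the 3 to 7 range cited above.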

      (7) It is unclear if the studies were performed with adequate rigor. Were those scoring outcome measures blinded to the treatment groups?

      Thank you for your recommendations. Those scoring the outcome measures were not blinded to the groups, which were defined by genotype. In practice, dyH/dyH mice were easy to distinguish from WT/Het mice because of their small body size.

      (8) Authors should appropriately cite previous studies that have generated Lama2 null mice.

      Thank you for your recommendations. We have cited previous studies that have generated Lama2 null mice with the sentence “The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995)”.

      (9) The number of animals should be increased to reduce the chance of statistical error.

      Thank you for your recommendations. We have performed the supplementary experiment and increased the number of animals in each group to reduce the chance of statistical error.

      (10) A power analysis should be performed to determine the number of experimental animals.

      Thank you for your recommendations. We have performed a power analysis to determine the number of experimental animals as mentioned above.

      (11) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) General comment: The evidence for these highly novel, potentially interesting roles (of the exocyst) would need to be more compelling to support direct involvement.

      We wish to thank the reviewer for his/her comments, and for considering that the proposed functions are highly novel and potentially interesting. To strengthen the evidence supporting the new roles of the exocyst, we have performed a number of additional experiments that are depicted in novel figures or figure panels of the new version of the manuscript. Particularly, we aimed at providing further support of the direct involvement of the exocyst in different steps of the regulated secretory pathway. Please see the details below.

      (2) For instance, the localization of exocyst to Golgi or to granule-granule contact sites does not seem substantial.

      We have performed quantitative colocalization studies, as suggested by the reviewer, to further substantiate our initial findings. We have carefully analysed GFP-Sec15 distribution in relation to the Golgi complex and secretory Glue granules at relevant time points of salivary gland development. Overall, we found that GFP-Sec15 distribution is dynamic during salivary gland development. Before Glue synthesis (72 h AEL), Sec15 was observed in close association (defined as a distance equal to or less than 0.6 µm) with the Golgi complex (see Author response image 1 below). This association was lost once Glue granules had begun to form (96 h AEL). Importantly, we do not see a relevant association between GFP-Sec15 and the ER (see Author response image 2). These observations support our conclusion that the exocyst plays a role at the Golgi complex. New images supporting these conclusions, as well as quantitative data, have been included in Figure 5 of the new version of the manuscript. In addition, real-time imaging and 3D reconstruction analyses confirming the close association between Sec15 and Golgi cisternae are now included in the manuscript (see Supplementary Videos 1-3). These new data are described in lines 200-210 of the Results section and lines 359-368 of the Discussion section.

      Interestingly, at the time when the Sec15-Golgi association is lost (96 h AEL), Sec15 foci associate instead with newly formed secretory granules (< 1 µm diameter). This association persists during secretory granule maturation (100-116 h AEL), when Sec15 foci localize specifically between neighbouring immature secretory granules. When maturation has ended and Glue granule exocytosis begins (116-120 h AEL), this localization between granules is lost. These observations are consistent with a role of the exocyst in homotypic fusion during SG maturation. We have included new images showing that the association between Sec15 and secretory granules is dynamic and depends on the developmental stage. We have quantified this association both during maturation and at a stage when SGs are already mature. We have also performed a 3D reconstruction analysis of these images to confirm the close association between Sec15 and immature SGs. These new data are now depicted in Figure 7B-C and Supplementary Videos 4-5, and described in lines 216-221 of the Results section. In addition, a lower-magnification image is provided below in this letter (Author response image 3), quantifying the proportion of Sec15 foci localized between SGs (yellow arrows) relative to the total number of Sec15 foci (yellow arrows + green arrowheads).

      Author response image 1.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the trans-Golgi network in the experiments of Figure 5C-E of the manuscript. When the distance between the maximal intensities of the GFP-Sec15 and Golgi-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated” (upper panels). When the distance was more than 0.6 µm, the signals were considered “not associated” (lower panels).

      Author response image 2.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the ER in the experiments of Figure 5A-B of the manuscript. When the distance between the maximal intensities of the GFP-Sec15 and KDEL-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated”. When the distance was more than 0.6 µm, the signals were considered “not associated”.
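      The association criterion in these two legends amounts to a nearest-neighbour distance threshold. A minimal sketch, assuming the intensity maxima have already been located and expressed as coordinates in µm (the peak-detection step, the function names and the coordinates are hypothetical):

```python
from math import dist

ASSOCIATION_THRESHOLD_UM = 0.6  # distance cut-off stated in the legends


def classify_foci(sec15_peaks, marker_peaks, threshold_um=ASSOCIATION_THRESHOLD_UM):
    """Label each GFP-Sec15 maximum "associated" if the nearest marker maximum
    (e.g. Golgi-RFP or KDEL-RFP) lies within threshold_um, else "not associated"."""
    labels = []
    for focus in sec15_peaks:
        # Distance to the closest marker maximum; infinite if there are none.
        nearest = min((dist(focus, m) for m in marker_peaks), default=float("inf"))
        labels.append("associated" if nearest <= threshold_um else "not associated")
    return labels


def percent_associated(labels):
    """Percentage of foci labelled "associated" (0.0 for an empty list)."""
    if not labels:
        return 0.0
    return 100.0 * labels.count("associated") / len(labels)
```

      The same code accepts 2D or 3D coordinate tuples, since `math.dist` works in any number of dimensions.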

      Author response image 3.

      (A) GFP-Sec15 foci (cyan) and SGs (red) are shown in cells bearing immature SGs or (B) mature SGs. Yellow arrows indicate GFP-Sec15 foci localized between SGs; green arrowheads indicate GFP-Sec15 foci that are not between SGs. (C) Quantification of the percentage (%) of Sec15 foci localized between SGs relative to the total number of Sec15 foci in cells filled with immature SGs (ISG) vs cells with mature SGs (MSG).

      It is worth mentioning that previous evidence from mammalian cultured cells (Yeaman et al., 2001) shows that the exocyst localizes both at the trans-Golgi network and at the plasma membrane, weighing in favour of our claim that the exocyst is required at various steps of the exocytic pathway. Thus, the exocyst may play multiple roles in the secretory pathway in other biological models as well. This concept has now been included in the Discussion section of the revised manuscript (lines 359-368).

      To make the conclusions of our work clearer, in the revised version of the manuscript, we have now included a graphical abstract, summarizing the dynamic localization of the exocyst in relation to the processes of SG biogenesis, maturation and exocytosis reported in our work. 

      (3) Instead, it is possible that defects in Golgi traffic and granule homotypic fusion are not due to direct involvement of the exocyst in these processes, but secondary to a defect in canonical exocyst roles at the plasma membrane. A block in the last step of glue exocytosis could perhaps propagate backward in the secretory pathway to disrupt Golgi complexes or cause poor cellular health due to loss of cell polarity or autophagy.

      We thank the reviewer for these thoughtful comments. We have performed a number of additional experiments to assess “cellular health” or to identify possible defects in cell polarity after knock-down of exocyst subunits. These new data have been included in new supplementary figures 5 and 6 of the revised version of the manuscript (please see below). 

      In our view, the precise localization of GFP-Sec15 at the Golgi complex (Figure 5C-E), as well as in between immature secretory granules (Figure 7B-D), argues in favour of a direct involvement of the exocyst in SG biogenesis and homofusion respectively. 

      We truly appreciate the reviewer raising the possibility that the defects we observe at early steps of the pathway (SG biogenesis and SG maturation) may actually stem from a backward effect of the role of the exocyst in SG-plasma membrane tethering. We wish to respectfully point out that the processes of biogenesis, maturation and plasma membrane tethering/fusion of SGs do not occur simultaneously in the Drosophila larval salivary gland in vivo, as they do in other secretory model systems (e.g., cell culture). In this regard, the experimental model is unique in terms of synchronization. In each cell of the salivary gland, the three processes (biogenesis, maturation and exocytosis) occur sequentially, under the control of developmental cues. By the developmental stage at which SGs fuse with the plasma membrane, SG biogenesis ceased many hours earlier: SG biogenesis occurs at 96-100 hours after egg lay (AEL), SG maturation takes place at 100-112 h AEL, and SG-plasma membrane fusion happens only when all SGs have undergone maturation and are ready to fuse with the plasma membrane, at 116-120 h AEL. Thus, in our view, it is not conceivable that a defect in SG-plasma membrane tethering/fusion (116-120 h AEL) could act backwards on SG biogenesis or SG maturation, which occur earlier in development (96-112 h AEL).

      As suggested by the reviewer, we have analysed several markers of cellular health and cell polarity, comparing conditions of exocyst subunit silencing (exo70RNAi, sec3RNAi or exo84RNAi) with wild-type controls (whiteRNAi). These new data are depicted in Supplementary Figures 5 and 6, and described in lines 172-179 of the Results section of the revised manuscript. Notably, for these experiments we applied silencing conditions that block secretory granule maturation, yielding mostly immature SGs. Our analyses included the subcellular distribution of 1) PI(4,5)P2, 2) the tetraspanin CD63, 3) Rab11, 4) filamentous actin and 5) CD8. We also compared 6) nuclear size and general nuclear morphology, 7) the number and distribution of mitochondria, and the morphology and subcellular distribution of the 8) cis- and 9) trans-Golgi networks. Finally, 10) we compared basal autophagy in salivary gland cells with or without knock-down of exocyst subunits. The markers we analysed behaved similarly to those of control salivary glands, suggesting that the observed defects in regulated exocytosis indeed reflect distinct roles of the exocyst in the secretory pathway, rather than poor cellular health or impaired cell polarity.

      Our conclusions are in line with previous studies in which apico-basal polarity, Golgi complex morphology and distribution, as well as apical membrane trafficking were also evaluated in exocyst mutant backgrounds, finding no anomalies (Jafar-Nejad et al, 2005). 

      Conversely, in studies in which apical polarity was disturbed by interfering with Crumbs levels, SG biogenesis, maturation and exocytosis were not affected (Lattner et al, 2019), indicating that these processes do not necessarily interfere with one another.

      (4) Final recommendation: In the absence of stronger evidence for these other exocyst roles, I would suggest focusing the study on the canonical role (interesting, as it was previously reported that Drosophila exocyst had no function in the salivary gland and limited function elsewhere [DOI: 10.1034/j.1600-0854.2002.31206.x]), and leave the alternative roles for discussion and deeper study in the future.  

      We appreciate the reviewer's recommendation. However, we believe that the major strength of our work is the discovery of non-canonical roles of the exocyst complex, unrelated to its function as a tethering complex for vesicle-plasma membrane fusion. In the new version of our manuscript, we provide stronger evidence supporting the two novel roles of the exocyst:

      a) Its participation in maintaining the normal structure of the Golgi complex, and b) Its function in secretory granule maturation.

      Reviewer 2:

      (5) General comment: A key strength is the breadth of the assays and study of all 8 exocyst subunits in a powerful model system (fly larvae). Many of the assays are quantitated and roles of the exocyst in early phases of granule biogenesis have not been ascribed. 

      We are grateful that the reviewer appreciates the novelty of our contribution.

      (6) However there are several weaknesses, both in terms of experimental controls, concrete statements about the granules (better resolution), and making a clear conceptual framework. Namely, why do KD of different exocysts have different effects on presumed granule formation

      The reviewer has raised a point that is central to the interpretation of all our data throughout the manuscript. The short answer is that the extent of RNAi-dependent silencing of exocyst subunits determines the phenotype: 

      1) Maximum silencing affects Golgi complex morphology and prevents SG biogenesis. 2) Intermediate silencing blocks SG maturation, without affecting Golgi complex morphology and SG biogenesis. 3) Weak silencing blocks SG tethering and fusion with the plasma membrane, without affecting Golgi complex morphology, SG biogenesis or SG maturation. 

      In other words, 1) Low levels of exocyst subunits are sufficient for normal Golgi complex morphology and SG biogenesis. 2) Intermediate levels of exocyst subunits are sufficient for SG maturation (and also sufficient for SG biogenesis). 3) High levels of exocyst subunits are required for SG tethering and subsequent fusion with the plasma membrane. 
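      The threshold logic described above can be sketched as a small Python function. This is a minimal illustration under stated assumptions, not the authors' quantitative model: the numeric thresholds are hypothetical placeholders, since only their ordering is stated here (SG biogenesis tolerates the lowest exocyst levels, SG tethering/fusion requires the highest).

```python
from enum import Enum


class Phenotype(Enum):
    NO_SG_BIOGENESIS = "impaired SG biogenesis (Golgi morphology affected)"
    NO_SG_MATURATION = "impaired SG maturation"
    NO_SG_EXOCYTOSIS = "impaired SG tethering/fusion with the plasma membrane"
    WILD_TYPE = "normal secretion"


# HYPOTHETICAL thresholds, expressed as the fraction of wild-type exocyst
# activity remaining after RNAi. Only their ordering reflects the model;
# the numbers themselves are illustrative assumptions.
BIOGENESIS_THRESHOLD = 0.2
MATURATION_THRESHOLD = 0.5
EXOCYTOSIS_THRESHOLD = 0.9


def predicted_phenotype(remaining_fraction: float) -> Phenotype:
    """Return the earliest secretory step whose exocyst requirement is not met."""
    if not 0.0 <= remaining_fraction <= 1.0:
        raise ValueError("remaining_fraction must be between 0 and 1")
    # Steps are checked in developmental order: a cell that cannot form SGs
    # cannot show a maturation or fusion defect.
    if remaining_fraction < BIOGENESIS_THRESHOLD:
        return Phenotype.NO_SG_BIOGENESIS
    if remaining_fraction < MATURATION_THRESHOLD:
        return Phenotype.NO_SG_MATURATION
    if remaining_fraction < EXOCYTOSIS_THRESHOLD:
        return Phenotype.NO_SG_EXOCYTOSIS
    return Phenotype.WILD_TYPE
```

      For example, `predicted_phenotype(0.3)` returns `Phenotype.NO_SG_MATURATION`. Because the steps are sequential, the function reports only the earliest failing step, mirroring the argument that maximum, intermediate and weak silencing reveal three distinct phenotypes.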

      Based on this notion, we exploited the fact that temperature can fine-tune the level of Gal4/UAS-dependent transcription, thereby achieving different levels of silencing, as noted by Perrimon and colleagues in their seminal paper: “the level of RNAi knockdown can also be altered by using Gal4 lines of various strengths, rearing flies at different temperatures, or via coexpression of UAS-Dicer2” (Perkins et al, 2015).

      Indeed, we found in our system that by applying appropriate silencing conditions (RNAi line and temperature) to any of the eight exocyst subunits, we could obtain any one of the three alternative phenotypes: impaired SG biogenesis, impaired SG maturation, or impaired SG tethering/fusion with the plasma membrane.

      These concepts are summarized below in Author response image 4. Please also see the general comment of Reviewer #3 at point 26.

      We have conducted qRT-PCR assays to provide experimental support for the notions summarized in Author response image 4. We measured the remaining mRNA levels of some of the exocyst subunits after inducing RNAi-mediated silencing at different temperatures, or with different RNAi transgenic lines. The remaining RNA levels after silencing correlate well with the observed phenotypes, following the predictions of Author response image 4, as summarized in Author response image 5. These new data are now shown in Supplementary Figure 2 of the revised manuscript, and described in lines 153-159 of the Results section.

      (7) Why does just overexpression of a single subunit (Sec15) induce granule fusion?

      The reviewer raises a very important point. Based on available data from the literature, Sec15 behaves as a seed for assembly of the holocomplex and also mediates recruitment of the holocomplex to SGs through its interaction with Rab11 (Escrevente et al, 2021; Bhuin and Roy, 2019; Wu et al, 2005; Zhang et al, 2004; Guo et al, 1999). Thus, overexpression of Sec15 is expected to enhance exocyst assembly, thereby potentiating the activities carried out by the complex in the cell, including SG homofusion. In the revised manuscript we have also overexpressed Sec8, finding that, unlike Sec15, Sec8 fails to induce homotypic fusion. This result was expected, and is consistent with Sec8 not acting as a seed for assembly of the whole complex. These new data have been included in Figure 7E-H, and are described in lines 221-229 of the Results section.

      Author response image 4.

      Conceptual model of RNAi expression at different temperatures, remaining mRNA/protein levels, and phenotypes obtained at each temperature.

      Author response image 5.

      qRT-PCR assays presented in Supplementary Figure 2 are shown in combination with the phenotypes observed at each of the conditions analyzed. Note the correlation between phenotypes and the extent of mRNA downregulation.

      (8) While the paper is fascinating, the major comments need to be addressed to really be able to make better sense of this work, which at present is hard to disentangle direct vs. secondary effects, especially as much of the TGN seems to be altered in the KDs.  

      We hope that our response to point 6) has helped to clarify this important point raised by the Reviewer. Under silencing conditions in which the normal structure of the trans-Golgi network is impaired, SG biogenesis does not occur. Since SGs do not form, defects in SG maturation or SG fusion with the plasma membrane cannot be detected in the same cell.

      (9) The authors conveniently ascribe many of the results to the holocomplex, but their own data (Fig. 4 and Fig. 6) are at odds with this.

      This is another central point of our work, so we thank the reviewer for his/her comment. In Figures 4A, 7A and 9A of the revised manuscript, we show that, by inducing appropriate levels of silencing of any of the 8 exocyst subunits, each of the three alternative phenotypes can be obtained. In our opinion, this argues in favour of a function for the whole exocyst complex in each of the three specific activities proposed in our study: 1) SG biogenesis, 2) SG maturation, and 3) SG tethering/fusion with the plasma membrane. For the detailed characterization of these three phenotypes throughout the study, we chose to silence just two or three of the exocyst subunits, assuming that the whole complex accounts for the mechanisms involved.

      Major comments

      (10) Resolution not sufficient. Identification of "mature secretory granules" (MSG) in Fig. 3 is based on low-resolution images in which the MSG are not clearly seen (see control in Fig. 3A) and rather appear as a diffuse haze, and not as clear granules. There may be granules here, but as shown it is not clear. Thus it would be helpful to acquire images at higher resolution (at the diffraction limit, or higher) to see and count the MSG.

      We thank the reviewer for raising this point, as it may not be straightforward for the reader to identify the SGs throughout the figures of our study. To make this clearer, in Figure 3A (magnified insets on the right) we have delimited individual SGs with a green dotted line, and included diagrams (far right) that we hope will help identify the SGs. In Figure 3B, we show that silencing of Exo84 produced a mosaic phenotype: in some cells, SGs failed to undergo maturation and remained smaller than normal, whereas in other cells SG biogenesis was impaired and the fluorescent cargo remained trapped in a mesh-like structure (which we later show corresponds to the ER). The dotted lines mark individual SGs, and the diagrams on the right are intended to aid interpretation of the phenotype. The mesh-like structures in which Sgs3-GFP was retained are also marked with a dotted line and schematized on the right. These new schemes are described in the Figure 3 caption of the revised manuscript.

      We wish to mention that all the confocal images depicted in this figure and throughout the manuscript were captured at high resolution, with a theoretical resolution limit of 168-177 nm (d = λ/2NA). Given that secretory granules range from 0.8 to 7 µm in diameter, this resolution is more than sufficient to clearly resolve these structures.
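      The resolution estimate follows from the Abbe limit, d = λ/2NA. As a worked instance, assuming a representative wavelength of 488 nm (an assumption made here for illustration; the wavelengths used are not stated in this response) and the 1.4 NA objective:

```latex
d = \frac{\lambda}{2\,\mathrm{NA}}
  = \frac{488\ \mathrm{nm}}{2 \times 1.4}
  \approx 174\ \mathrm{nm},
\qquad
\frac{800\ \mathrm{nm}}{174\ \mathrm{nm}} \approx 4.6
```

      Even the smallest granules (0.8 µm) therefore span roughly four to five times this limit. Conversely, at NA 1.4 the quoted 168-177 nm range corresponds to wavelengths of roughly 470-496 nm (λ = 2·NA·d).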

      (11) Note: the authors are not clear on which objective was used. Maybe the air objective as the resolution appears poor).  

      For this figure, we used a Plan-Apochromat 63×/1.4 NA oil-immersion objective on the inverted Carl Zeiss LSM 880 confocal microscope (described in Materials and Methods).

      (12) They need to prove that the diffuse Sgs3-GFP haze is indeed due to MSG.  

      If we correctly interpret the reviewer's concern, what he/she calls a “diffuse haze” is actually the distribution of Sgs3-GFP within individual SGs, which, as previously reported by other authors, is not homogeneous at this stage (Syed et al. 2022). We hope that the diagrams we have included in Figure 3A, B (point 10) will help readers interpret the images.

      (13) Related it is unclear what are the granule structures that correspond to Immature secretory granules (ISG) and cells with mesh-like structures (MLS)?

      We are confident that the diagrams now included in Figure 3A and B will aid interpretation, particularly the identification of immature granules and of the mesh-like structure generated after silencing of exocyst subunits.

      (14) Similarly, Sgs3 images of KD of 8 exocyst subunits were interpreted to be identical, in Fig. 4, but the resolution is poor.

      We hope that the issue related to the resolution of our images has been properly addressed in our response to point 10) of this letter. In Figure 4A, we show that after silencing any of the 8 subunits (under the appropriate conditions), SG biogenesis was impaired in all cases, and Sgs3-GFP was instead retained in a mesh-like structure. Images obtained after silencing different exocyst subunits are of course not identical, but in all cases a mesh-like structure replaced the SGs (Figure 4A). We hope that the diagrams now included in Figure 3A and B will help the correct interpretation of the phenotypes throughout the study.

      To demonstrate that the structure in which Sgs3-GFP was retained upon exocyst knock-down corresponds to the ER, we performed a colocalization analysis between Sgs3-GFP and the ER markers GFP-KDEL or Bip-sfGFP-HDEL, and calculated Pearson's correlation coefficient, which indicated substantial colocalization (Figure 4B-G and Supplementary Figures 7 and 8). These new data are described in lines 196-199 of the revised manuscript. To facilitate visualization of the results, the revised manuscript also includes magnified cropped areas of the images shown in Figure 4A.

      (15) What is remarkable is a highly variable effect of different subunit KD on the percentage of cells with MLS (Fig. 4C). Controls = 100 %, Exo70=~75% (at 19 deg), Sec3 = ~30%, Sec10 = 0%, Exo84 = 100% ... This is interesting for the functional exocyst is an octameric holocomples, thus why the huge subunit variability in the phenotypes? The trivial explanation is either: i) variable exocyst subunit KD (not shown) or ii) variability between experiments (no error bars are shown). Both should be addressed by quantification of the KD of different proteins and secondly by replicating the experiments.

      We agree with the reviewer. We believe that both variability in KD efficiency (i) and variability between experiments (ii) contribute to the variable effects observed after knocking down the different subunits. As detailed in our response to point 6), we performed qRT-PCR determinations to confirm that the severity of the phenotype depends on the efficiency of RNAi-mediated silencing. We chose to analyse in detail the subunits exo70 and sec3, which showed the largest phenotypic differences between the three silencing temperatures used. As expected, silencing was temperature-dependent, being stronger at 29 °C and weaker at 19 °C. These data are included in Supplementary Figure 2, described in lines 153-159 of the Results section, and summarized in Author response images 4 and 5 of this rebuttal letter.

      We thank the reviewer for his/her comment on the replication of experiments and statistics. We apologize for omitting detailed numerical information from the original submission, such as the number of replicates and the standard deviations of the data depicted in Figure 3C and Supplementary Figure 1. In the revised manuscript, we have included a table (Supplementary Table 3) presenting all the raw data of Figure 3C and Supplementary Figure 1, including standard deviations.

      (16) If their data holds up then the underlying mechanism here needs to be considered.

      (Note: there is some precedent from the autophagy field of differential exocyst effects)

      Our proposed mechanism is essentially that the holocomplex is required for multiple processes along the secretory pathway. Each of these processes (Golgi structure maintenance, SG maturation and SG tethering/fusion with the plasma membrane) requires a different amount of holocomplex activity, which is why each phenotype manifests at a different level of RNAi-mediated silencing (Author response image 4 of this letter). The model predicts that Golgi structure maintenance requires minimal levels of complex activity, which is why strong knock-down of exocyst subunits is needed to obtain this phenotype. In line with our results, other tethering complexes of the CATCHR family have been reported to be required for holding Golgi cisternae together (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). One possibility is that the exocyst plays a role in maintaining normal Golgi structure that is redundant with that of other CATCHR complexes. Such redundancy could explain why severe exocyst knock-down is required to observe structural anomalies in this organelle. At the other end of the spectrum, we propose that tethering/fusion with the plasma membrane is very sensitive to even a slight reduction of complex activity, so that mild RNAi-mediated silencing is sufficient to disrupt this process. This proposed model is depicted in Author response image 4 and discussed in lines 395-405 of the Discussion section.

      (17) In the salivary glands the authors state that the exocyst is needed for Sgs3-GFP exit from the ER. First, Pearson's coefficient should be shown so as to quantitate the degree of ER localizations of all KDs.

      We thank the reviewer for this comment, which helped us strengthen the observation that Sgs3-GFP remains trapped in the ER when SG biogenesis is impaired. In the revised manuscript, we calculated Pearson's coefficient to assess colocalization between ER markers (GFP-KDEL or Bip-sfGFP-HDEL) and Sgs3-GFP in salivary gland cells expressing sec15RNAi. Pearson's coefficient was around 0.6 for both ER markers, indicating substantial colocalization with Sgs3-GFP (Supplementary Figure 8; lines 196-199 of the Results section).
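      For readers less familiar with this colocalization metric, Pearson's coefficient is the covariance of the two channel intensities divided by the product of their standard deviations, computed over the same pixels. The Python sketch below is a minimal illustration only: it assumes the two channels are already registered and restricted to the same region of interest, and it does not reproduce the specific software, background subtraction or thresholding used for the reported values.

```python
import math
from typing import Sequence


def pearson_coefficient(channel_a: Sequence[float], channel_b: Sequence[float]) -> float:
    """Pearson's correlation between two equally sized pixel-intensity arrays."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance term and per-channel sums of squared deviations.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has undefined correlation")
    return cov / math.sqrt(var_a * var_b)
```

      Values near 1 indicate that the two signals rise and fall together pixel by pixel, values near 0 indicate no linear relationship, and negative values indicate mutually exclusive signals; a value of about 0.6 therefore reflects substantial but incomplete overlap.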

      (18) Second, there should be some rescue performed (if possible) to support specificity. 

      As suggested by the reviewer, we performed a rescue experiment of the phenotype caused by expression of sec15 RNAi, namely the retention of Sgs3-GFP in the endoplasmic reticulum. Expression of Sec15-GFP substantially reverted the ER retention phenotype, rescuing SG biogenesis and also SG maturation in most cells (over 60% of the cells). These new data are now shown in Supplementary Figure 4, and described in lines 168-171 of the Results section.

      (19) Third, importantly other proteins that should traffic to the PM need to be shown to traffic normally so as to rule out a non-specific effect.

      We have addressed this issue (also raised by Reviewer #1) by analyzing the localization of a number of polarity markers, finding that overall cell polarity was not affected by loss of function of exocyst subunits. Please see our response to point 3) raised by Reviewer #1. The new data on cell polarity markers are shown in Supplementary Figure 6 of the revised manuscript, and described in lines 172-179 of the Results section.

      (20) It is unclear from their model (Fig. 5) why after exocyst KD of Sec15 the cis-Golgi is more preserved than the TGN, which appears as large vacuoles. This is not quantitated and not shown for the 8 subunits.

      We thank the reviewer for this relevant comment. We agree that the phenotype of sec15 or sec3 loss-of-function cells manifests differently with cis-Golgi and trans-Golgi markers: while the cis-Golgi marker appeared fragmented and aggregated, the trans-Golgi marker adopted a swollen appearance. However, in our view, the different appearance of the two markers does not necessarily imply that one compartment is better preserved than the other. In the revised manuscript, we have quantified the penetrance of the phenotypes caused by sec15 or sec3 silencing, using both cis-Golgi and trans-Golgi markers. In both cases penetrance was high, and even higher with the trans-Golgi marker. These new data are now depicted in Supplementary Figure 9 of the revised manuscript.

      It is worth mentioning that in HeLa cells, as well as in the retinal epithelial cell line hTERT, Golgi phenotypes similar to those described here have been reported after loss of function of other tethering complexes that were shown to hold the Golgi cisternae together, including the COG and GARP complexes (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). As elsewhere in our work, this analysis did not include silencing of all eight subunits; in this case, we chose to silence Sec3 and Sec15. Please note that we have modified the model depicted in Figure 6E-F to highlight the cis- and trans-Golgi phenotypes upon exocyst knock-down, as well as the localization of the exocyst at cisternae of the Golgi complex.

      (21) Acute/Chronic control: It would be nice to acutely block the exocyst so as to better distinguish if the effects observed are primary or secondary effects (e.g. on a recycling pathway).

      We thank the reviewer for raising this important issue. To induce silencing of exocyst subunits at specific time intervals of larval development, we used a strategy based on a thermosensitive variant of the Gal4 inhibitor Gal80 (Gal80ts; Lee and Luo, 1999). We blocked Gal4 activity (and therefore RNAi expression) by maintaining the larvae at 18 °C during the 1st and 2nd instars (until 120 hours after egg lay), and then induced Gal4 activity specifically at the 3rd larval instar by raising the temperature to 29 °C, a condition in which Gal80ts becomes inactive. When sec3 or sec15 was silenced only at the 3rd larval instar, the phenotype was very similar to that observed after chronic silencing of exocyst subunits (larvae maintained at 29 °C throughout development, where Gal4 was never inhibited). These observations suggest that the defects observed in the secretory pathway after knock-down of exocyst subunits reflect genuine functions of the exocyst in this pathway, rather than a secondary effect of impaired salivary gland development at early larval stages. These new results are now shown in Supplementary Figure 3, and described in lines 160-171 of the Results section.

      (22) Granule homotypic fusion. Strangely over-expression of just one subunit, Sec15-GFP, made giant secretory granules (SG) that were over 8 microns big! Why is that, especially if normally the exocyst is normally a holocomplex. Was this an effect that was specific to Sec15 or all exocyst subunits? Is the Sec15 level rate limiting in these cells? It may be that a subcomplex of Sec15/10 plays earlier roles, but in any case this needs to be addressed across all (or many) of the exocyst subcomplex members.

      Please see our response to point 7) of this letter. Sec15 is believed to act as a seed for assembly of the whole complex.

      (23) In summary, there are clearly striking effects on secretory granule biogenesis by dysfunction of the exocyst, however right now it is hard to disentangle effects on ER-Golgi traffic, loss of the TGN, and a problem in maturation or fusion of granules.

      As discussed in detail in our response to point 3 raised by Reviewer #1, the secretory pathway is highly synchronized in each cell of the Drosophila salivary gland. SG biogenesis, SG maturation and SG fusion with the plasma membrane never occur simultaneously in the same cell. Thus, in a cell in which ER-Golgi traffic is impaired (and SG biogenesis does not occur), SGs do not exist and therefore cannot exhibit defects in maturation or fusion with the plasma membrane. In summary, we believe our work shows that in Drosophila larval salivary glands the exocyst holocomplex is required for (at least) three functions along the secretory pathway: 1) maintaining the appropriate Golgi complex architecture, thus enabling ER-Golgi transport; 2) secretory granule maturation, including both homotypic fusion and acquisition of maturation factors; and 3) secretory granule exocytosis, by tethering secretory granules to enable their subsequent fusion with the plasma membrane. As mentioned above (point 6 of this letter), these three functions require different amounts of the holocomplex, and can therefore be revealed by inducing different levels of silencing.

      (24) It is also confusing if the entire exocyst holocomplex or subcomplex plays a key role 

      The fact that, by silencing any of the subunits (under the appropriate conditions), it is possible to obtain any of the 3 phenotypes (impaired SG biogenesis, impaired SG maturation or impaired SG fusion with the plasma membrane) argues in favour of the complex functioning as a whole in each of these three processes.

      Reviewer 3:

      (25) General comment: Freire and co-authors examine the role of the exocyst complex during the formation and secretion of mucins from secretory granules in the larval salivary gland of Drosophila melanogaster. Using transgenic lines with a tagged Sgs3 mucin the authors KD expression of exocyst subunit members and observe a defect in secretory granules with a heterogeneity of phenotypes. By carefully controlling RNAi expression using a Gal4-based system the authors can KD exocyst subunit expression to varying degrees. The authors find that the stronger the inhibition of expression of exocyst the earlier in the secretory pathway the defect. The manuscript is well written, the model system is physiological, and the techniques are innovative.

      We appreciate the reviewer's assessment of our work.

      (26) My major concern is that the evidence underlying the fundamental claim of the manuscript that "the exocyst complex participates" in multiple secretory processes lacks direct evidence.

      We thank the reviewer for raising this important issue. We believe that the analysis of Sec15 subcellular localization during salivary gland development (Figures 5, 7B-D and 9E-F), in combination with the detailed analysis of the phenotypes caused by loss of function of each exocyst subunit, provides evidence supporting multiple functions of the exocyst in the secretory pathway. We have also included 3D reconstructions and videos of GFP-Sec15 colocalization with Golgi and SG markers to support the association of the exocyst with these structures (Supplementary Videos 1-7; text lines 200-210, 216-221 and 303-305).

      (27) It is clear from multiple lines of evidence, which are discussed by the authors, that exocyst is essential for an array of exocytic events. The fundamental concern is that loss of homeostasis on the plasma membrane proteome and lipidome might have severe pleiotropic effects on the cell.

      We agree with the reviewer that this is an important point that needed to be addressed. As discussed in detail in our response to point 3 raised by Reviewer #1, we have analysed several plasma membrane markers (including a PI(4,5)P2 lipid reporter), and found that, overall, plasma membrane integrity and polarity were not substantially affected (Supplementary Figure 6). In addition, we have analysed several markers of general cellular “health” indicating that salivary gland cells do not seem to be distressed by the reduction of exocyst complex activity (Supplementary Figure 5). These new data are described in lines 172-179 of the Results section.

      (28) Perhaps the authors have more evidence that exocyst is important for homeotypic fusion of the SGs, as supported by the localisation of Sec15 on the fusion sites.

      We believe that the appearance of immature, smaller-than-normal granules upon silencing of any of the exocyst subunits (under the appropriate conditions) argues in favour of the exocyst as a whole participating in SG homofusion (Figure 7A). In addition, we have included more images, quantifications, 3D reconstructions and videos of GFP-Sec15 localized precisely at the contact sites between immature SGs. We quantified and compared GFP-Sec15 localization at immature vs mature SGs, finding that it localizes preferentially at immature SGs, which supports a role of the exocyst as a tethering complex during homotypic fusion (Figure 7B-C and Supplementary Videos 4-6; described in lines 216-221 of the Results section). Please see also our response to point 2 raised by Reviewer 1 in this rebuttal letter, and Author response image 3 above.

      (29) The second question that I think is important to address is, what exactly do the varying RNAi levels correspond to in terms of experiments, and have these been validated? Due to the fundamental claim being that the severity of the phenotype being correlated with the level of KD, I think validation of this model is absolutely essential.  

      We thank the Reviewer for raising this important point, and agree that it was lacking in the original version of our manuscript. As discussed in our response to point 6) raised by Reviewer #2, we performed qRT-PCR determinations of exo70 and sec3 mRNA levels after inducing silencing of these subunits at different temperatures, or with different RNAi transgenic lines. The remaining mRNA levels correlate well with the observed phenotypes. Please see Supplementary Figure 2 of the revised manuscript (described in lines 155-159 of the Results section) and Author response image 5 of this rebuttal letter.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      -  The authors assert in the discussion that exocyst involvement in constitutive secretion is well documented. This is based on a very recent study in mammalian culture cells. Therefore, I would not dismiss the issue as completely settled. Furthermore, a previous study of Drosophila sec10 reported no roles outside the ring gland (DOI: 10.1034/j.1600-0854.2002.31206.x).

      We have included these observations in the Discussion section (lines 326-329).

      -  A salivary gland screening by Julie Brill's lab reported exocyst components as hits (DOI: 10.1083/jcb.201808017).

      We have referred to this paper in the Discussion section (lines 326-329).

      -  It should be explained in more detail what is measured in graphs 7C, F, and others quantifying fluorescence around secretory granules. Looking at the images, the decrease in Rab1 and Rab11 seems less convincing.

      We have described more clearly how fluorescence intensity was measured in the Methods section (lines 558-561). We have also uploaded a source data file disclosing the raw data of each experiment used for quantifications.

      Please note that the data indicate that Rab11 levels are higher in sec5 (Figure 8J-L) and sec3 (Supplementary Figure 11M-R).

      Reviewer #2 (Recommendations For The Authors):

      No major issues.

      Writing - The authors should better frame their interpretations of other studies of the exocyst that include the role in autophagy, Palade body trafficking, and differential roles of the subunits.

      We have discussed these specific points in the Discussion section, lines 348-355 and 409-410.

      Minor - Fig. 6A: Why are variable temperatures (19-29 deg C used for the 8 KD experiments)?

      Please show it all at the same temperature (control too).

      The need to use specific temperatures to obtain specific phenotypes with each of the RNAi lines is explained in our response to point 6 of this letter.

      Reviewer #3 (Recommendations For The Authors):

      In the abstract, the authors refer to the exocytic process and go on to describe secretory granule biogenesis and exocytosis. However, there are many exocytic processes aside from secretory granule biogenesis, and I think the authors should clarify this.

      Corrected in the Abstract. Lines 19-21

      Page 17: there is a glitch with the Thomas, 2021 reference.

      Thanks for noticing. Fixed.

      References

      Bhuin T, Roy JK. Developmental expression, co-localization and genetic interaction of exocyst component Sec15 with Rab11 during Drosophila development. Exp Cell Res. 2019 Aug 1;381(1):94-104. doi: 10.1016/j.yexcr.2019.04.038. Epub 2019 May 7. PMID: 31071318.

      D'Souza Z, Taher FS, Lupashin VV. Golgi inCOGnito: From vesicle tethering to human disease. Biochim Biophys Acta Gen Subj. 2020 Nov;1864(11):129694. doi: 10.1016/j.bbagen.2020.129694. Epub 2020 Jul 27. PMID: 32730773; PMCID: PMC7384418.

      Escrevente C, Bento-Lopes L, Ramalho JS, Barral DC. Rab11 is required for lysosome exocytosis through the interaction with Rab3a, Sec15 and GRAB. J Cell Sci. 2021 Jun 1;134(11):jcs246694. doi: 10.1242/jcs.246694. Epub 2021 Jun 8. PMID: 34100549; PMCID: PMC8214760.

      Guo W, Roth D, Walch-Solimena C, Novick P. The exocyst is an effector for Sec4p, targeting secretory vesicles to sites of exocytosis. EMBO J. 1999 Feb 15;18(4):1071-80. doi: 10.1093/emboj/18.4.1071. PMID: 10022848; PMCID: PMC1171198.

      Jafar-Nejad H, Andrews HK, Acar M, Bayat V, Wirtz-Peitz F, Mehta SQ, Knoblich JA, Bellen HJ. Sec15, a component of the exocyst, promotes notch signaling during the asymmetric division of Drosophila sensory organ precursors. Dev Cell. 2005 Sep;9(3):351-63. doi: 10.1016/j.devcel.2005.06.010. PMID: 16137928.

      Khakurel A, Lupashin VV. Role of GARP Vesicle Tethering Complex in Golgi Physiology. Int J Mol Sci. 2023 Mar 23;24(7):6069. doi: 10.3390/ijms24076069. PMID: 37047041; PMCID: PMC10094427.

      Lattner J, Leng W, Knust E, Brankatschk M, Flores-Benitez D. Crumbs organizes the transport machinery by regulating apical levels of PI(4,5)P2 in Drosophila. Elife. 2019 Nov 7;8:e50900. doi: 10.7554/eLife.50900. PMID: 31697234; PMCID: PMC6881148.

      Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999 Mar;22(3):451-61. doi: 10.1016/s0896-6273(00)80701-1. PMID: 10197526.

      Liu S, Majeed W, Grigaitis P, Betts MJ, Climer LK, Starkuviene V, Storrie B. Epistatic Analysis of the Contribution of Rabs and Kifs to CATCHR Family Dependent Golgi Organization. Front Cell Dev Biol. 2019 Aug 2;7:126. doi: 10.3389/fcell.2019.00126. PMID: 31428608; PMCID: PMC6687757.

      Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, Yang-Zhou D, Flockhart I, Binari R, Shim HS, Miller A, Housden A, Foos M, Randkelv S, Kelley C, Namgyal P, Villalta C, Liu LP, Jiang X, Huan-Huan Q, Wang X, Fujiyama A, Toyoda A, Ayers K, Blum A, Czech B, Neumuller R, Yan D, Cavallaro A, Hibbard K, Hall D, Cooley L, Hannon GJ, Lehmann R, Parks A, Mohr SE, Ueda R, Kondo S, Ni JQ, Perrimon N. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015 Nov;201(3):843-52. doi: 10.1534/genetics.115.180208. Epub 2015 Aug 28. PMID: 26320097; PMCID: PMC4649654.

      Wu S, Mehta SQ, Pichaud F, Bellen HJ, Quiocho FA. Sec15 interacts with Rab11 via a novel domain and affects Rab11 localization in vivo. Nat Struct Mol Biol. 2005 Oct;12(10):879-85. doi: 10.1038/nsmb987. Epub 2005 Sep 11. PMID: 16155582.

      Yeaman C, Grindstaff KK, Wright JR, Nelson WJ. Sec6/8 complexes on trans-Golgi network and plasma membrane regulate late stages of exocytosis in mammalian cells. J Cell Biol. 2001 Nov 12;155(4):593-604. doi: 10.1083/jcb.200107088. Epub 2001 Nov 5. PMID: 11696560; PMCID: PMC2198873.

      Zhang XM, Ellis S, Sriratana A, Mitchell CA, Rowe T. Sec15 is an effector for the Rab11 GTPase in mammalian cells. J Biol Chem. 2004 Oct 8;279(41):43027-34. doi: 10.1074/jbc.M402264200. Epub 2004 Jul 29. PMID: 15292201.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We summarized the main changes:

      (1) In the Introduction part, we give a general definition of habitat fragmentation to avoid confusion, as reviewers #1 and #2 suggested.

      (2) We clarify the two aspects of the observed “extinction”, “true dieback” and “emigration”, as reviewers #2 and #3 suggested.

      (3) In the Methods part, we 1) clarify the reason for testing the temporal trend in colonization/extinction dynamics and describe how to select islands as reviewer #1 suggested; 2) describe how to exclude birds from the analysis as reviewer #2 suggested.

      (4) In the Results part, we modified and rearranged Figures 4-6 as reviewers #1, #2 and #3 suggested.

      (5) In the Discussion part, we 1) discuss the multiple aspects of the metric of isolation for future research as reviewer #3 suggested; 2) provide concrete evidence about the relationship between habitat diversity or heterogeneity and island area and 3) provide a wider perspective about how our results can inform conservation practices in fragmented habitats as reviewer #2 suggested.

      eLife Assessment

      This important study enhances our understanding of how habitat fragmentation and climate change jointly influence bird community thermophilization in a fragmented island system. The evidence supporting some conclusions is incomplete, as while the overall trends are convincing, some methodological aspects, particularly the isolation metrics and interpretation of colonization/extinction rates, require further clarification. This work will be of broad interest to ecologists and conservation biologists, providing crucial insights into how ecosystems and communities react to climate change.

      We sincerely extend our gratitude to you and the esteemed reviewers for acknowledging the importance of our study and for raising these concerns. We have clarified the rationale behind our analysis of temporal trends in colonization and extinction dynamics, as well as the choice of distance to the mainland as the isolation metric. Additionally, we further discuss the multiple aspects of the metric of isolation for future research and provide concrete supporting evidence about the relationship between habitat diversity or heterogeneity and island area.

      Incorporating these valuable suggestions, we have thoroughly revised our manuscript, ensuring that it now presents a more comprehensive and nuanced account of our research. We are confident that these improvements will further enhance the impact and relevance of our work for ecologists and conservation biologists alike, offering vital insights into the resilience and adaptation strategies of communities facing the challenges of climate change.

      Reviewer #1 (Public Review):

      Summary:

      This study reports on the thermophilization of bird communities in a network of islands with varying areas and isolation in China. Using data from 10 years of transect surveys, the authors show that warm-adapted species tend to gradually replace cold-adapted species, both in terms of abundance and occurrence. The observed trends in colonisations and extinctions are related to the respective area and isolation of islands, showing an effect of fragmentation on the process of thermophilization.

      Strengths:

      Although thermophilization of bird communities has been already reported in different contexts, it is rare that this process can be related to habitat fragmentation, despite the fact that it has been hypothesized for a long time that it could play an important role. This is made possible thanks to a really nice study system in which the construction of a dam has created this incredible Thousand Islands lake. Here, authors do not simply take observed presence-absence as granted and instead develop an ambitious hierarchical dynamic multi-species occupancy model. Moreover, they carefully interpret their results in light of their knowledge of the ecology of the species involved.

      Response: We greatly appreciate your recognition of our study system and the comprehensive approach and careful interpretation of results. 

      Weaknesses:

      Despite the clarity of this paper on many aspects, I see a strong weakness in the authors' hypotheses, which obscures the interpretation of their results. Looking at Figure 1, and in many sentences of the text, a strong baseline hypothesis is that thermophilization occurs because of an increasing colonisation rate of warm-adapted species and extinction rate of cold-adapted species. However, there does not need to be a temporal trend! Any warm-adapted species that colonizes a site has a positive net effect on CTI; similarly, any cold-adapted species that goes extinct contributes to thermophilization.

      Thank you very much for these thoughtful comments. The answer depends on the time frame of the study and, specifically, on whether the system is at equilibrium. We believe your point rests on the following reasoning: if the system is not at equilibrium, then CTI can shift simply through differential colonization (or extinction) rates of warm-adapted versus cold-adapted species. We agree with you in this case.

      On the other hand, if a community is at equilibrium, there will be no net change in CTI over time. Imagine an archipelago where the average colonization rate of warm-adapted species is higher than that of cold-adapted species. Over time, the archipelago will reach an equilibrium with stable colonization/extinction dynamics, at which the average CTI is stable. Once the CTI is stable, a temporal trend in colonization rates will change the CTI until a new equilibrium is reached (if it is reached).

      For our system, the question then is whether we can assume that the system is or has ever been at equilibrium. If it is not at equilibrium, then CTI can shift simply by having differential colonization (or extinction) rates for warm-adapted versus cold-adapted species. If the system is at equilibrium (at the beginning of the study), then CTI will only shift if there is a temporal change or trend in colonization or extinction rates.
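      The equilibrium argument above can be illustrated with a toy model. The sketch below is hypothetical and is not the authors' multi-species occupancy model: it assumes two thermal groups, each following the simple recursion p[t+1] = p[t] + c(1 − p[t]) − e·p[t] (equilibrium occupancy c/(c + e)), and computes CTI as the occupancy-weighted mean of invented STI values. It shows the three cases discussed: constant rates starting at equilibrium (CTI flat), constant rates starting away from equilibrium (CTI shifts), and a trend in warm-species colonization starting at equilibrium (CTI rises).

```python
"""Toy two-group occupancy model illustrating the equilibrium argument.

Hypothetical sketch, not the authors' MSOM: each thermal group follows
p[t+1] = p[t] + c * (1 - p[t]) - e * p[t], whose equilibrium is c / (c + e).
CTI is the occupancy-weighted mean STI of the two groups.
All rates and STI values below are invented for illustration.
"""


def step(p, c, e):
    """Expected occupancy after one year of colonization (c) and extinction (e)."""
    return p + c * (1.0 - p) - e * p


def cti(p_warm, p_cold, sti_warm=20.0, sti_cold=16.0):
    """Occupancy-weighted mean STI (hypothetical STI values in deg C)."""
    return (p_warm * sti_warm + p_cold * sti_cold) / (p_warm + p_cold)


def simulate(years, c_warm, e_warm, c_cold, e_cold, trend=0.0, start=None):
    """Return the CTI series over `years` yearly steps.

    start=None begins each group at the equilibrium of its initial rates;
    otherwise start=(p_warm, p_cold). `trend` adds a linear yearly increase
    to the colonization rate of warm-adapted species.
    """
    if start is None:
        p_w = c_warm / (c_warm + e_warm)
        p_c = c_cold / (c_cold + e_cold)
    else:
        p_w, p_c = start
    series = [cti(p_w, p_c)]
    for t in range(1, years + 1):
        p_w = step(p_w, c_warm + trend * t, e_warm)
        p_c = step(p_c, c_cold, e_cold)
        series.append(cti(p_w, p_c))
    return series


if __name__ == "__main__":
    rates = dict(c_warm=0.3, e_warm=0.1, c_cold=0.1, e_cold=0.1)
    # Warm species colonize more often, but the system starts at equilibrium.
    at_eq = simulate(10, **rates)
    # Same constant rates, but the system starts far from equilibrium.
    off_eq = simulate(10, **rates, start=(0.2, 0.2))
    # Starts at equilibrium, then warm-species colonization trends upward.
    trending = simulate(10, **rates, trend=0.01)
    for name, s in [("at equilibrium", at_eq), ("off equilibrium", off_eq),
                    ("rate trend", trending)]:
        print(f"{name:>16}: CTI {s[0]:.3f} -> {s[-1]:.3f}")
```

      In this toy setting, a higher but constant colonization rate of warm-adapted species leaves CTI unchanged once the system is at equilibrium, whereas both a non-equilibrium start and a temporal trend in rates move CTI. Observed CTI change alone therefore cannot distinguish the two situations, while a trend in the rates themselves is informative under either.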

      Habitat fragmentation can affect biotas for decades after dam formation. The “relaxation effect” (Gonzalez, 2000) describes how island communities, initially drawn from the continental species pool, lose species over time: some species are gradually filtered out, mainly through the selective extinction of species that are highly sensitive to fragmentation. For a 100-hectare patch, it takes about ten years to lose 50% of bird species; the smaller the patch area, the shorter the time required (Ferraz et al., 2003; Haddad et al., 2015). This study was conducted 50 to 60 years after the formation of the TIL, so the system has a high probability of having reached “equilibrium” through the relaxation effect (Si et al., 2014). Nevertheless, we have no way of knowing exactly whether “equilibrium” holds in our system. Testing for changing rates of colonization-extinction over time is therefore a much stronger test of thermophilization, which makes our inference more robust.
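      A back-of-envelope check of the 50-60 year timescale, under an assumption not stated in the cited work: if the roughly ten-year figure for a 100-ha patch is read as the half-life of an exponential relaxation process, the share of the eventual relaxation-driven loss still pending after t years is 0.5^(t/10). The function name below is hypothetical. Under this assumption, about 3% of the loss would remain after 50 years and under 2% after 60 years. Islands smaller than 100 ha would relax faster, per the text above, while larger islands would relax more slowly.

```python
def fraction_still_to_go(years, half_life=10.0):
    """Fraction of the eventual relaxation-driven species loss not yet realized.

    Hypothetical model: exponential decay with a constant half-life.
    half_life=10 reads the ~10-year figure for a 100-ha patch as a
    relaxation half-life; this is an assumption, not a fitted value.
    """
    return 0.5 ** (years / half_life)


for t in (10, 50, 60):
    print(f"{t:>2} years after isolation: "
          f"{fraction_still_to_go(t):.1%} of the loss still pending")
```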

      We add a note to the legend of Figure 1 on Lines 781-786:

      “CTI can also change simply due to differential colonization-extinction rates by thermal affinity if the system is not at equilibrium prior to the study. In our study system, we have no way of knowing whether our island system was at equilibrium at the onset of the study; thus, focusing on changing rates of colonization-extinction over time presents a much stronger test of thermophilization.”

      We hope this statement can make it clear. Thank you again for this meaningful question.

      Another potential weakness is that fragmentation is not clearly defined. Generally, fragmentation sensu lato involves both loss of habitat area and changes in the spatial structure of habitats (i.e. fragmentation per se). Here, both area and isolation are considered, which may be slightly confusing for the readers if not properly defined.

      Thank you for reminding us of that. Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We have clarified the general definition in the Introduction on Lines 61-63:

      “Habitat fragmentation, usually defined as the conversion of continuous habitat into small, spatially isolated patches (Fahrig, 2003), has in particular been hypothesized to have interactive effects with climate change on community dynamics.”

      Reviewer #2 (Public Review):

      Summary:

      This study addresses whether bird community reassembly in time is related to climate change by modelling a widely used metric, the community temperature index (CTI). The authors first computed the temperature index of 60 breeding bird species thanks to distribution atlases and climatic maps, thus obtaining a measure of the species realized thermal niche.

      These indices were aggregated at the community level, using 53 survey transects of 36 islands (repeated for 10 years) of the Thousand Islands Lake, eastern China. Any increment of this CTI (i.e. thermophilization) can thus be interpreted as a community reassembly caused by a change in climate conditions (given no confounding correlations).
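      The community-level aggregation described above is commonly computed in two ways: as the unweighted mean STI of the species detected (occurrence-based) or as the abundance-weighted mean STI (abundance-based). The manuscript defines CTIoccur and CTIabun in its section 4.4, and its exact formulas may differ (for example, in how abundance is weighted); the sketch below assumes these two standard definitions, and the species names, counts and STI values are hypothetical.

```python
from typing import Mapping


def cti_occurrence(counts: Mapping[str, int], sti: Mapping[str, float]) -> float:
    """Unweighted mean STI of the species detected (count > 0)."""
    present = [sp for sp, n in counts.items() if n > 0]
    if not present:
        raise ValueError("no species detected")
    return sum(sti[sp] for sp in present) / len(present)


def cti_abundance(counts: Mapping[str, int], sti: Mapping[str, float]) -> float:
    """Abundance-weighted mean STI across all detected individuals."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no individuals detected")
    return sum(n * sti[sp] for sp, n in counts.items()) / total


# Hypothetical transect: one cold-adapted and one warm-adapted species.
sti = {"cold_sp": 15.0, "warm_sp": 20.0}
counts = {"cold_sp": 1, "warm_sp": 9}
print(cti_occurrence(counts, sti))  # 17.5
print(cti_abundance(counts, sti))   # 19.5
```

      A change in the relative abundance of species that are already present moves the abundance-based index but not the occurrence-based one, which is one reason the two metrics can diverge over the same period.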

      Using a mix of Bayesian and frequentist mixed-effects models, the authors show an increase in CTI at the island level, driven by both the extinction (or emigration) of cold-adapted species and the colonization of newly adapted warm-adapted species. Less isolated islands displayed higher colonization and extinction rates, confirming that dispersal constraints (created by habitat fragmentation per se) on colonization and emigration are the main determinants of thermophilization. The authors also had the opportunity to test for the effect of habitat amount (here, island size). They show that the weaker microclimatic buffering resulting from a smaller forest amount (a claim backed by understory temperature data) exacerbated the extinction rates of cold-adapted species while fostering the establishment of warm-adapted species.

      Overall these findings are important to range studies as they reveal the local change in affinity to the climate of species comprising communities while showing that the habitat fragmentation VS amount distinction is relevant when studying thermophilization. As is, the manuscript lacks a wider perspective about how these results can be fed into conservation biology, but would greatly benefit from it. Indeed, this study shows that in a fragmented reserve context, habitat amount is very important in explaining trends of loss of cold-adapted species, hinting that it may be strategic to prioritize large habitats to conserve such species. Areas of diverse size may act as stepping stones for species shifting range due to climate change, with small islands fostering the establishment of newly adapted warm-adapted species while large islands act as refugia for cold-adapted species. This study also shows that the removal of dispersal constraints with low isolation may help species relocate to the best suitable microclimate in a heterogenous reserve context.

      Thank you very much for your valuable feedback. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. In particular, you provided constructive suggestions and examples on how to extend the results to conservation guidance, which the manuscript should not have omitted. We have added a paragraph to the end of the Discussion, stating how our results can inform conservation, on Lines 339-347:

      ‘Overall, our findings have important implications for conservation practices. First, we confirmed the role of isolation in limiting range shifts. Better-connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the most suitable microclimates. Second, small patches can foster the establishment of newly adapted warm-adapted species, while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can provide stepping stones or shelters in a warming climate, depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.’

      Strength:

      The strength of the study lies in its impressive dataset of bird resurveys, that cover 10 years of continued warming (as evidenced by weather data), 60 species in 36 islands of varying size and isolation, perfect for disentangling habitat fragmentation and habitat amount effects on communities. This distinction allows us to test very different processes mediating thermophilization; island area, linked to microclimatic buffering, explained rates for a variety of species. Dispersal constraints due to fragmentation were harder to detect but confirms that fragmentation does slow down thermophilization processes.

      This study is a very good example of how the expected range shift at the biome scale of the species materializes in small fragmented regions. Specifically, the regional dynamics the authors show are analogous to what processes are expected at the trailing and colonizing edge of a shifting range: warmer and more connected places display the fastest turnover rates of community reassembly. The authors also successfully estimated extinction and colonization rates, allowing a more mechanistic understanding of CTI increment, being the product of two processes.

      The authors showed that regional diversity and CTI computed only by occurrences do not respond in 10 years of warming, but that finer metrics (abundance-based, or individual islands considered) do respond. This highlights the need to consider a variety of case-specific metrics to address local or regional trends. Figure Appendix 2 is a much-appreciated visualization of the effect of different data sources on Species thermal Index (STI) calculation.

      The methods are long and diverse, but they are documented enough so that an experienced user with the use of the provided R script can follow and reproduce them.

      Thank you very much for your profound Public Review. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. 

      Weaknesses:

      While the overall message of the paper is supported by data, the claims are not uniformly backed by the analysis. The trends of island-specific thermophilization are very credible (Figure 3), however, the variable nature of bird observations (partly compensated by an impressive number of resurveys) propagate a lot of errors in the estimation of species-specific trends in occupancy, abundance change, and the extinction and colonization rates. This materializes into a weak relationship between STI and their respective occupancy and abundance change trends (Figure 4a, Figure 5, respectively), showing that species do not uniformly contribute to the trend observed in Figure 3. This is further shown by the results presented in Figure 6, which present in my opinion the topical finding of the study. While a lot of species rates response to island areas are significant, the isolation effect on colonization and extinction rates can only be interpreted as a trend as only a few species have a significant effect. The actual effect on the occupancy change rates of species is hard to grasp, and this trend has a potentially low magnitude (see below).

      Thank you very much for pointing out this shortcoming. The R² between STI and species-specific occupancy trends is relatively small (R² = 0.035). The R² between STI and species-specific abundance change trends is larger by the standards of ecological research (R² = 0.123). The R² values between STI and species-specific trends in colonization rate (R² = 0.083) and extinction rate (R² = 0.053) are also relatively small. A low R² means that the current model has limited predictive power and that factors other than STI likely also influence species-specific occupancy trends. Nonetheless, the standardized coefficient estimates are not negligible and the trends are significant, indicating that species-specific responses are at least partly related to STI.

      The number of species that have significant interaction terms for isolation (Figure 6) is indeed low. Although there is uncertainty in the estimation of relationships, there are also consistent trends in response to habitat fragmentation of colonization of warm-adapted species and extinction of cold-adapted species. This is especially true for the effect of isolation, where on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate. We now better highlight these results in the Results and Discussion.

      While well documented, the myriad of statistical methods used by the authors hampers the interpretation of the figures, as the posterior means presented in Figure 4b and Figure 6 need to be back-transformed with the inverse logit (logit⁻¹) and fed into the equation of the respective model to make sense. I suggest rewording the captions to limit their dependence on the Methods section for interpretation.

      Thank you for this suggestion. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable, so interpretation is straightforward: positive values indicate a positive influence, while negative values indicate a negative influence. Because the goal of Figure 6 is to display the negative/positive effects, we did not back-transform them. Following your advice, we modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3 to move Figure 5 to Figure 4c). The modified title and legend of Figure 5 are on Lines 817-820:
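      To translate a logit-scale posterior mean into a probability, as the reviewer describes, the linear predictor is passed through the inverse logit. The sketch below assumes a single standardized covariate with all others held at their means (0), where interaction terms involving those covariates vanish; the intercept and slope values and the helper names are hypothetical, not estimates from the study.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function: maps the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rate_at(intercept: float, beta: float, z: float) -> float:
    """Probability when a standardized covariate sits z SDs from its mean,
    holding all other standardized covariates at their means (0)."""
    return inv_logit(intercept + beta * z)


# Hypothetical values: baseline logit extinction rate -1.0, slope 0.5.
for z in (-1.0, 0.0, 1.0):
    print(f"z = {z:+.0f} SD: extinction probability = {rate_at(-1.0, 0.5, z):.3f}")
```

      Because the inverse logit is monotonic, the sign of a posterior mean alone gives the direction of the effect, which is what the figure displays; the magnitude in probability units depends on the intercept and on the values of the other covariates.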

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects...”

      By using a broad estimate of the realized thermal niche, a common weakness of thermophilization studies is the inability to capture local adaptation in species' physiological or behavioral response to a rise in temperature. The authors however acknowledge this limitation and provide specific examples of how species ought to evade high temperatures in this study region.

      We appreciate your recognition. This is a common problem in STI studies. We hope that in future studies, researchers can take finer details of the microclimate in species’ actual habitats across regions into account when calculating STI. Although challenging, focusing on a smaller portion of a species’ distribution range may make this achievable.

      Reviewer #3 (Public Review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test a general increase in the occurrence and abundance of warm-dwelling species and vice versa for cold-dwelling species using the widely used Community Temperature Index (CTI), as well as the relationship between island fragmentation in terms of island area and isolation from the mainland on extinction and colonization rates of cold- and warm-adapted species. They found that indeed there was thermophilization happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clearly for the occurrence-based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling decreased more strongly on smaller islands, which is - according to the authors - due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue, that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regards to island isolation, they show that also both thermophilization processes (increase of warm and decrease of cold-adapted species) were stronger on islands closer to the mainland, due to closer sources to species populations of either group on the mainland as compared to limited dispersal (i.e. range shift potential) in more isolated islands.

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only a few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well-balanced method of simplifying this to the most important factors in question (CTI change, extinction, and colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      We very much appreciate your positive and constructive comments and suggestions. Thank you for your recognition of the scientific question, the modeling approach and the conclusions.

      Weaknesses:

      The metric of island isolation based on the distance to the mainland seems a bit too oversimplified as in real life the study system rather represents an island network where the islands of different sizes are in varying distances to each other, such that smaller islands can potentially draw from the species pools from near-by larger islands too - rather than just from the mainland. Thus a more holistic network metric of isolation could have been applied or at least discussed for future research. The fact, that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint at a more complex pattern going on in real-life than was assumed for this study.

      Thank you for this meaningful question. Isolation can be measured in different ways in the study region. We chose the distance to the mainland as a measure of isolation based on the results of a previous study in our system, which showed that the colonization and extinction rates of breeding bird species were best fitted using distance to the nearest mainland rather than other distance-based measures (distance to the nearest landmass, distance to the nearest larger landmass) (Si et al., 2014). Moreover, these measures produced almost identical patterns in the relationship between isolation and colonization/extinction rates (Si et al., 2014). This is why we selected only “distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The vegetation on all islands was cleared about 60 years ago during dam construction, and all bird species came from the mainland, the original species pool, through a process called “relaxation”. This could be why distance to the nearest mainland is the best predictor.

      We agree with you that it’s still necessary to consider more aspects of “isolation” at least in discussion for future research. In our Discussion, we address these on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      Further, the link between larger areas and higher habitat diversity or heterogeneity could be presented by providing evidence for this relationship. The authors do make a reference to a paper done in the same study system, but a more thorough presentation of it would strengthen this assumption further.

      Thank you very much for this question. We now provide more details about the relationship between island area and habitat diversity or heterogeneity, based on a related study in the same system. The observed number of species increased significantly with island area (slope = 4.42, R² = 0.70, p < .001), as did the rarefied species richness per island (slope = 1.03, R² = 0.43, p < .001), species density (slope = 0.80, R² = 0.33, p = .001) and the rarefied species richness per unit area (slope = 0.321, R² = 0.32, p = .001). We added this supporting evidence on Lines 317-321:

      “We thus suppose that habitat heterogeneity could also mitigate the loss of these relatively cold-adapted species as expected. Habitat diversity, including the observed number of species, the rarefied species richness per island, species density and the rarefied species richness per unit area, all increased significantly with island area instead of isolation in our system (Liu et al., 2020)”

      Despite the general clear patterns found in the paper, there were some idiosyncratic responses. Those could be due to a multitude of factors which could be discussed a bit better to inform future research using a similar study design.

      Thank you for these suggestions. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1: I disagree that there should be a temporal trend in colonisation/extinction dynamics.

      Thank you again for these thoughtful comments. We have explained in detail in the response to the Public Review.

      (2) L 485-487: As explained before I disagree. I don't see why there needs to be a temporal trend in colonization and extinction.

      Thank you again for these thoughtful comments. Because we cannot guarantee that the study system had reached equilibrium, testing for changing rates of colonization-extinction over time is a much stronger test of thermophilization. A more detailed explanation is given in our response to the Public Review.

      (3) L 141: which species' ecological traits?

      Sorry for the confusion. The traits included continuous variables (dispersal ability, body size, body mass and clutch size) and categorical variables (diet, active layer, residence type). Specifically, we tested the correlation between STI and dispersal ability, body size, body mass and clutch size using Pearson correlation tests. We also tested for differences in STI between trait groups using the Wilcoxon signed-rank test for the three categorical variables: diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor). There was no significant difference between any two groups for any of the three categorical variables (p > 0.2). We added these details on Lines 141-145:

      “No significant correlation was found between STI and species’ ecological traits; specifically, the continuous variables of dispersal ability, body size, body mass and clutch size (Pearson correlations for each, |r| < 0.22), and the categorical variables of diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor).”
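      As an illustration of the continuous-trait screen described above, a minimal Python sketch computes Pearson's r between STI and each trait and flags whether |r| falls below the 0.22 bound reported in the quoted sentence. The function names and all numbers are hypothetical, not the study data, and the sketch omits both the significance tests and the group comparisons for the categorical traits.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def screen_traits(sti, traits, threshold=0.22):
    """Return {trait: (r, |r| < threshold)} for each continuous trait (hypothetical helper)."""
    results = {}
    for name, values in traits.items():
        r = pearson_r(sti, values)
        results[name] = (r, abs(r) < threshold)
    return results


# Hypothetical values for five species (illustration only, not the study data).
sti = [9.3, 18.5, 20.6, 22.1, 27.2]
traits = {
    "body_mass": [110.0, 12.0, 25.0, 9.5, 7.0],
    "clutch_size": [1.0, 4.0, 5.0, 4.0, 4.0],
}
for trait, (r, weak) in screen_traits(sti, traits).items():
    print(f"{trait}: r = {r:.2f}, |r| < 0.22: {weak}")
```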

      (4) L 143: CTIoccur and CTIabun were not defined before.

      Because CTIoccur and CTIabun were first defined in the Methods (section 4.4), we changed the sentence here to a more general statement on Lines 147-150:

      “At the landscape scale, considering species detected across the study area, occurrence-based CTI (CTIoccur; see section 4.4) showed no trend (posterior mean temporal trend = 0.414; 95% CrI: -12.751, 13.554) but abundance-based CTI (CTIabun; see section 4.4) showed a significant increasing trend.”
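      The two community indices can be sketched as follows. The occurrence-based version follows the formula discussed under Reviewer #1's comment (15): the mean STI of the species detected in community j in year t. The abundance-based version is assumed here to be the abundance-weighted mean STI, a common definition that the text does not spell out. Function names and values are hypothetical.

```python
def cti_occurrence(sti, detected):
    """Occurrence-based CTI: unweighted mean STI over the species detected
    in one community (island j) in one year t."""
    if not detected:
        raise ValueError("no species detected")
    return sum(sti[sp] for sp in detected) / len(detected)


def cti_abundance(sti, abundance):
    """Abundance-based CTI, ASSUMED here to be the abundance-weighted mean
    STI: sum(a_i * STI_i) / sum(a_i)."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return sum(a * sti[sp] for sp, a in abundance.items()) / total


# Hypothetical STI values (degrees C) and counts for one island-year.
sti = {"sp1": 10.0, "sp2": 20.0, "sp3": 30.0}
counts = {"sp1": 1, "sp2": 1, "sp3": 2}

print(cti_occurrence(sti, counts.keys()))  # 20.0
print(cti_abundance(sti, counts))          # 22.5
```

With equal counts for every species, the two indices coincide; they diverge only when abundant species differ in STI from rare ones.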

      (5) Figure 4: what is the dashed vertical line? I assume the mean STI across species?

      Sorry for the unclear description. The vertical dashed line indicates the median STI of the 60 species, separating warm-adapted from cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”
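      The median split can be sketched in a few lines. The species names and STI values are hypothetical. Assigning species whose STI equals the median to the cold-adapted group is our assumption; with an even number of species the median usually falls between two values, so ties are rare.

```python
import statistics


def split_by_median(sti):
    """Label each species 'warm' (STI above the median) or 'cold' (otherwise).

    Ties at the median go to 'cold' here; that tie rule is an assumption,
    not taken from the manuscript.
    """
    cutoff = statistics.median(sti.values())
    labels = {sp: ("warm" if value > cutoff else "cold") for sp, value in sti.items()}
    return cutoff, labels


# Hypothetical STI values (degrees C) for four species.
cutoff, labels = split_by_median({"sp1": 9.3, "sp2": 19.0, "sp3": 21.0, "sp4": 27.2})
print(cutoff)  # 20.0
print(labels)  # {'sp1': 'cold', 'sp2': 'cold', 'sp3': 'warm', 'sp4': 'warm'}
```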

      (6) Figure 6: in the legend, replace 'points in blue' with 'points in blue/orange' or 'solid dots' or something similar.

      Thank you for this suggestion. We changed it to “points in blue/orange” on Line 823.

      (7) L 176-176: unclear why the interaction parameters are particularly important for explaining the thermophilization mechanism: if e.g. colonization rate of warm-adapted species is constantly higher in less isolated islands, (and always higher than the extinction rate of the same species), it means that thermophilization is increased in less isolated islands, right?

      Thank you for this question. It is related to the question of why we used temporal trends in colonization/extinction rates to test for thermophilization mechanisms. Changes in colonization and extinction rates over time provide a much stronger test of thermophilization (for more details, see our response to the Public Review and to Recommendations 1 and 2).

      Based on this, the two main processes driving thermophilization are an increasing colonization rate of warm-adapted species and an increasing extinction rate of cold-adapted species over the years. The interaction between island area (or isolation) and year on colonization (or extinction) rate tells us how habitat fragmentation mediates the year effect. For example, if the interaction term between year and isolation is negative for a warm-adapted species whose colonization rate increased over the years, the colonization rate increased faster on less isolated islands. This signals a faster thermophilization rate on less isolated islands.

      (8) L201-203: this is only little supported by the results that actually show that there is NO significant interaction for most species.

      Thank you for this comment. Although the interaction effects were non-significant for most species, the overall trend was relatively consistent, especially for the effect of isolation. To emphasize the “trend” rather than a “significant effect”, we reworded this sentence more rigorously on Lines 205-208:

      “We further found that habitat fragmentation influences two processes of thermophilization: colonization rates of most warm-adapted species tended to increase faster on smaller and less isolated islands, while the loss rates of most cold-adapted species tended to be exacerbated on less isolated islands.”

      (9) Section 2.3: can't you have a population-level estimate? I struggled a bit to understand all the parameters of the MSOM (because of my lack of statistical/mathematical proficiency) so I cannot provide more advice here.

      Thank you for this suggestion. We understand you to be referring to an overall estimate across all species for each variable. From the MSOM, we obtain a standardized estimate of every variable (year, area, isolation, interaction) for each species separately. Because we are interested in whether responses diverge or agree among species, we did not further derive a population-level estimate.

      (10) L 291: a dot is missing.

      Done. Thank you for your correction.

      (11) L 305, 315: a space is missing

      Done

      (12) L 332: how were these islands selected?

      Thank you for this question. The 36 islands were selected along gradients of island area and isolation, spread across the whole lake region. The selection ensured that there was no significant correlation between island area and isolation (Pearson correlation coefficient r = -0.21, p = 0.21). The seven largest of the 36 islands are also the only islands larger than 30 ha in the whole lake region. We have modified this in the Methods on Lines 360-363:

      “We selected 36 islands according to a gradient of island area and isolation with a guarantee of no significant correlation between island area and isolation (Pearson r = -0.21, p = 0.21). For each island, we calculated island area and isolation (measured in the nearest Euclidean distance to the mainland) to represent the degree of habitat fragmentation.”

      (13) L 334: "Distance to the mainland" was used as a metric of isolation, but elsewhere in the text you argue that the observed thermophilization is due to interisland movements. It sounds contradictory. Why not include the average or shortest distance to the other islands?

      Thank you very much for raising this point. Yes, “Distance to the mainland” was the only metric we used for isolation. We carefully checked the manuscript to find where the notion of “inter-island movement” came from and caused this misunderstanding. It most likely comes from Discussion section 3.1 (Lines 217-221): “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to inter-island occurrence dynamics, rather than exogenous community turnover.”

      We apologize: the word “inter-island” did not express what we meant. We meant that “the thermophilization was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region”. We have changed the sentence in the Discussion on Lines 217-221:

      “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region.”

      We would also like to explain why we used distance to the mainland. We chose it based on the results of a previous study in our system, which found that the colonization and extinction rates of breeding bird species were best fitted using distance to the nearest mainland, compared with other distance-based measures (distance to the nearest landmass, distance to the nearest larger landmass) (Si et al., 2014). In addition, these measures produced almost identical patterns in the relationship between isolation and colonization/extinction rates (Si et al., 2014). That is why we used only “Distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The vegetation on all islands was cleared about 60 years ago because of dam construction, and all bird species came from the mainland, the original species pool, through a process called “relaxation”. This may be why distance to the nearest mainland is the best predictor.

      In the Discussion, we added the following passage on other measures of isolation on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (14) L 347: you write 'relative' abundance but this measure is not relative to anything. Better write something like "we based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys".

      Thank you for this suggestion; we have changed the sentence on Lines 377-379:

      “We based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys.”

      (15) L 378: shouldn't the formula for CTIoccur be (equation in latex format):

      $$CTI_{occur,j,t} = \frac{\sum_{i=1}^{N_{j,t}} STI_{i}}{N_{j,t}}$$

      where $N_{j,t}$ is the total number of species surveyed in the community j in year t

      Thank you very much for this careful check; we have revised it on Lines 415 and 417:

      “where $N_{j,t}$ is the total number of species surveyed in the community j in year t.”

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 76: "weakly"

      Done. Thank you for your correction.

      (2) Line 98: I suggest a change to this sentence: "For example, habitat fragmentation renders habitats to be too isolated to be colonized, causing sedentary butterflies to lag more behind climate warming in Britain than mobile ones"

      Thank you for this modification, we have changed it on Lines 99-101.

      (3) Line 101: remove either "higher" or "increasing"

      Done, we have removed “higher”. Thank you for this advice.

      (4) Line 102: "benefiting from near source of"

      Done.

      (5) Line 104: "emigrate"

      Done.

      (6) Introduction: I suggest making it more explicit what process you describe under the word "extinction". At first read, I thought you were only referring to the dieback of individuals, but you also included emigration as an extinction process. It also needs to be reworded in Fig 1 caption.

      Thank you for this suggestion. Indeed, we cannot distinguish between local extinction and emigration in our system. The observed “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in sequence: first “emigration” and then, if individuals can neither emigrate nor withstand the new conditions, “real local dieback”. As you noted, this should also be stated in the legend of Figure 1. We have modified the legend on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, and if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      (7) I also suggest differentiating habitat fragmentation (distances between islands) and habitat amount (area) as explained in Fahrig 2013 (Rethinking patch size and isolation effects: the habitat amount hypothesis) and her later paper. This will help the reader understand what lies behind the general trend of fragmentation: fragmentation per se and habitat amount reduction.

      Thank you for this suggestion! Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We now give a general definition of habitat fragmentation on Lines 61-63:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (8) Line 136: does the "+-" refer to the standard deviation or the confidence interval? I suggest being explicit about it once at the start of the results.

      Thank you for pointing this out. The "+-" refers to the standard deviation (SD). The modified sentence is now on Lines 135-139:

      “The number of species detected in surveys on each island across the study period averaged 13.37 ± 6.26 (mean ± SD) species, ranging from 2 to 40 species, with an observed gamma diversity of 60 species. The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (9) Line 143: please specify the unit of thermophilization.

      The thermophilization rate is expressed as the change in degrees per unit year, where a unit year corresponds to one standard deviation of year, because in all analyses the predictor variables were z-transformed to make their effects comparable. We have added this on Line 151:

      “When measuring CTI trends for individual islands (expressed as °/unit year)”
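      To illustrate what “per unit year” means when year is z-transformed, a minimal sketch fits a least-squares slope to a hypothetical CTI series on the standardized year scale and converts it back to a per-calendar-year change by dividing by the SD of year. It assumes the sample standard deviation is used for standardization; the least-squares fit only stands in for the Bayesian trend model, and all values are hypothetical.

```python
import statistics


def zscore(values):
    """Standardize values; returns (z-scores, sample standard deviation).

    Using the sample (not population) SD is an assumption.
    """
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values], sd


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


# Hypothetical: ten survey years and a CTI rising by exactly 0.05 degrees C per year.
years = list(range(1, 11))
cti = [20.0 + 0.05 * t for t in years]

year_z, sd_year = zscore(years)
slope_per_sd = ols_slope(year_z, cti)    # degrees C per 1 SD of year ("unit year")
slope_per_year = slope_per_sd / sd_year  # back to degrees C per calendar year
print(f"SD of year = {sd_year:.3f}; {slope_per_sd:.4f} per unit year; {slope_per_year:.4f} per calendar year")
```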

      (10) Line 289: check if no word is missing from the sentence.

      The sentence is: “In our study, a large proportion (11 out of 15) of warm-adapted species increasing in colonization rate and half (12 out of 23) of cold-adapted species increasing in extinction rate were changing more rapidly on smaller islands.”

      Because the species included in testing the third prediction (15 warm-adapted species that increased in colonization rate and 23 cold-adapted species that increased in extinction rate) are already defined in both the Methods and the Results, we removed this redundant information and rewrote the sentence on Lines 300-302:

      “In our study, the colonization rate of a large proportion of warm-adapted species (11 out of 15) and the extinction rate of half of cold-adapted species (12 out of 23) were increasing more rapidly on smaller islands.”

      (11) Line 319: I really miss a concluding statement of your discussion, your results are truly interesting and deserve to be summarized in two or three sentences, and maybe a perspective about how it can inform conservation practices in fragmented settings.

      Thank you for this valuable suggestion, made both in the Public Review and here. We have added a paragraph at the end of the Discussion stating how our results can inform conservation, on Lines 339-347:

      “Overall, our findings have important implications for conservation practices. First, we confirmed the role of isolation in limiting range shifting. Better connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the best suitable microclimate. Second, small patches can foster the establishment of newly adapted warm-adapted species while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can act as stepping stones or shelters in a warming climate depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.”

      (12) Line 335: I suggest " ... the islands has been protected by forbidding logging, ..."

      Thanks for this wonderful suggestion. Done. The new sentence is now on Lines 365-366:

      “Since lake formation, the islands have been protected by forbidding logging, allowing natural succession pathways to occur.”

      (13) Line 345: this speed is unusually high for walking, check the speed.

      We apologize for this error; it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (14) Line 351: you could add a sentence explaining why that choice of species exclusion was made. Was it made from the start of the monitoring program, or did you exclude species afterward?

      We excluded them afterward. We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants). These species were recorded during monitoring: some were on the shore of an island or flying high above it, and some nocturnal species were spotted only by accident.

      We described more details about how to exclude species on Lines 379-387:

      “We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants) from our record. First, our surveys were conducted during the day, so some nocturnal and crepuscular species, such as the owls and nightjars were excluded for inadequate survey design. Second, wagtail, kingfisher, and water birds such as ducks and herons were excluded because we were only interested in forest birds. Third, birds like swallows, and eagles who were usually flying or soaring in the air rather than staying on islands, were also excluded as it was difficult to determine their definite belonging islands. Following these operations, 60 species were finally retained.”

      (15) Line 370: I suggest adding the range and median of STI.

      Thanks for this good suggestion. The range and mean ± SD of STI were already given in the Results, so we added the median of STI there as well. The new sentence is now in the Results on Lines 137-139:

      “The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (16) Figure 4.b: Is it possible to be more explicit about what that trend is? the coefficient of the regression Logit(ext/col) ~ year + ...... ?

      Thank you for this advice. Your understanding is right: we can interpret it as the coefficient of the ‘year’ effect in the model. More specifically, the ‘year’ effect or temporal trend here is the ‘posterior mean’ of the posterior distribution of ‘year’ in the MSOM (Multi-species Occupancy Model), in the context of the Bayesian framework. We modified this sentence on Lines 811-813:

      “Each point in (b) represents the posterior mean estimate of year in colonization, extinction or occupancy rate for each species.”
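      A posterior mean, as plotted in Figure 4b, and a 95% credible interval (CrI) of the kind reported in the Results can both be summarized from MCMC draws of a parameter. A minimal sketch, assuming the draws for one species' year effect are available as a plain list (the draws below are simulated, not model output) and using equal-tailed intervals with linear-interpolation quantiles:

```python
import math
import random


def quantile(sorted_draws, q):
    """Linear-interpolation quantile of pre-sorted draws (0 <= q <= 1)."""
    pos = q * (len(sorted_draws) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_draws[lo] + (sorted_draws[hi] - sorted_draws[lo]) * (pos - lo)


def summarize(draws, level=0.95):
    """Posterior mean and equal-tailed credible interval from MCMC draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    mean = sum(s) / len(s)
    return mean, quantile(s, tail), quantile(s, 1.0 - tail)


# Simulated draws standing in for one species' year effect (hypothetical).
random.seed(1)
draws = [random.gauss(0.4, 0.2) for _ in range(4000)]
mean, lo, hi = summarize(draws)
print(f"posterior mean = {mean:.3f}, 95% CrI = ({lo:.3f}, {hi:.3f})")
```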

      (17) Figure 6: is it possible to provide an easily understandable meaning of the prior presented in the Y axis? E.g. "2 corresponds to a 90% probability for a species to go extinct at T+1", if not, please specify that it is the logit of a probability.

      Thank you for raising this question both in the Public Review and here. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where the logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable. So, positive values indicate positive effects, while negative values indicate negative effects. Because the goal of Figure 6 is to display the negative/positive effect, we didn’t back-transform them. Following your advice, we thus modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3, to move Figure 5 to Figure 4c). The modified title and legends of Figure 5 are on Lines 817-820:

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects.”
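      Regarding the reviewer's example, the logit scale can be back-transformed as sketched below. A value of 2 maps to a probability of about 0.88 only when it is a baseline log-odds (an intercept). The year, area and isolation parameters in Figure 5 are slopes, so each is a change in log-odds per one-SD increase of the standardized predictor, and exp(beta) gives the corresponding odds ratio. This is a general illustration, not the authors' code.

```python
import math


def inv_logit(x):
    """Probability corresponding to a logit-scale value (e.g. an intercept)."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio(beta):
    """Multiplicative change in odds for a one-unit (here, one-SD) increase in a predictor."""
    return math.exp(beta)


print(round(inv_logit(2.0), 3))   # 0.881
print(round(odds_ratio(2.0), 2))  # 7.39
```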

      (18) Line 773: points in blue only are significant? I suggest "points in color".

      Thank you for your reminder. Points in blue and orange are all significant. We have revised the sentence on Line 823:

      “Points in blue/orange indicate significant effects.”

      These are all small suggestions that may help you improve the readability of the final manuscript. I warmly thank you for the opportunity to review this impressive study.

      We appreciate your careful review and profound suggestions. We believe these modifications will improve the final manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I have a few minor suggestions for paper revision for your otherwise excellent manuscript. I wish to emphasize that it was a pleasure to read the manuscript and that I especially enjoyed a very nice flow throughout the ms from a nicely rounded introduction that led well into the research questions and hypotheses all the way to a good and solid discussion.

      Thank you very much for your review and recognition. We have carefully checked all recommendations and addressed them in the manuscript.

      (1) L 63: space before the bracket missing and I suggest moving the reference to the end of the sentence (directly after habitat fragmentation does not seem to make sense).

      Thank you very much for this suggestion. The missing space was added, and the reference has been moved to the end of the sentence. We also added a general definition of habitat fragmentation. The new sentence is on Lines 61-64:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (2) L 102: I suggest to write "benefitting ..." instead.

      Done.

      (3) L 103: higher extinction rates (add "s").

      Done.

      (4) L 104: this should probably say "emigrate" and "climate warming".

      Done.

      (5) L 130-133: this is true for emigration (more isolated islands show slower emigration). But what about increased local extinction, especially for small and isolated islands? Especially since you mentioned later in the manuscript that often emigration and extinction are difficult to identify or differentiate. Might be worth a thought here or somewhere in the discussion?

      Thank you for this good question. We would like to answer it in two parts:

      Yes, we cannot distinguish between true local extinction and emigration. The observed local “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in sequence: first “emigration” and then, if individuals can neither emigrate nor withstand the new conditions, “real local dieback”. Over 10 years, cold-adapted species on remote islands would have to persist in place until real extinction because of dispersal limitation, while on less isolated islands the same species could easily emigrate and find more suitable habitat. Consequently, “extinction” is harder to observe on more isolated islands, whereas “fake extinction” caused by emigration is easier to observe on less isolated islands. As a result, the observed extinction rate is expected to increase more sharply for species on less remote islands, and relatively moderately for the same species on remote islands.

      We have modified the legend of Figure 1 on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      Regarding your question “But what about increased local extinction, especially for small and isolated islands?”, we think you are referring to a high extinction rate per se on remote islands. We tested the temporal “trend” in extinction rate rather than spatial differences in the extinction rate per se. Even if species have a high extinction rate on remote islands, that rate can still change slowly over time.

      We hope these answers address your concern.

      (6) L 245: I think this is the first time the acronym appears in the ms (as the methods come after the discussion), so please write the full name here too.

      Thank you for pointing this out. We realized that “Thousand Island Lake” first appears in the last paragraph of the Introduction, so we added “TIL” there on Lines 108-109:

      “Here, we use 10 years of bird community data in a subtropical land-bridge island system (Thousand Island Lake, TIL, China, Figure 2) during a period of consistent climatic warming.”

      (7) L 319: this section could end with a summary statement on idiosyncratic responses (i.e. some variation in the responses you found among the species) and the potential reasons for this, such as e.g. the role of other species traits or interactions, as well as other ways to measure habitat fragmentation (see main comments in public review).

      Thank you for this suggestion, made both in the Public Review and here. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      We emphasize only “habitat loss” here because the idiosyncratic responses arise mainly from the mediating effect of habitat loss. The mediating effect of isolation is relatively consistent (see Page 8, Lines 183-188): “In particular, the effect of isolation on temporal dynamics of thermophilization was relatively consistent across cold- and warm-adapted species (Figure 5a, b); specifically, on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate”.

      (8) L 333: what about the distance to other islands? It's more of a network than an island-mainland directional system (Figure 2). You could address this aspect in the discussion.

      Thank you for this good question again. Isolation can be measured in different ways in the study region. We chose distance to the mainland because it was the best predictor of colonization and extinction rates of breeding birds in the study region and produced results similar to those of other distance-based measures, including distance to the nearest landmass and distance to the nearest larger landmass (Si et al., 2014). We nevertheless agree that it is necessary to consider other aspects of “isolation”, at least in the Discussion, for future research. We addressed this in the Discussion on Lines 292-299; for more details, please see our response to the Public Review.

      (9) Figure 2: Is B1 one of the sampled islands? It is clearly much larger than most other islands and I think it could thus serve as an important population source for many of the adjacent smaller islands? Thus, the nearest neighbor distance to B1 could be as important in addition to the distance to the mainland?

      Yes, B1 is one of the sampled islands and is also the largest island. In previous research in our study system, we tried distance to the nearest landmass, to the nearest larger landmass and to the nearest mainland, and they produced similar results (for more details, see our response to the Public Review). We agree that the nearest-neighbor distance to B1 could be an important measure, but this needs further research. We address this in our Discussion on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (10) L 345: 20km/h walking seems impressively fast? I assume this is a typo.

      We apologize for this error; it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (11) L 485: I had difficulties fully understanding the models that were fitted here and could not find them in the codes you provided (which were otherwise very well documented!). Could you explain this modeling step in a bit more detail?

      Thank you for your recognition! According to Line 485 in the online PDF version (Methods part 4.6.3), it says: “An increasing colonization trend of warm-adapted species and increasing extinction trend of cold-adapted species are two main expected processes that cause thermophilization (Fourcade et al., 2021). To test our third prediction about the mediating effect of habitat fragmentation, we selected warm-adapted species that had an increasing trend in colonization rate (positive year effect in colonization rate) and cold-adapted species that had an increasing extinction rate (positive year effect in extinction rate)…..”

      We carefully checked the code in Figshare link and found that the MOSM JAGS code was not uploaded before. Very sorry for that. Now it can be found in the document [MOSM.R] at https://figshare.com/s/7a16974114262d280ef7. Hope the code, together with the modeling process in section 4.5 in the Methods can help to understand the whole modeling process. Besides, we would like to explain how to decide the temporal trend in colonization or extinction of each species related to Line 485. Let’s take the model of species-specific extinction rate for example:

      In this model, “Island” is a random effect and “Year” is added as a random slope, which allows the “year effect” (i.e., the temporal trend) in a species' extinction rate to vary among islands. Interaction terms between Year and the island variables (isolation, area) were also added to test whether the year effect was related to island area or isolation.

      Because we are only interested in warm-adapted species with a positive temporal trend in colonization and cold-adapted species with a positive temporal trend in extinction, the two main processes underlying thermophilization, we selected warm-adapted species with a positive year effect on colonization and cold-adapted species with a positive year effect on extinction.

      We hope these explanations, together with the JAGS code, make this part of the analysis clearer.
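      The selection step described above can be sketched as follows. This is a minimal illustration, not the JAGS/MOSM code itself: it assumes that species are split into warm- and cold-adapted groups at the median STI (as described for Figure 4a) and that a "positive year effect" is judged from one summary value per species (here a hypothetical posterior mean); the function and argument names are hypothetical, and the actual analysis may instead use, for example, credible intervals that exclude zero.

```python
from statistics import median


def classify_by_sti(sti):
    """Label each species 'warm' or 'cold' by splitting at the median STI.

    Assumption: species strictly above the median are warm-adapted; all
    others (including any tied with the median) are cold-adapted.
    """
    cut = median(sti.values())
    return {sp: ("warm" if value > cut else "cold") for sp, value in sti.items()}


def select_species(sti, year_col, year_ext):
    """Return (warm, cold) species lists retained for the fragmentation test.

    year_col / year_ext map each species to a hypothetical summary of its
    Year effect on colonization / extinction (e.g. a posterior mean).
    A species is kept when the effect relevant to its thermal group is > 0.
    """
    groups = classify_by_sti(sti)
    warm = sorted(sp for sp, g in groups.items() if g == "warm" and year_col[sp] > 0)
    cold = sorted(sp for sp, g in groups.items() if g == "cold" and year_ext[sp] > 0)
    return warm, cold
```

      For four made-up species with STI values 18.0, 21.5, 16.2 and 23.1, the median is 19.75, so only the two species above it are treated as warm-adapted; each group is then filtered on the sign of its own year effect (colonization for warm-adapted, extinction for cold-adapted species).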

      (12) Figure 1: to me, it would be more intuitive to put the landscape configuration in the titles of the panels b, c, and d instead of "only" the mechanisms. E.g. they could be: a) fragmented islands with low climate buffering; b) small islands with low habitat heterogeneity; c) isolated islands with dispersal limitations?

      It is also slightly confusing that the bird communities are above "island" in the middle of the three fragmented habitats - which all look a bit different in terms of tree species and structure which makes the reader first think that it has something to do with the "new" species community. so maybe worth rethinking how to illustrate the three fragmented islands?

      We thank the reviewer for these helpful suggestions. First, we agree that it is clearer to put the landscape configuration in the titles of panels b, c and d. The new titles are “Fragmented islands with low climate buffering” (panel b), “Small islands with low habitat heterogeneity” (panel c), and “Isolated patches with dispersal limitations” (panel d).

      Second, we agree that placing the “bird community” above the “island” in the middle of the three patches is confusing. We intended to show the bird community only on the central island; the other two patches simply represent a fragmented background. To avoid misunderstanding, we added a sentence to the legend of Figure 1 on Lines 778-780:

      “The three distinct patches signify a fragmented background and the community in the middle of the three patches was selected to exhibit colonization-extinction dynamics in fragmented habitats.”

      (13) Figure 4: please add the description of the color code for panel a.

      We apologize for the unclear description. The vertical line marks the median STI value of the 60 species and separates warm-adapted from cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”

      (14) Figure 5: You could consider adding this as panel c to Figure 4 as it depicts the same thing as in 4a but for CTI-abundance.

      Thank you for this suggestion. We have moved the original Figure 5 into Figure 4 as panel c, so the previous Figure 6 is now Figure 5. All figure citations in the main text have been updated to the new numbering. The revised figure is on Lines 801-815.

      References

      Ferraz, G., Russell, G. J., Stouffer, P. C., Bierregaard Jr, R. O., Pimm, S. L., & Lovejoy, T. E. (2003). Rates of species loss from Amazonian forest fragments. Proceedings of the National Academy of Sciences, 100(24), 14069-14073. doi:10.1073/pnas.2336195100

      Fourcade, Y., WallisDeVries, M. F., Kuussaari, M., van Swaay, C. A., Heliölä, J., & Öckinger, E. (2021). Habitat amount and distribution modify community dynamics under climate change. Ecology Letters, 24(5), 950-957. doi:10.1111/ele.13691

      Gaüzère, P., Princé, K., & Devictor, V. (2017). Where do they go? The effects of topography and habitat diversity on reducing climatic debt in birds. Global Change Biology, 23(6), 2218-2229. doi:10.1111/gcb.13500

      Gonzalez, A. (2000). Community relaxation in fragmented landscapes: the relation between species richness, area and age. Ecology Letters, 3(5), 441-448. doi:10.1046/j.1461-0248.2000.00171.x

      Haddad, N. M., Brudvig, L. A., Clobert, J., Davies, K. F., Gonzalez, A., Holt, R. D., . . . Collins, C. D. (2015). Habitat fragmentation and its lasting impact on Earth’s ecosystems. Science Advances, 1(2), e1500052. doi:10.1126/sciadv.1500052

      Richard, B., Dupouey, J. L., Corcket, E., Alard, D., Archaux, F., Aubert, M., . . . Macé, S. (2021). The climatic debt is growing in the understorey of temperate forests: Stand characteristics matter. Global Ecology and Biogeography, 30(7), 1474-1487. doi:10.1111/geb.13312

      Si, X., Pimm, S. L., Russell, G. J., & Ding, P. (2014). Turnover of breeding bird communities on islands in an inundated lake. Journal of Biogeography, 41(12), 2283-2292. doi:10.1111/jbi.12379

    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. They build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      We thank the reviewer for these positive comments.

      Weaknesses:

      Characterization of the behavioral effects of manipulations of these PPN input circuits could be further parsed, for a better understanding of the functional consequences of the connections demonstrated in the ephys analyses.

      We will further analyze our behavioral data to reveal more nuanced functional effects.

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for how the locomotion and valence effects are demonstrated here.

      This is a really interesting future direction and we will expand on these points in the discussion.

      Reviewer #2 (Public review):

      Summary:

      Fallah et al carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN.

      Strengths:

      The slice electrophysiology work is technically well done and provides useful information for further studies of PPN. The optogenetics and behavioral studies are thought-provoking, showing that SNr and GPe projections to PPN play distinct roles in behavior.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      Although the optogenetics and behavioral studies are intriguing, they are somewhat difficult to fit together into a specific model of circuit function. Perhaps the authors can work to solidify the connection between these two arms of the work.

      We will expand on these topics in the discussion.

      (1) Male and female mice are used, but the authors do not discuss any analysis of sex differences. If there are no sex differences, it is still useful to report data disaggregated by sex in addition to pooled data.

      While we do not have sufficient n for a well-powered analysis of sex differences in behavior, we find that both male and female mice increase movement in response to SNr axon stimulation and decrease movement in response to GPe axon stimulation. We will expand on this further in the revised manuscript.

      (2) There is some lack of clarity in the current manuscript on the ages used - 2-5 months vs "at least 7 weeks." Is 7 weeks the time of virus injection surgery, then recordings 3 weeks later (at least 10 weeks)? Please clarify if these ages apply equally to electrophysiological and behavioral studies. If the age range used for the test is large, it may be useful to analyze and report if there are age-related effects.

      Seven weeks was the youngest age at which mice used for electrophysiology were injected, and all of these mice were recorded between 2 and 5 months of age. For behavior, the youngest mice were 11 weeks old at the time of testing (8 weeks old at injection). Mice in the GPe-stimulated condition were 110 ± 7.4 days old and mice in the SNr-stimulated condition 132 ± 23.4 days old (mean ± SEM). We will add these details to the revised manuscript.

      In addition, we correlated distance traveled at baseline and during stimulation with age for both the SNr- and GPe-stimulated conditions. Baseline distance traveled did not correlate with age, but in the SNr axon stimulation group there was a trend toward more movement during stimulation in older mice. We will discuss this in the revised manuscript.

      (3) Were any exclusion criteria applied, e.g. to account for missed injections?

      All injection sites and implant sites were within our range of acceptability, so we did not exclude any mice for missed injections.

      (4) 28-34degC is a fairly wide range of temperatures for electrophysiological recording, which could affect kinetics.

      This is an important consideration. For the condition in which we found significant differences between rostral and caudal PPN (SNr inputs to Vglut2 PPN neurons), we compared our main measurement, current amplitude, against recording temperature and found no correlation (Pearson’s r = -0.0076). Similarly, we found no correlation between baseline (pre-opto) firing frequency and temperature (r = -0.068).
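      For readers who want to run the same kind of check on their own recordings, a minimal sketch of Pearson's r is given below. It assumes two equal-length sequences, such as per-cell recording temperatures and current amplitudes; the function name is illustrative and we make no assumption about the software the authors actually used. On Python 3.10 and later, the standard library's statistics.correlation computes the same coefficient.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)
```

      A value near zero, such as the -0.0076 reported above, indicates no linear relationship between temperature and amplitude within the sampled 28-34 °C range. It does not by itself rule out a non-linear dependence.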

      (5) It would be good to report the number of mice used for each condition in addition to n=cells. Statistically, it would be preferable not to assume that each cell from the same mouse is an independent measurement and to use a nested ANOVA.

      For electrophysiology, 6 mice (3 male, 3 female) were used in each experiment. In the manuscript, ‘N’ denotes the number of mice and ‘n’ the number of cells. Because the number of healthy cells that can be recorded from one mouse is unpredictable, our experiments were designed with cells as the unit of analysis and are underpowered for a nested ANOVA. However, rostral and caudal data were collected from the same mice. Although we do not have sufficient paired data for every parameter, a paired comparison of one of our most important findings, with mice as biological replicates, shows a statistically significant difference in the inhibitory effect of SNr axon stimulation on firing rate between rostral and caudal glutamatergic neurons (p=0.031, Wilcoxon signed rank test).
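      With so few biological replicates, the exact null distribution of the Wilcoxon signed-rank statistic can be enumerated directly, which makes the attainable p-values explicit. The sketch below assumes one rostral and one caudal value per mouse and drops zero differences (one common convention). It is an illustration, not the authors' analysis code; library routines such as scipy.stats.wilcoxon provide the same test. With six mice and no zero differences, the smallest two-sided exact p-value the test can return is 2/2^6 = 0.03125.

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of values, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples a, b.

    Zero differences are dropped. The p-value is the fraction of all 2**n
    sign assignments whose W+ is at least as far from its null mean as the
    observed W+.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    r = average_ranks([abs(v) for v in d])
    w_plus = sum(rk for rk, v in zip(r, d) if v > 0)
    center = sum(r) / 2  # mean of W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rk for rk, s in zip(r, signs) if s)
        if abs(w - center) >= observed - 1e-12:  # tolerance for float ranks
            extreme += 1
    return w_plus, extreme / 2 ** n
```

      Full enumeration costs 2^n sums, which is trivial for per-mouse comparisons like this one but should be replaced by a normal approximation or a library routine for large n.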

      Reviewer #3 (Public review):

      Summary:

      The study by Fallah et al provides a thorough characterization of the effects of two basal ganglia output pathways on cholinergic, glutamatergic, and GABAergic neurons of the PPN. The authors first found that SNr projections spread over the entire PPN, whereas GPe projections are mostly concentrated in the caudal portion of the nucleus. Then the authors characterized the postsynaptic effects of optogenetically activating these basal ganglia inputs and identified the PPN's cell subtypes using genetically encoded fluorescent reporters. Activation of inputs from the SNr inhibited virtually all PPN neurons. Activation of inputs from the GPe predominantly inhibited glutamatergic neurons in the caudal PPN, and to a lesser extent GABAergic neurons. Finally, the authors tested the effects of activating these inputs on locomotor activity and place preference. SNr activation was found to increase locomotor activity and elicit avoidance of the optogenetic stimulation zone in a real-time place preference task. In contrast, GPe activation reduced locomotion and increased the time in the RTPP stimulation zone.

      Strengths:

      The evidence of functional connectivity of SNr and GPe neurons with cholinergic, glutamatergic, and GABAergic PPN neurons is solid and reveals a prominent influence of the SNr over the entire PPN output. In addition, the evidence of a GPe projection that preferentially innervates the caudal glutamatergic PPN is unexpected and highly relevant for basal ganglia function.

      Opposing effects of two basal ganglia outputs on locomotion and valence through their connectivity with the PPN.

      Overall, these results provide an unprecedented cell-type-specific characterization of the effects of basal ganglia inputs in the PPN and support the well-established notion of a close relationship between the PPN and the basal ganglia.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The behavioral experiments require further analysis as some motor effects could have been averaged out by analyzing long segments.

      We will further analyze our motor effects in the revised manuscript.

      Additional controls are needed to rule out a motor effect in the real-time place preference task.

      This is an important point. Our use of unilateral stimulation in the RTPP task reduces potential motor effects, and our supplemental videos show that the mice can easily escape and enter the stimulated zone. However, we can't completely rule out a motor component. To delve into this further, we analyzed mouse speed in the RTPP task. We find that in both SNr and GPe stimulation conditions, the maximum speed of the mouse is not different in the stimulated vs unstimulated zone. We will further analyze mouse speed at the transition into and out of the stimulated zone to identify any acute motor effects in this experiment.

      Importantly, the location of the stimulation is not reported even though this is critical to interpret the behavioral effects.

      The implant locations were generally over the middle-to-rostral PPN, and we will clarify this in the revised manuscript. These locations are shown in Figure 7B.

      There are some concerns about the possible recruitment of dopamine neurons in the SNr experiments.

      We are very interested in this possibility and plan to discuss this with more clarity in a revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      This is not a recommendation. While reading older literature, I found some interesting facts. The shape of the neurocranium in monotremes, birds, and mammals, at least in early stages, resembles the phenotype of dact1/2, wnt11f2, or syu mutants. For more details, see de Beer, The Development of the Vertebrate Skull (1937), Plate 137.

      Thank you for pointing this out. It is indeed interesting.

      Minor Comments: 

      • Lines 64, 66, and 69: same citation without interruption: Heisenberg, Brand et al. 1996

      Revised line 76. 

      • Lines 101 and 102: same citation without interruption: Li, Florez et al. 2013 

      Revised line 118.

      • Lines 144, 515, 527, and 1147: should be wnt11f2 instead of wntllf2 - if not, then explain 

      Revised lines 185, 625, 640, 1300.

      • Lines 169 and 171: incorrect figure citation: Fig 1D - correct to Fig 1F 

      Revised lines 217, 219.

      • Line 173: delete (Fig. S1) 

      Revised line 221.

      • Line 207: indicate that both dact1 and dact2 mRNA levels increased, noting a 40% higher level of dact2 mRNA after deletion of 7 bp in the dact2 gene 

      Revised line 265.

      • Line 215: Fig 1F instead of Fig 1D 

      Revised line 217.

      • Line 248: unify naming of compound mutants to either dact1/2 or dact1/dact2 compound mutants 

      Revised to dact1/2 throughout.

      • Line 259: incorrect figure citation: Fig S1 - correct to Fig S2D/E 

      Revised line 324.

      • Line 302: correct abbreviation position: neural crest (NCC) cell - change to neural crest cell (NCC) population 

      Revised line 380.

      • Line 349: repeating kny mut definition from line 70 may be unnecessary 

      Revised line 434.

      • Line 351: clarify distinction between Fig S1 and Fig S2 in the supplementary section 

      Revised line 324.

      • Line 436: refer to the correct figure for pathways associated with proteolysis (Fig 7B) 

      Revised line 530.

      • Line 446-447: complete the sentence and clarify the relevance of smad1 expression, and correct the use of "also" in relation to capn8 

      Revised line 567.

      • Line 462: clarify that this phenotype was never observed in wildtype larvae, and correct figure reference to exclude dact1+/- dact2+/- 

      Revised lines 563 and 568.

      • Line 463: explain the injection procedure into embryos from dact1/2+/- interbreeding 

      Revised line 565.

      • Lines 488 and 491: same citation without interruption: Waxman, Hocking et al. 2004 

      Revised line 591.

      • Line 502: maintain consistency in referring to TGF-beta signaling throughout the article 

      Revised throughout.

      • Line 523: define CNCC; previously used only NCC 

      Revised to cranial NCC throughout.

      • Line 1105: reconsider citing another work in the figure legend 

      Revised line 1249.

      • Line 1143: consider using "mutant" instead of "mu" 

      Revised line 1295.

      • Fig 2A/B: indicate the number of animals used ("n") 

      N is noted on line 1274.

      • Fig 2C, D, E: ensure uniform terminology for control groups ("wt" vs. "wildtype") 

      Revised in figure.

      • Fig 7C: clarify analysis of dact1/2-/- mutant in lateral plate mesoderm vs. ectoderm 

      Revised line 1356.

      • Fig 8A: label the figure to indicate it shows capn8, not just in the legend 

      Revised.

      • Fig 8D: explain the black/white portions and simplify to highlight important data 

      Revised.

      • Fig S2: add the title "Figure S2" 

      Revised.

      • Consider omitting the sentence: "As with most studies, this work has contributed some new knowledge but generated more questions than answers." 

      Revised line 720.

      Reviewer #2 (Recommendations For The Authors): 

      Major comments: 

      (1) The authors have addressed many of the questions I had, including making the biological sample numbers more transparent. It might be more informative to use n = n/n, e.g. n = 3/3, rather than just n = 3. Alternatively, that information can be given in the figure legend or in the form of penetrance %. 

      The compound heterozygote breeding and phenotyping analyses were not designed in a way that allows us to report the precise penetrance (%) of the ANC phenotype, because we did not dissect the ANC and genotype every individual resulting from the triple-heterozygote incrosses. Instead, we collected phenotype/genotype data until we had obtained at least three replicates.

      We did, however, genotype every individual from the dact1/2 dHet crosses to correlate genotype with the embryonic convergent extension phenotype and the narrowed ethmoid plate (Fig. 2A, Fig. 3), which showed full penetrance.

      (2) The description of the expression of dact1/2 and wnt11f2 is not consistent with what the images show. In the revised Figure 1 legend, the authors say "dact2 and wnt11f2 transcripts are detected in the anterior neural plate" (line 1099), but it is hard to see wnt11f2 expression in the anterior neural plate in 1B. The authors then again say "wnt11f2 is also expressed in these cells", referring to the anterior neural plate and polster (P), notochord (N), paraxial and presomitic mesoderm (PM) and tailbud (TB). However, apart from the notochord, the expression of dact2 and wnt11f2 in 1C is actually quite dissimilar. The authors should describe their expression more accurately and take that into account when considering their function in the same pathway.

      We have revised these sections to more carefully describe the expression patterns. We have added references to previous descriptions of wnt11 expression domains.

      (3) Similar to (2), while the Daniocell data were useful in demonstrating that the expression of dact1 and dact2 is more similar to that of gpc4 and wnt11f2, the text description of the data is quite confusing. The authors stated "dact2 was more highly expressed in anterior structures including cephalic mesoderm and neural ectoderm while dact1 was more highly expressed in mesenchyme and muscle" (lines 174-176). However, Daniocell seems to show more dact1 than dact2 expression in the neural tissues, which would also contradict the in situ data. I think the problem is partly because the dataset contains cells from many different stages, and it might be helpful to include a plot of the cells by stage as well as by cell type, both of which are available from the Daniocell website.

      We have revised the text to focus the Daniocell analysis on the overall and general expression patterns. Line 220.

      (4) The authors used the term "morphological movements" (line 337) to describe the cause of dact1/2 phenotypes. Please clarify what this means. Is it cell movement? Or is it the shape of the tissues? What does "morphological movements" really mean and how does that affect the formation of the EP by the second stream of NCCs? 

      We have revised this sentence to improve clarity. Line 416.

      (5) In the first submission, only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants and that was a major concern regarding the functional significance of calpain 8 in this context. In the revised manuscript, the authors demonstrated that more embryos developed the phenotype when they are heterozygous for both dact1/2. While this is encouraging, it is interesting that the same phenomenon was not observed in the dact1-/-; dact2+/- embryos (Fig. 6D). The authors did not discuss this and should provide some explanation. The authors should also discuss sufficiency vs requirement tested in this experiment. However, given that this is the most novel aspect of the paper, performing experiments to demonstrate requirements would be important. 

      We have added a statement regarding the lack of effect in dact1-/-;dact2+/- embryos (Lines 568-570). We have also added a discussion of sufficiency versus necessity/requirement testing (Lines 676-679).

      (6) Related to (5), the authors cited Figure 8c when stating that 0/192 gfp-injected embryos developed EP phenotypes. However, Figure 8c shows dact1/2 +/- embryos, and the numbers do not match those in Figure 8d either. Please cite the relevant/correct figures.

      The text has been revised to distinguish between our overexpression experiment in wildtype embryos (data not shown) versus overexpression in dact1/2 double het in cross embryos (Fig 8).

      Minor comments: 

      (1) Fig 1 legend line 1106 "the midbrain (MP)" should be MB 

      Revised line 1250.

      (2) Wntllf2, instead of wnt11f2, (i.e. the letter "l" rather than the number "1") was used in 4 instances, line 144, 515, 527, 1147 

      Revised lines 185, 625, 640, 1300.

      (3) The authors replaced ANC with EP in many instances, but ANC is left unchanged in some places and it's not defined in the text. It's first mentioned in line 170.

      Revised line 218.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript gives a broad overview of how to write NeuroML, and a brief description of how to use it with different simulators and for different purposes - cells to networks, simulation, optimization, and analysis. From this perspective, it can be an extremely useful document to introduce new users to NeuroML.

      We are glad the reviewer found our manuscript useful.

      However, the manuscript itself seems to lose sight of this goal in many places, and instead, the description at times seems to target software developers. For example, there is a long paragraph on the board and user community. The discussion on simulator tools seems more for developers, not users. All the information presented at the level of a developer is likely to be distracting to eLife readership.

      To make the paper less developer focussed and more accessible to the end user we have shortened the long paragraphs on the board and user community (and moved some of this text to the Methods section; lines: 524-572 in the document with highlighted changes). We have also made the discussion on simulator tools more focussed on the user (lines 334-406). However, we believe some information on the development and oversight of NeuroML and its community base are relevant to the end user, so we have not removed these completely from the main text.

      Strengths:

      The modularity of NeuroML is indeed a great advantage. For example, the ability to specify the channel file allows different channels to be used with different morphologies without redundancy. The hierarchical nature of NeuroML also is commendable, and well illustrated in Figures 2a through c.

      The number of tools available to work with NeuroML is impressive.

      The abstract, beginning, and end of the manuscript present and discuss incorporating NeuroML into research workflows to support FAIR principles.

      Having a Python API and providing examples using this API is fantastic. Exporting to NeuroML from Python is also a great feature.

      We are glad the reviewer appreciated the design of NeuroML and its support for FAIR principles.

      Weaknesses:

      Though modularity is a strength, it is unclear to me why the cell morphology isn't also treated similarly, i.e., specify the morphology of a multi-compartmental model in a separate file, and then allow the cell file to specify not only the files containing channels, but also the file containing the multi-compartmental morphology, and then specify the conductance for different segment groups. Also, after pynml_write_neuroml2_file, you would not have a super long neuroML file for each variation of conductances, since there would be no need to rewrite the multi-compartmental morphology for each conductance variation.

      We thank the reviewer for highlighting this shortcoming in NeuroML2. We have now added the ability to reference externally defined (e.g. in another file) <morphology> and <biophysicalProperties> elements from <cells>. This has enabled the morphologies and/or specification of ionic conductances to be separated out and enables more streamlined analysis of cells with different properties, as requested. Simulators NEURON, NetPyNE and EDEN already support this new form. Information on this feature has been added to https://docs.neuroml.org/Userdocs/ImportingMorphologyFiles.html#neuroml2 and also mentioned in the text (lines 188-190).

      This would be especially important for optimizations, if each trial optimization wrote out the neuroML file, then including the full morphology of a realistic cell would take up excessive disk space, as opposed to just writing out the conductance densities. As long as cell morphology must be included in every cell file, then NeuroML is not sufficiently modular, and the authors should moderate their claim of modularity (line 419) and building blocks (551).

      We believe the new functionality outlined above addresses this issue, as a single file containing the <morphology> element could be referenced, while a much smaller file, containing the channel distributions in a <biophysicalProperties> element would be generated and saved on each iteration of the optimisation.

      In addition, this is very important for downloading NeuroML-compliant reconstructions from NeuroMorpho.org. If the cell morphology cannot be imported, then the user has to edit the file downloaded from NeuroMorpho.org, and provenance can be lost.

      While the NeuroMorpho.Org website does support converting reconstructed morphologies in SWC format to NeuroML, this export feature no longer works in most modern browsers because it is based on Java Applet technology. However, a desktop version of this application, CVApp, is actively maintained (https://github.com/NeuroML/Cvapp-NeuroMorpho.org), and we have updated it to support export of SWC files to the standalone <morphology> element form of NeuroML discussed above. Additionally, a new Python application for conversion of SWC to NeuroML is in development and will be incorporated into PyNeuroML (Google Summer of Code 2024). Our documentation has been updated with the recommended use of SWC in NeuroML-based modelling here: https://docs.neuroml.org/Userdocs/Software/Tools/SWC.html

      We have also included URLs to the tool and the documentation in the paper (lines: 473-474).

      SWC files, however, cannot be used “as is” for modelling since they only include information (often incomplete—for example a single point may represent a soma in SWC files) on the points that make the cell, but not on the sections/segments/cables that these form. Therefore, NeuroML and other simulation tools, including NEURON, must convert these into formats suitable for simulation. The suggested pipeline for use of NeuroMorpho SWC files would therefore be to convert them to NeuroML, check that they represent the intended compartmentalisation of the neuron and then use them in models.
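      To illustrate why SWC points must be converted before simulation, the sketch below parses standard 7-column SWC text (id, type, x, y, z, radius, parent; parent -1 for the root) and groups the points into unbranched sections. A section breaks at branch points and wherever the SWC type changes, for example from soma (type 1) to basal dendrite (type 3). This is a hypothetical, simplified grouping written for illustration only. It is not the algorithm used by CVApp, PyNeuroML or NEURON, and it ignores issues such as how a single-point soma should be turned into a cylinder.

```python
from collections import defaultdict


def parse_swc(text):
    """Parse SWC text into {id: (type, x, y, z, radius, parent)}.

    Assumes the standard 7-column layout; '#' lines are comments and a
    parent of -1 marks a root point.
    """
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split()
        points[int(f[0])] = (int(f[1]), float(f[2]), float(f[3]),
                             float(f[4]), float(f[5]), int(f[6]))
    return points


def unbranched_sections(points):
    """Group SWC points into unbranched sections (cables).

    A section starts at a root point, at each child of a branch point, and
    wherever the SWC type changes (e.g. soma -> dendrite); it then extends
    through single children of the same type.
    """
    children = defaultdict(list)
    for pid, p in points.items():
        children[p[5]].append(pid)
    sections = []
    stack = sorted(children[-1], reverse=True)  # pop the smallest id first
    while stack:
        cur = stack.pop()
        section = [cur]
        while (len(children[cur]) == 1
               and points[children[cur][0]][0] == points[cur][0]):
            cur = children[cur][0]
            section.append(cur)
        sections.append(section)
        stack.extend(sorted(children[cur], reverse=True))
    return sections
```

      For a one-point soma (id 1) attached to a dendrite that runs through points 2 and 3 and then branches into points 4 and 5-6, this returns [[1], [2, 3], [4], [5, 6]]. The soma "section" is a single point with no length, which is exactly the kind of case a converter has to resolve before the morphology can be used in a model.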

      To ensure that provenance is maintained in all NeuroML models (including conversions from other formats), NeuroML supports the addition of RDF annotations using the COMBINE annotation specifications in model files: https://docs.neuroml.org/Userdocs/Provenance.html. We have added this information to the paper (lines: 464-465).

      Also, Figure 2d loses the hierarchical nature by showing ion channels, synapses, and networks as separate main branches of NeuroML.

      While an instance of an ion channel is on a segment, in a cell, in a population (and hence there is a hierarchy between them), in terms of layout in a NeuroML file the ion channel is defined at the “top level” so that it can be referenced and used by multiple cells, the cell definitions are also defined top level, and used in multiple populations, etc. There are multiple ways to depict these relationships between entities, and we believe Fig 2d complements Fig 2a-c (which is more hierarchical), by emphasising the different categories of entities present in NeuroML files. We have modified the caption of Figure 2d to clarify that it shows the main categories of elements included in the NeuroML standard in their respective hierarchies.

      In Figure 5, the difference between the core and native simulator is unclear.

      We have modified the figure and text (line 341) to clarify this. We now refer to “reference” simulators instead of “core” simulators. This emphasises that jNeuroML and pyLEMS are intended as reference implementations, in their respective languages, of how NeuroML models should be interpreted, rather than as high-performance simulators for research use. We have also updated the categorization of the backends in the text accordingly.

      What is involved in helper scripts?

      Simulators such as NetPyNE can import NeuroML into their own internal format, but require some boilerplate code to do this (e.g. the NetPyNE script calls the importNeuroML2SimulateAnalyze() method with appropriate parameters). The NeuroML tools generate short scripts that contain this boilerplate code. We have renamed “helper scripts” to “import scripts” for clarity (Figure 5 and its caption).

      I thought NEURON could read NeuroML? If so, why do you need to export simulator-specific scripts?

      The NEURON simulator does have some NeuroML functionality (it can export cells, though not the full network, to NeuroML 2 through its ModelView menu), but its current version does not natively support reading/importing NeuroML. This is not a problem, however, as jNeuroML/PyNeuroML translates the NeuroML model description into NEURON’s formats (Python scripts, HOC and NMODL), which NEURON then executes.

      As NEURON is the simulator which allows simulation of the widest range of NeuroML elements, we have (in agreement with the NEURON developers) concentrated on incorporating the best support for NeuroML import/export in the latest (easy to install/update) releases of PyNeuroML, rather than adding this to the NEURON source code. NEURON’s core features have been very stable for years, and many versions of the simulator are used by modellers; installing the latest PyNeuroML gives them the latest NEURON support without having to reinstall NEURON itself.

      In addition, it seems strange to call something the "core" simulation engine, when it cannot support multi-compartmental models. It is unclear why "other simulators" that natively support NeuroML cannot be called the core.

      We agree that this terminology was confusing. As mentioned above, we have changed “core simulator” to “reference simulator”, to emphasise the roles of these simulation engine options.

      It might be more helpful to replace this sort of classification with a user-targeted description. The authors already state which simulators support NeuroML and which ones need code to be exported. In contrast, lines 369-370 mention that not all NeuroML models are supported by each simulator. I recommend expanding this to explain which features are supported in each simulator. Then, the unhelpful separation between core and native could be eliminated.

      As suggested, we have grouped the simulators in terms of function and removed the core/non-core distinction. We have also added a table (Table 3) in the appendices that lists the features each simulation engine supports, and updated the text to be more user focussed (lines 348-394).

      The body of the manuscript has so much other detail that I lose sight of how NeuroML supports FAIR. It is also unclear who the intended audience is. When I get to lines 336-344, this description seems to be too much detail for the eLife audience. The paragraph beginning on line 691 is a clear example of being unclear about the audience. Does someone wanting to develop NeuroML models need to understand XSD schema? If so, the explanation is not clear: XSD schema is not defined, and the text instead explains NeuroML-specific aspects of XSD. Lines 734-735 are another example of explaining to code developers (not model developers).

      We have modified these sentences to be more suitable for the general eLife audience: we have moved the explanation of how the different simulator backends are supported to the more technically detailed Methods section (lines 882-942).

      While the results sections focus on documenting what users can do with NeuroML, the Methods sections include information on “how” the NeuroML and software ecosystem function. While the information in the methods sections may not be required by users who want to use the standard NeuroML model elements, those users looking to extend NeuroML with their own model entities and/or contribute these for inclusion in the NeuroML standard will require some understanding of how the schema and component types work.

      We have tried to limit this information to the bare minimum, pointing to online documentation where appropriate. XSD schemas are, for example, briefly introduced at the beginning of the section “The NeuroML XML Schema”. We have also included a link to the W3C documentation on XSD schemas as a footnote (line 724).

      Reviewer #2 (Public Review):

      Summary:

      Developing neuronal models that are shareable, reproducible, and interoperable allows the neuroscience community to make better use of published models and to collaborate more effectively. In this manuscript, the authors present a consolidated overview of the NeuroML model description system along with its associated tools and workflows. They describe where different components of this ecosystem lie along the model development pathway and highlight resources, including documentation and tutorials, to help users employ this system.

      Strengths:

      The manuscript is well-organized and clearly written. It effectively uses the delineated model development life cycle steps, presented in Figure 1, to organize its descriptions of the different components and tools relating to NeuroML. It uses this framework to cover the breadth of the software ecosystem and categorize its various elements. The NeuroML format is clearly described, and the authors outline the different benefits of its particular construction. As primarily a means of describing models, NeuroML also depends on many other software components to be of high utility to computational neuroscientists; these include simulators (ones that both pre-date NeuroML and those developed afterwards), visualization tools, and model databases.

      Overall, the rationale for the approach NeuroML has taken is convincing and well-described. The pointers to existing documentation, guides, and the example usages presented within the manuscript are useful starting points for potential new users. This manuscript can also serve to inform potential users of features or aspects of the ecosystem that they may have been unaware of, which could lower obstacles to adoption. While much of what is presented is not new to this manuscript, it still serves as a useful resource for the community looking for information about an established, but perhaps daunting, set of computational tools.

      We are glad the reviewer appreciated the utility of the manuscript.

      Weaknesses:

      The manuscript in large part catalogs the different tools and functionalities that have been produced through the long development cycle of NeuroML. As discussed above, this is quite useful, but it can still be somewhat overwhelming for a potential new user of these tools. There are new user guides (e.g., Table 1) and example code (e.g. Box 1), but it is not clear if those resources employ elements of the ecosystem chosen primarily for their didactic advantages, rather than general-purpose utility. I feel like the manuscript would be strengthened by the addition of clearer recommendations for users (or a range of recommendations for users in different scenarios).

      To make Table 1 more accessible to users and provide recommendations, we have added the following new categories: Introductory guides aimed at teaching the fundamental NeuroML concepts; Advanced guides illustrating specific modelling workflows; and Walkthrough guides discussing the steps required for converting models to NeuroML. Box 1 has also been improved to clearly mark API and command line examples.

      For example, is the intention that most users should primarily use the core NeuroML tools and expand into the wider ecosystem only under particular circumstances? What are the criteria to keep in mind when making that decision to use alternative tools (scale/complexity of model, prior familiarity with other tools, etc.)? The place where it seems most ambiguous is in the choice of simulator (in part because there seem to be the most options there) - are there particular scenarios where the authors may recommend using simulators other than the core jNeuroML software?

      The interoperability of NeuroML is a major strength, but it does increase the complexity of choices facing users entering into the ecosystem. Some clearer guidance in this manuscript could enable computational neuroscientists with particular goals in mind to make better strategic decisions about which tools to employ at the outset of their work.

      As mentioned in the response to Reviewer 1, the term “core simulator” for jNeuroML was confusing, as it suggested that this is a recommended simulation tool. We have changed the description of jNeuroML to a “reference simulator” to clarify this (Figure 5 and lines 341, 353).

      In terms of giving specific guidance on which simulator to use, we have focussed on their functionality and limitations rather than recommending a specific tool (as developers of simulator-independent standards, we are not in a position to favour particular simulators). While NEURON is currently the most widely used simulator, other simulation options (e.g. EDEN) have emerged recently which provide quite comprehensive NeuroML support and similar performance. Our approach is to document and promote all supported tools, while encouraging innovation and new developments. The new Table 3 in the Appendix is a guide to help users choose the simulator that best suits their needs, and we have updated the text to include a brief description (lines 348-394).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not understand what the $comments mean in Box 1. It isn't until I get further in the text that I realize that those are command line equivalents to the Python commands.

      We thank the reviewer for highlighting this confusion. We’ve now explicitly marked the API usage and command line usage example columns to make this clearer. We now also use “>” instead of “$” to indicate the command line.

      In Figure 9 Caption "Examples of analysis functions ..", the word analysis seems a misnomer, as these graphs all illustrate the simulation output and graphing of existing variables. I think analysis typically refers to the transformation of variables, such as spike counts and widths.

      To clarify this we have changed the caption to “Examples of visualizing biophysical properties of a NeuroML model neuron”.

      Figure 10: Why is the pulse generator part of a model? Isn't that the input to a model?

      Whether the input to the model is described separately from the NeuroML biophysical description or combined with it is a choice for the researcher. This is possible because in NeuroML any entity which has time varying states can be a NeuroML element, including the current pulse generator. In this simple example the input is contained within the same file (and therefore <neuroml> element) as the cell. However, this does not need to be the case. The cell could be fully specified in its own NeuroML file and then this can be included in other files which add different inputs to facilitate different simulation scenarios. The Python scripting interface facilitates these types of workflows.

      In the interest of modularity, can stim information be stored in a separate file and "included"?

      Yes, as mentioned above, the stimulus could be stored in a separate file.
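      As a minimal sketch of this modular layout (an illustration, not code from the NeuroML tools), the snippet below builds a stimulus-only NeuroML document that includes a separately stored cell file and defines a current pulse generator. The file name hr_cell.nml and the function name are hypothetical. A simulatable model would additionally need a network, a population and an input connecting the pulse generator to the cell, all omitted here.

```python
import xml.etree.ElementTree as ET

NML_NS = "http://www.neuroml.org/schema/neuroml2"


def stimulus_document(doc_id, cell_file, delay, duration, amplitude):
    """Build a NeuroML document that reuses a cell defined in another file
    and adds a current pulse generator as a separate element.

    Hypothetical helper: the network/population/input elements needed
    for an actual simulation are deliberately left out.
    """
    root = ET.Element("neuroml", {"xmlns": NML_NS, "id": doc_id})
    # Reference the cell definition instead of copying it into this file.
    ET.SubElement(root, "include", {"href": cell_file})
    # The stimulus is its own element, with its own (time-varying) behaviour.
    ET.SubElement(root, "pulseGenerator", {
        "id": f"{doc_id}_pulse",
        "delay": delay,
        "duration": duration,
        "amplitude": amplitude,
    })
    return ET.tostring(root, encoding="unicode")


# Two simulation scenarios share one cell file and differ only in their input.
weak = stimulus_document("weak_input", "hr_cell.nml", "100ms", "500ms", "0.05nA")
strong = stimulus_document("strong_input", "hr_cell.nml", "100ms", "500ms", "0.2nA")
```

      Keeping the cell and the inputs in separate files means the cell description is never duplicated, and new scenarios only add new stimulus files.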

      I find it strange to use a cell with mostly dimensionless numbers as an example. I think it would be more helpful to use a model that was more physiological.

      In choosing an example model type to use to illustrate the use of LEMS (Fig 12), NeuroML (Fig 10), XML Schema (Fig 11), the Python API (Fig 13) and online documentation (Fig 15), we needed an example which showed a sufficiently broad range of concepts (dimensional parameters, state variables, time derivatives), but which is sufficiently compact to allow a concise depiction of the key elements in figures, that fit in a single page (e.g. Fig 12). We felt that the Hindmarsh Rose model, while not very physiological, was well suited for this purpose (explaining the underlying technologies behind the NeuroML specification). The simplicity of the Hindmarsh Rose model is counterbalanced in the manuscript by the detailed models of neurons and circuits in Figures 7 & 9. The latter shows a morphologically and biophysically detailed cortical L5b pyramidal cell model.
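      To show how compact the Hindmarsh-Rose model is (three state variables, each defined by one time derivative), here is a minimal forward-Euler sketch of the standard dimensionless equations. The parameter values are commonly used textbook values and are an assumption here, not taken from the NeuroML/LEMS definition shown in the figures (which may scale the variables differently). Forward Euler was chosen for brevity, not accuracy.

```python
def hindmarsh_rose(I=3.25, a=1.0, b=3.0, c=1.0, d=5.0, r=0.006, s=4.0,
                   x_rest=-1.6, dt=0.01, t_end=1000.0, x0=(-1.6, 0.0, 0.0)):
    """Integrate the dimensionless Hindmarsh-Rose model with forward Euler.

    dx/dt = y - a*x**3 + b*x**2 - z + I   (membrane-potential-like variable)
    dy/dt = c - d*x**2 - y                (fast recovery variable)
    dz/dt = r*(s*(x - x_rest) - z)        (slow adaptation variable)

    Parameter values are illustrative assumptions. Returns (t, x, y, z) samples.
    """
    x, y, z = x0
    t = 0.0
    trace = [(t, x, y, z)]
    for _ in range(int(round(t_end / dt))):
        dx = y - a * x**3 + b * x**2 - z + I
        dy = c - d * x**2 - y
        dz = r * (s * (x - x_rest) - z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        t += dt
        trace.append((t, x, y, z))
    return trace


def count_spikes(trace, threshold=1.0):
    """Count upward crossings of x through the threshold."""
    spikes = 0
    for (_, x_prev, _, _), (_, x_next, _, _) in zip(trace, trace[1:]):
        if x_prev < threshold <= x_next:
            spikes += 1
    return spikes
```

      Because all quantities are dimensionless, no unit conversions appear; a physiological model would add dimensional parameters (capacitance, conductances, reversal potentials), which is where NeuroML's unit handling matters.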

      In lines 710-714, it is unclear what is being validated. That all parameters are defined? Using the units (or lack thereof) defined in the schema?

      Validation against the schema is “level 1” validation where the model structure, parameters, parameter values and their units, cardinality, and element positioning in the model hierarchy are checked. We have updated the paragraph to include this information and to also point to Figure 6 where different levels of validation are explained.

      Lines 740 to 746 are confusing. If 1-1 between XSD and LEMS (1st sentence) then how can component types be defined in LEMS and NOT added to the standard? Which is it? 1-1 or not 1-1?

      For the curated model elements included in the NeuroML standard, there will be a 1-1 correspondence between their component type definitions in LEMS and type definitions in the XSD schema. New user defined component types (e.g. a new abstract cell model) can be specified in LEMS as required, and these do not need to be included in the XSD schema to be loaded/simulated. However, since they are not present in the schema definition of the core/curated elements, they cannot be validated against it (level 1 validation). We have modified the text to make this clearer (line: 778).

      Nonetheless, if the new type is useful for the wider community, it can be accepted by the Editorial Board, and at that stage it will be incorporated into the core types, and added to the Schema, to be part of “valid NeuroML”.

      Figure 12. select="synapses[*]/i" is not explained. Does /i mean that iSyn is divided by i, which is current (according to the sentence 3 lines after 766) or perhaps synapse number?

      We thank the reviewer for highlighting this confusion. We have now explained the construct in the text (lines 810-812). It denotes “select the i (current) values from all Attachments which have the id ‘synapses’”. These multiple values are then reduced to a single value by addition, as specified by the attribute reduce="add".
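      The semantics of select="synapses[*]/i" combined with reduce="add" can be mimicked in a few lines of Python. This is only an analogy to clarify the reduction, not how LEMS is implemented; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttachedComponent:
    collection: str  # name of the Attachments collection, e.g. "synapses"
    i: float         # current exposed by this attached component


def reduce_add(attachments, collection, variable):
    """Analogy for select="<collection>[*]/<variable>" with reduce="add":
    pick `variable` from every component attached under `collection`
    and sum the values. In this sketch an empty selection sums to 0."""
    return sum(getattr(a, variable) for a in attachments if a.collection == collection)


attached = [
    AttachedComponent("synapses", 0.2),
    AttachedComponent("synapses", -0.05),
    AttachedComponent("inputs", 1.0),  # not selected: different collection
]
i_syn = reduce_add(attached, "synapses", "i")  # total synaptic current, about 0.15
```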

      The line after 766 says that "DerivedVariables, variables whose values depend on other variables". You should add "and that are not derivatives, which are handled separately" because by your definition derivatives are derived variables.

      Thank you. We have updated the text with your suggestion.

      Reviewer #2 (Recommendations For The Authors):

      - Figure 9: I found it somewhat confusing to have the header from the screenshot at the top ("Layer 5 Burst Accommodating Double Bouquet Cell (5)") not match the morphology shown at the bottom. It's not visually clear that the different panels in Figure 9 may refer to unrelated cells/models.

      Thank you for pointing this out. We have replaced the NeuroML-DB screenshot with one showing the same Layer 5b pyramidal cell as the panels below it.

      Additional change:

      Figure 7c (showing the NetPyNE-UI interface) has been replaced. Previously, this displayed a 3D model which had been created in NetPyNE itself, but now shows a model which has been created in NeuroML and imported for display/simulation in NetPyNE-UI, and therefore better illustrates NeuroML functionality.

    1. Author response:

      To Reviewer #1:

      Thank you for your kind words regarding the novelty, study design, and evidence presented. We will clarify our language when describing fuzzy local-linear regression discontinuity analysis. We thank you for this feedback as our goals are to introduce these methods to a neuroscientific audience. Lastly, we will respond and clarify the methodological points, including post-selection inference, bandwidths, and Bayesian analysis in version 2.

      To Reviewers #2 and #3:

      We thank you both for your constructive feedback, specifically for highlighting 1) the scope of the intervention and 2) the UKB-neuro healthy volunteer bias. In the next manuscript version, we will expand our discussion of plausible reasons for not finding an effect, weighing up the strengths and limitations of our study in three respects: statistical (RD power), design-based (lack of representativeness vs. large sample), and mechanistic (the impact, or lack thereof, of one year of education on neural plasticity decades later). As we believe the approach of natural experiments with RD designs has considerable promise for the field of population cognitive neuroscience beyond this particular study, we will address each of these points within a broader section focused on how to optimize the insight, power, and inferences gained in future work within and beyond Biobank. Moreover, we will situate our discussion of the magnitude of the educational intervention within a broader discussion of cognitive training versus education, and of short- versus long-term effects. We believe revising the manuscript will improve interpretation for the reader, and we thank you for your in-depth feedback. Lastly, we will provide a point-by-point response in the next version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The conserved AAA-ATPase PCH-2 has been shown in several organisms including C. elegans to remodel classes of HORMAD proteins that act in meiotic pairing and recombination. In some organisms the impact of PCH-2 mutations is subtle but becomes more apparent when other aspects of recombination are perturbed. Patel et al. performed a set of elegant experiments in C. elegans aimed at identifying conserved functions of PCH-2. Their work provides such an opportunity because in C. elegans meiotically expressed HORMADs localize to meiotic chromosomes independently of PCH-2. Work in C. elegans also allows the authors to focus on nuclear PCH-2 functions as opposed to cytoplasmic functions also seen for PCH-2 in other organisms. 

      The authors performed the following experiments: 

      (1) They constructed C. elegans animals with SNPs that enabled them to measure crossing over in intervals that cover most of four of the six chromosomes. They then showed that double crossovers, which were common on most of the four chromosomes in wild-type, were absent in pch-2. They also noted shifts in crossover distribution on the four chromosomes. 

      (2) Based on the crossover analysis and previous studies, they hypothesized that PCH-2 acts at an early stage of meiotic prophase to regulate how SPO-11-induced double-strand breaks are utilized to form crossovers. They tested their hypothesis by performing ionizing irradiation and depleting SPO-11 at different stages of meiotic prophase in wild-type and pch-2 mutant animals. The authors observed that irradiation of meiotic nuclei in zygotene resulted in a larger number of pch-2 nuclei with 6 or more crossovers (as measured by COSA-1 foci) compared to wild type. Consistent with this observation, SPO-11 depletion starting roughly in zygotene also resulted in more pch-2 nuclei with 6 or more COSA-1 foci compared to wild type. The increased number at this time point appeared beneficial, because a significant decrease in univalents was observed. 

      (3) They then asked if the above phenotypes correlated with the localization of MSH-5, a factor that stabilizes crossover-specific DNA recombination intermediates. They observed that pch-2 mutants displayed an increase in MSH-5 foci at early times in meiotic prophase and an unexpectedly higher number at later times. Based on the differences in early MSH-5 localization and on the SPO-11 and irradiation studies, they conclude that PCH-2 prevents early DSBs from becoming crossovers and prevents early loading of MSH-5. By analyzing different HORMAD proteins that are defective in forming the closed conformation acted upon by PCH-2, they present evidence that MSH-5 loading was regulated by the HIM-3 HORMAD. 

      (4) They performed a crossover homeostasis experiment in which DSB levels were reduced. The goal of this experiment was to test if PCH-2 acts in crossover assurance. Interestingly, in this background PCH-2 negative nuclei displayed higher levels of COSA-1 foci compared to PCH-2 positive nuclei. This observation and a further test of the model suggested that "PCH-2's presence on the SC prevents crossover designation." 

      (5) Based on their observations indicating that early DSBs are prevented from becoming crossovers by PCH-2, the authors hypothesized that the DNA damage kinase CHK-2 and PCH-2 act to control how DSBs enter the crossover pathway. This hypothesis was developed based on their finding that PCH-2 prevents early DSBs from becoming crossovers and on previous work showing that CHK-2 activity is modulated during meiotic recombination progression. They tested their hypothesis using a mutant synaptonemal complex component that maintains high CHK-2 activity, which cannot be turned off to enable crossover designation. Their finding that the pch-2 mutation suppressed the crossover defect (as measured by COSA-1 foci) supports their hypothesis. 

      Based on these studies the authors provide convincing evidence that PCH-2 prevents early DSBs from becoming crossovers and controls the number and distribution of crossovers to promote a regulated mechanism that ensures the formation of obligate crossovers and crossover homeostasis. As the authors note, such a mechanism is consistent with earlier studies suggesting that early DSBs could serve as "scouts" to facilitate homolog pairing or to coordinate the DNA damage response with repair events that lead to crossing over. The detailed mechanistic insights provided in this work will certainly be used to better understand functions for PCH-2 in meiosis in other organisms. My comments below are aimed at improving the clarity of the manuscript. 

      We thank the reviewer for their concise summary of our manuscript and their assessment of our work as “convincing” and providing “detailed mechanistic insight.”

      Comments 

      (1) It appears from reading the Materials and Methods that the SNPs used to measure crossing over were obtained by mating Hawaiian and Bristol strains. It is not clear to this reviewer how the SNPs were introduced into the animals. Was crossing over measured in a single animal line? Were the wild-type and pch-2 mutations made in backgrounds that were isogenic with respect to each other? This is a concern because it is not clear, at least to this reviewer, how much of an impact crossing different ecotypes will have on the frequency and distribution of recombination events (and possibly the recombination intermediates that were studied). 

      We will clarify these issues in the Materials and Methods of an updated preprint. The control and pch-2 mutants were isogenic in either the Bristol or Hawaiian backgrounds. Control lines were the original Bristol and Hawaiian lines and pch-2 mutants were originally made in the Bristol line and backcrossed at least 3 times before analysis. Hawaiian pch-2 mutants were made by backcrossing pch-2 mutants at least 7 times to the Hawaiian background and verifying the presence of Hawaiian SNPs on all chromosomes tested in the recombination assay. To perform the recombination assays, these isogenic lines were crossed to generate the relevant F1s.

      (2) The authors state that in pch-2 mutants there was a striking shift of crossovers (line 135) to the PC end for all of the four chromosomes that were tested. I looked at Figure 1 for some time and felt that the results were more ambiguous. Map distances seemed similar at the PC end for wildtype and pch-2 on Chrom. I. While the decrease in crossing over in pch-2 appeared significant for Chrom. I and III, the results for Chrom. IV, and Chrom. X. seemed less clear. Were map distances compared statistically? At least for this reviewer the effects on specific intervals appear less clear and without a bit more detail on how the animals were constructed it's hard for me to follow these conclusions. 

      We hope that the added details above make the results of these assays clearer. Map distances were compared and did not reach statistical significance, except where indicated. While we agree that the comparisons between control animals and pch-2 mutants may seem less clear for individual chromosomes, we argue that more general patterns become clear when analyzing multiple chromosomes. Indeed, this is why we expanded our recombination analysis beyond Chromosome III and the X chromosome, as reported in Deshong, 2014. 

      (3) Figure 2. I'm curious why non-irradiated controls were not tested side-by-side for COSA-1 staining. It just seems like a nice control that would strengthen the authors' arguments. 

      We will add these controls in the updated preprint.

      (4) Figure 3. It took me a while to follow the connection between the COSA-1 staining and DAPI staining panels (12 hrs later). Perhaps an arrow that connects each set of time points between the panels or just a single title on the X-axis that links the two would make things clearer. 

      We will make changes in the updated preprint to make this figure more clear.

      Reviewer #2 (Public review): 

      Summary: 

      This paper has some intriguing data regarding the different potential roles of Pch-2 in ensuring crossing over. In particular, the alterations in crossover distribution and Msh-5 foci are compelling. My main issue is that some of the models are confusingly presented and would benefit from some reframing. The role of Pch-2 across organisms has been difficult to determine, the ability to separate pairing and synapsis roles in worms provides a great advantage for this paper. 

      Strengths: 

      Beautiful genetic data, clearly made figures. Great system for studying the role of Pch-2 in crossing over. 

      We thank the reviewers for their constructive and useful summary of our manuscript and the analysis of its strengths. 

      Weaknesses: 

      (1) For a general audience, definitions of crossover assurance, crossover eligible intermediates, and crossover designation would be helpful. This applies to both the proposed molecular model and the cytological manifestation that is being scored specifically in C. elegans. 

      We will make these changes in an updated preprint.

      (2) Line 62: Is there evidence that DSBs are introduced gradually throughout the early prophase? Please provide references. 

      We will reference Woglar and Villeneuve 2018 and Joshi et al. 2015 to support this statement in the updated preprint.

      (3) Do double crossovers show strong interference in worms? Given that the PC is at the ends of chromosomes don't you expect double crossovers to be near the chromosome ends and thus the PC? 

      Despite their rarity, double crossovers do show interference in worms. However, the PC is limited to one end of the chromosome. Therefore, even if interference ensures the spacing of these double crossovers, the preponderance of one of these crossovers toward one end (and not both ends) suggests something functionally unique about the PC end.

      (4) Line 155 - if the previous data in Deshong et al is helpful it would be useful to briefly describe it and how the experimental caveats led to misinterpretation (or state that further investigation suggests a different model etc.). Many readers are unlikely to look up the paper to find out what this means. 

      We will add this to the updated preprint.

      (5) Line 248: I am confused by the meaning of crossover assurance here - you see no difference in the average number of COSA-1 foci in Pch-2 vs. wt at any time point. Is it the increase in cells with >6 COSA-1 foci that shows a loss of crossover assurance? That is the only thing that shows a significant difference (at the one time point) in COSA-1 foci. The number of dapi bodies shows the loss of Pch-2 increases crossover assurance (fewer cells with unattached homologs). So this part is confusing to me. How does reliably detecting foci vs. DAPI bodies explain this? 

      We apologize for the confusion and will make this clearer in an updated preprint. The reviewer is correct that we do not see a difference in the average number of GFP::COSA-1 foci at any time point in this experiment, even though we do see a difference in the number of DAPI-stained bodies (an increase in crossover assurance in pch-2 mutants). What we meant to convey is that because of PCH-2’s dual role in regulating crossover formation (inhibiting it in early prophase, guaranteeing assurance later), the average number of GFP::COSA-1 foci at all time points also reflects this later role, resulting in this average being lower than if PCH-2 only inhibited crossovers early in meiotic prophase. We have shown that this later role does not significantly affect the average number of DAPI-stained bodies, allowing us to see the role of PCH-2 in early meiotic prophase on crossover formation more clearly.

      (6) Line 384: I am confused. I understand that in the dsb-2/pch2 mutant there are fewer COSA-1 foci. So fewer crossovers are designated when DSBs are reduced in the absence of PCH-2.

      How then does this suggest that PCH-2's presence on the SC prevents crossover designation? Its absence is preventing crossover designation at least in the dsb-2 mutant. 

      We will also make this more clear in an updated preprint, as well as provide additional evidence to support this claim. In this experiment, we had identified three possible explanations for why PCH-2 persists on some nuclei that do not have GFP::COSA-1 foci: 1) PCH-2 removal is coincident with crossover designation; 2) PCH-2 removal depends on crossover designation; and 3) PCH-2 removal facilitates crossover designation. The decrease in the number of GFP::COSA-1 foci in dsb-2::AID;pch-2 mutants argues against the first two possibilities, suggesting that the third might be correct. We have additional evidence that we will include in an updated preprint that should provide stronger support and make this more clear.

      (7) Discussion Line 535: How do you know that the crossovers that form near the PCs are Class II and not the other way around? Perhaps early forming Class I crossovers give time for a second Class II crossover to form. In budding yeast, it is thought that synapsis initiation sites are likely sites of crossover designation and class I crossing over. Also, the precursors that form class I and II crossovers may be the same or highly similar to each other, such that Pch-2's actions could equally affect both pathways. 

      We do not know that the crossovers that form near the PC are Class II, but we hypothesize that they are, based on the close, functional relationship that exists between Class I crossovers and synapsis and the apparent antagonistic relationship that exists between Class II crossovers and synapsis. We agree that Class I and Class II crossover precursors are likely to be the same or highly similar and to exhibit extensive crosstalk that may complicate straightforward analysis, and that PCH-2 is likely to affect both, as strongly suggested by our GFP::MSH-5 analysis. We present this hypothesis based on the apparent relationship between PCH-2 and synapsis in several systems, but agree that it needs to be formally tested. We will make this argument clearer in an updated preprint.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes an in-depth analysis of the effect of the AAA+ ATPase PCH-2 on meiotic crossover formation in C. elegans. The authors reach several conclusions and attempt to synthesize a 'universal' framework for the role of this factor in eukaryotic meiosis. 

      Strengths: 

      The manuscript makes use of the advantages of the 'conveyor belt' system within the C. elegans reproductive tract to enable a series of elegant genetic experiments. 

      We thank this reviewer for the useful assessment of our manuscript and the articulation of its strengths.

      Weaknesses: 

      A weakness of this manuscript is that it heavily relies on certain genetic/cell biological assays that can report on distinct crossover outcomes, without clear and directed control over other aspects and variables that might also impact the final repair outcome. Such assays are currently out of reach in this model system. 

      In general, this manuscript could be made more accessible to readers who do not work on C. elegans. Currently, the manuscript is hard to digest for non-experts (even for meiosis researchers). In addition, the authors should be careful to consider alternative explanations for certain results. At several steps in the manuscript, results could ostensibly be caused by underlying defects that are currently unknown (for example, can we know for sure that pch-2 mutants do not suffer from altered DSB patterning, and how can we know what the exact functional and genetic interactions between pch-2 and HORMAD mutants tell us?). Alternative explanations are possible, and it would serve the reader well to explicitly name and explain these options throughout the manuscript. 

      We will make the manuscript more accessible to non-C. elegans readers and discuss alternate explanations for specific results in an updated preprint.

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to note in the manuscript. However, to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects change power/PLV at different frequencies: for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in the amplitude of background EEG activity. This would require comparing two distinct groups, counterbalancing chunk size across the linguistic and non-linguistic dimensions. In the test phase, words for one dimension would have been part-words for the other. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.
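      As a rough illustration of the frequency-tagging constraint, the sketch below computes the tagging rate for each chunk size and the relative background amplitude under a 1/f spectrum. It is hypothetical: it assumes the 4 Hz syllable rate used here, four triplet words (the design implied by 12 voices) and an idealised pure 1/f background, which real EEG spectra only approximate.

```python
def word_rate(syllable_rate_hz: float, chunk_size: int) -> float:
    """Rate (Hz) at which chunks of `chunk_size` syllables recur in an isochronous stream."""
    return syllable_rate_hz / chunk_size


def voices_needed(n_words: int, chunk_size: int) -> int:
    """Distinct voices needed if every syllable position of every word has its own voice."""
    return n_words * chunk_size


def one_over_f_ratio(f_low: float, f_high: float) -> float:
    """Background amplitude at f_low relative to f_high, assuming a pure 1/f spectrum."""
    return (1.0 / f_low) / (1.0 / f_high)


syllable_rate = 4.0                          # Hz, syllabic rate of the streams
duplet_rate = word_rate(syllable_rate, 2)    # tagging frequency for duplets
triplet_rate = word_rate(syllable_rate, 3)   # tagging frequency for triplets
print(duplet_rate, round(triplet_rate, 2))   # 2.0 1.33
print(voices_needed(3, 2), voices_needed(4, 3))  # 6 12 (four triplet words is an assumption)
print(round(one_over_f_ratio(triplet_rate, duplet_rate), 2))  # 1.5
```

      Under these assumptions, a triplet response would have to be detected at about 1.33 Hz against background activity roughly 1.5 times larger than at 2 Hz, which is one reason the two dimensions would not be directly comparable.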

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz), and the word rate is 2 Hz (not 4 Hz).

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that this is only one possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transitional probability matrices during the structured stream of Experiment 2, considering only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F) = P(F|M) = 0.66 while P(M|M) = P(F|F) = 0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occurred because some drops happened to fall at a 2 Hz rate. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP = 0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.
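      The gender-level TPs above can be reproduced with a short calculation. This is a sketch under stated assumptions: the two gender patterns below are hypothetical (chosen only because they reproduce the reported values, not taken from the actual lists), all words are treated as equally frequent, and each word is assumed to be followed equally often by each of the other words, with no immediate repetition.

```python
from collections import defaultdict
from fractions import Fraction


def gender_tps(words):
    """Exact gender-level transition probabilities P(next | previous) for a duplet stream.

    Hypothetical assumptions: all words are equally frequent and each word is followed
    equally often by every other word (no immediate word repetition).
    """
    counts = defaultdict(Fraction)
    share = Fraction(1, len(words) - 1)
    for i, word in enumerate(words):
        # Transitions inside the word (one per word occurrence)
        for prev, nxt in zip(word, word[1:]):
            counts[(prev, nxt)] += 1
        # Transitions from the end of this word to the start of each other word
        for j, other in enumerate(words):
            if j != i:
                counts[(word[-1], other[0])] += share
    totals = defaultdict(Fraction)
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {(prev, nxt): c / totals[prev] for (prev, nxt), c in counts.items()}


# Hypothetical gender patterns of the three voice duplets (M = male, F = female)
list_a = [("M", "M"), ("F", "F"), ("M", "F")]
list_b = [("M", "F"), ("F", "M"), ("M", "F")]

# List A: every TP is 1/2; List B: P(F|M) = P(M|F) = 2/3 and P(M|M) = P(F|F) = 1/3
for name, words in [("A", list_a), ("B", list_b)]:
    tps = gender_tps(words)
    print(name, {f"P({nxt}|{prev})": str(p) for (prev, nxt), p in sorted(tps.items())})
```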

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W = Words; P = Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.
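      For readers unfamiliar with frequency-tagging measures, one common way to quantify entrainment at a single frequency is the inter-trial phase-locking value (PLV), the length of the mean unit phase vector across epochs. The sketch below is not the analysis pipeline used in the study; the sampling rate, epoch length and synthetic signals are hypothetical, and it only illustrates that phase-consistent responses at 2 Hz give a PLV near 1 while phase-scattered ones give a PLV near 0.

```python
import cmath
import math


def phase_locking_value(epochs, fs, freq):
    """Inter-trial PLV at `freq` (Hz) from equal-length epochs sampled at `fs` (Hz)."""
    unit_vectors = []
    for x in epochs:
        # Single-frequency discrete Fourier coefficient of this epoch
        coef = sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x))
        unit_vectors.append(coef / abs(coef))  # keep the phase, discard the amplitude
    return abs(sum(unit_vectors) / len(unit_vectors))


fs = 100                            # hypothetical sampling rate (Hz)
times = [n / fs for n in range(fs)]  # 1 s epochs: exactly two cycles at 2 Hz


def epoch(phase):
    """Synthetic 2 Hz response with a given phase (arbitrary units)."""
    return [math.cos(2 * math.pi * 2 * t + phase) for t in times]


locked = [epoch(0.3) for _ in range(8)]                       # same phase in every epoch
scattered = [epoch(k * math.pi / 2) for k in range(8)]        # phases spread around the circle
print(round(phase_locking_value(locked, fs, 2.0), 3))     # 1.0
print(round(phase_locking_value(scattered, fs, 2.0), 3))  # 0.0
```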

      Author response image 3.

      Neural entrainment at 2Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity, with a source located at the electrical zero in a single-dipole model (i.e., approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e., the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain sits higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).
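      The effect of the average reference described above can be illustrated numerically. In this hypothetical four-electrode example, only the first electrode records the positive pole of a response; after re-referencing, the other electrodes show a small negativity, and the voltages sum to zero at each time point. The values are illustrative only.

```python
def average_reference(channels):
    """Subtract, at each time point, the mean across channels.

    `channels` is a list of equal-length lists (one list of samples per electrode).
    """
    n_channels = len(channels)
    n_samples = len(channels[0])
    means = [sum(ch[t] for ch in channels) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in channels]


# Hypothetical recording: 4 electrodes x 2 time points (arbitrary units).
# Only the first electrode picks up the positive pole of the response.
raw = [
    [4.0, 2.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
]
referenced = average_reference(raw)
print(referenced)  # [[3.0, 1.5], [-1.0, -0.5], [-1.0, -0.5], [-1.0, -0.5]]
print([sum(ch[t] for ch in referenced) for t in range(2)])  # [0.0, 0.0]
```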

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably correspond to the positive pole of the auditory ERP.

      Author response image 4.

      Locations of the International 10–20 sensors on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and the projection of each electrode onto the cortex, computed across 16 infants (from Kabdebon et al., NeuroImage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, since voices are a linguistic dimension, at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in non-linguistic auditory (strings of tones, music) and visual domains, as well as in other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism, but I do not think that the present results show it. This is a study on human neonates and adults; there are no other animal species involved, so I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. As in Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was, for example, always followed by the “red” voice, but the syllables they uttered varied.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.
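      Under this scenario, the 36 x 36 TP matrix can be computed explicitly. The sketch below assumes a stream of three duplet words built from six syllables, uses placeholder labels (s1 to s6, v1 to v6) rather than the actual stimuli, and assumes that the next word is drawn uniformly from the two other words and, by default, that the next voice is drawn uniformly from all six voices; these assumptions reproduce the 1/6 and 1/12 values. The hypothetical `allow_voice_repeat=False` option instead excludes immediate voice repetitions, as in the constraint described for Experiment 1, which gives 1/5 and 1/10.

```python
from fractions import Fraction
from itertools import product


def item_tps(words, voices, allow_voice_repeat=True):
    """TP matrix over (syllable, voice) items for a stream of duplet words.

    Hypothetical assumptions: each word is followed by one of the other words with equal
    probability; the next voice is uniform over all voices, or over all voices except the
    current one when allow_voice_repeat is False.
    """
    second_of = {w[0]: w[1] for w in words}   # first syllable -> second syllable
    syllables = [s for w in words for s in w]
    items = list(product(syllables, voices))   # 6 syllables x 6 voices = 36 items
    tps = {}
    for (s, v), (s2, v2) in product(items, items):
        if s in second_of:                     # within-word transition
            p_syl = Fraction(int(s2 == second_of[s]))
        else:                                  # between-word transition
            starts = [w[0] for w in words if w[1] != s]
            p_syl = Fraction(int(s2 in starts), len(starts))
        if allow_voice_repeat:
            p_voice = Fraction(1, len(voices))
        else:
            p_voice = Fraction(int(v2 != v), len(voices) - 1)
        tps[(s, v), (s2, v2)] = p_syl * p_voice
    return items, tps


WORDS = [("s1", "s2"), ("s3", "s4"), ("s5", "s6")]   # placeholder syllable labels
VOICES = [f"v{i}" for i in range(1, 7)]               # placeholder voice labels
items, tps = item_tps(WORDS, VOICES)
print(len(items))                         # 36
print(tps[("s1", "v1"), ("s2", "v4")])    # 1/6  (within a word)
print(tps[("s2", "v1"), ("s3", "v4")])    # 1/12 (between words)
```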

      We will rephrase this sentence in the manuscript to make it clearer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      A summary of changes

      (1) Line 93: “positive effect” to “positive contribution”, as suggested by reviewer 2.

      (2) Line 147-148: the null hypothesis to test “equal interspecific and intraspecific interactions”, as indicated by reviewers 2 and 4.

      (3) Lines 155-162: removed to reduce duplication with the additive partitioning, as suggested by reviewer 2.

      (4) Lines 186-188: added “the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates”, as suggested by reviewer 3.  

      (5) Lines 219-222: added “The community positive effect can be further partitioned by mechanisms of positive interactions (resource partitioning and facilitation), and facilitative effect can be classified as mutualism (+/+), commensalism (+/0), or parasitic (+/–) based on species specific assessments”.  

      (6) Lines 377-386: added options for determining maximum competitive growth response in some extreme scenarios of species mixtures.

      (7) Figure 1: modified to show the variations of competitive growth response with relative competitive ability from minimum (null expectation) to maximum (competitive exclusion).    

      A summary of four reviewers’ questions and authors’ response

      (1) A summary of authors’ responses. Reviewers did not seem to understand our work. They indicated that our model is inadequate for hypothesis testing. The fact is, as we note below, that our model allows for more hypothesis testing than the additive partitioning model. They suggested that one of our model components, the competitive growth response, needs to be further partitioned. However, this term represents only the competition effect and cannot be split any further. Reviewers criticized us for misunderstanding the additive components while they suggested the same logic to test some intuitive ideas. They did not seem to know that the effects of competitive interactions vary with assessment methods, which differ between competition and biodiversity research. Our work seeks to harmonise definitions between these two fields and bridge the gap. The reviewers acknowledged that the additive components (i.e., the selection effect and complementarity effect) do not have clear biological meanings; however, they did not acknowledge that the additive components are used extensively for determining mechanisms of species interactions in biodiversity research. There is hardly any research that uses the additive partitioning model without linking the additive components to specific mechanisms of species interactions (i.e., positive SE to competition and positive CE to positive interactions).

      (2) Additive partitioning and underlying mechanisms. Some reviewers acknowledged that additive partitioning is not meant for determining mechanisms of species interactions and therefore argued that the additive partitioning should not be criticized for the lack of biological meaning of the additive components. However, they insisted that additive partitioning is useful in quantifying net biodiversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions, or in testing the idea that “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. Do these views not contradict each other? How can the additive partitioning that is not designed for determining mechanisms of species interactions provide meaningful explanations for outputs of species interactions, e.g., “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”?

      Reviewers did not seem to realize that these ideas are equivalent to the suggestions that CE represents the effects of positive interactions and SE the effects of competitive interactions, that the quantification of net biodiversity effects does not require the two additive components, and that the null hypothesis existed long before the additive partitioning (see de Wit, 1960, de Wit et al., 1966). It is generally agreed that CE and SE result from mathematical calculations and do not have clear biological meanings in terms of linkages to specific mechanisms of species interactions responsible for observed net biodiversity effects or changes in ecosystem function (Loreau and Hector, 2012; Bourrat et al., 2023). Calling mixed effects of species interactions (e.g., CE and SE) mechanisms is misleading.

      Model structure: incomplete or inadequate for hypothesis testing. Other than positive, negative, and competition interactions, two reviewers wanted to have more specific interactions such as microclimate amelioration and negative feedback from species-specific pests and pathogens. The determination of these specific mechanisms requires more investigations and cannot be simply made through partitioning growth and yield data. However, the effects of these interactions will be captured in our definition of species interactions.  Reviewers did not seem to know that the additive partitioning would also not allow identifying these specific positive species interactions.

      Inspired by the mathematical form of additive partitioning, two reviewers suggested that our model (presumably equation 4) is incomplete and the second term, i.e., competitive growth response needs to be further explored or partitioned. The second term represents deviations from the null expectation, due to species differences in growth and competitive ability or competition effect. We do not know why and how this term can be further partitioned and what any subcomponents would mean.   

      Our competitive partitioning model is based on two hypotheses: first, the null hypothesis to test the equivalence of interspecific and intraspecific interactions. This hypothesis is the same as in the additive partitioning model. Second, the competitive hypothesis, which tests the dominance of positive or negative species interactions in a community. Thus, our model allows for more hypothesis testing than the current additive partitioning model.

      (3) Types of species interactions. We follow the definition of species interactions generally used in biodiversity research (see Loreau and Hector, 2001), i.e., positive interactions (or complementarity) include resource partitioning and facilitation, negative interactions include interference competition, and competitive interactions include resource competition. One reviewer suggested that resource partitioning is a byproduct of competition and should not be part of positive species interactions, which may be true for the long-term evolution of species coexistence but not for biodiversity experiments lasting a decade at most. Two reviewers suggested that positive interactions should also include microclimate amelioration or negative feedback from species-specific pests and pathogens. We agree, and these are included in our definition.

      (4) Significance of partial density monocultures. We used partial and full density monocultures and species competitive ability to determine what species can possibly achieve in mixture under the competitive hypothesis that constituent species share an identical niche but differ in growth and competitive ability. We did not use partial monocultures to test the effects of density on biodiversity effects. As with the additive partitioning, the competitive partitioning model is not designed for comparing yields across different densities. We added text at lines 186-188 indicating that the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.

      Similarly, we do not use the partial density monoculture to supplant the replacement series design. Partial density monocultures only supplement the “replacement series” design, which does not provide estimates of the facilitative effects and competitive growth responses that would occur in mixtures. It is crucial to recognize that one experimental approach is simply not enough for determining the underlying mechanisms of species interactions responsible for changes in ecosystem function.

      (5) Competition effect in competition and biodiversity research. Because the methods differ, the competition effect in competition research has a different ecological meaning from that in biodiversity research. In competition research, species performance in mixture is compared with partial density monocultures, and therefore the competition effect is generally negative, as suggested by reviewer 4. In biodiversity research, the comparison is between mixtures and full density monocultures. The resulting competition effect can be positive or negative for both individual species and community productivity defined by species composition and full density monoculture yields.

      Therefore, we cannot use the results of competition research based on the additive series design to describe the effects of competitive interactions on ecosystem productivity based on the replacement series design.

      Reviewer #1 (Public Review):

      [Editors' note: this is an overall synthesis from the Reviewing Editor in consultation with the reviewers.]

      The three reviews expand our critique of this manuscript in some depth and complementary directions. These can be synthesized in the following main points (we point out that there is quite a bit more that could be written about the flaws with this study; however, time constraints prevented us from further elaborating on the issues we see):

      (1) It is unclear what the authors want to do.

      As indicated by the title, our objective is to “partition changes in ecosystem productivity by effects of species interactions”, i.e., partitioning net biodiversity effects estimated from the null expectation into components associated with positive, negative, or competition interspecific interactions.

      It seems their main point is that the large BEF literature and especially biodiversity experiments overstate the occurrence of positive biodiversity effects because some of these can result from competition.

      We demonstrated through ecological theories and simulation/experiment data that competition is a major source of the net biodiversity effects estimated with the additive partitioning model. We know that the competition effect varies with mixture attributes. Future research will determine the average effect of competitive interactions on biodiversity effects in the large BEF literature.

      Because reduced interspecific relative to intraspecific competition in mixture is sufficient to produce positive effects in mixtures (if interspecific competition = 0 then RYT = S, where S is species richness in mixture -- this according to the reciprocal yield law = law of constant final yield), they have a problem accepting NE > 0 as true biodiversity effect (see additive partitioning method of Loreau & Hector 2001 cited in manuscript).

      We have no problem accepting NE > 0 as a true positive biodiversity effect. However, NE > 0 can also result from competitive interactions based on the null expectation and needs to be partitioned by effects of species interactions.

      (2) The authors next claim, without justification, that additive partitioning of NE is flawed and theoretically and biologically meaningless.

      The additive partitioning model is based on the covariance equation (or Price equation), which has nothing to do with biodiversity partitioning (Bourrat et al., 2023). Biological meaning was arbitrarily assigned to CE and SE. We made clear that the additive partitioning model is mathematically sound but does not have the biological meanings for which it has been used.
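      For reference, the additive partitioning of Loreau and Hector (2001) splits the net effect as ΔY = N·mean(ΔRY)·mean(M) + N·cov(ΔRY, M), where the first term is CE and the second is SE. The sketch below uses hypothetical two-species yields from a replacement series with equal planted proportions; it shows that the split is an exact covariance identity, so CE + SE equals the net effect by construction.

```python
from statistics import mean


def additive_partition(mono, mixed):
    """Net effect, complementarity effect (CE) and selection effect (SE).

    mono  : full-density monoculture yields M_i
    mixed : observed yields Y_Oi of the same species in the mixture
    Assumes a replacement series with equal planted proportions (expected RY = 1/N).
    """
    n = len(mono)
    d_ry = [y / m - 1 / n for y, m in zip(mixed, mono)]  # observed minus expected relative yield
    net = sum(mixed) - sum(m / n for m in mono)           # observed minus expected mixture yield
    mean_d, mean_m = mean(d_ry), mean(mono)
    ce = n * mean_d * mean_m
    # Population covariance (divisor N), as in the Price-equation form
    se = n * mean([(d - mean_d) * (m - mean_m) for d, m in zip(d_ry, mono)])
    return net, ce, se


net, ce, se = additive_partition(mono=[10.0, 4.0], mixed=[7.0, 3.0])  # hypothetical yields
print(round(net, 3), round(ce, 3), round(se, 3))  # 3.0 3.15 -0.15
```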

      They misinterpret the CE component as biological niche partitioning and the SE component as biological dominance.

      We did not. Loreau and Hector (2001) clearly linked positive CE to positive interactions and positive SE to competitive interactions, which is how these components have generally been used over the last twenty years.

      They do not seem to accept that the additive partitioning is a logically and mathematically sound derivation from basic principles that cannot be contested.

      We do not have a problem with the mathematical form of additive partitioning; we only oppose the ecological meanings assigned to CE and SE, simply because CE and SE both result from all species interactions (see Loreau and Hector, 2001; Bourrat et al., 2023). The reviewer seems to hold the contradictory view that the additive components are biologically meaningless yet derived from basic biological principles.

      (3) The authors go on to introduce a method to calculate species-level overyielding (RY > 1/S in replacement series experiments) as a competitive growth response and multiply this by the species monoculture biomass relative to the maximum to obtain the competitive expectation. This method is based on resource competition and the idea that resource uptake is fully converted into biomass (instead of, e.g., being invested in allelopathic chemical production).

      Correct, but we did not assume “resource uptake is fully converted into biomass”.

      (4) It is unclear which experiments should be done, i.e. are partial-density monocultures planted or simply calculated from full-density monocultures? At what time are monocultures evaluated? The framework suggests that monocultures must have the full potential to develop, but in experiments, they are often performing very poorly, at least after some time. I assume in such cases the monocultures could not be used.

      Both partial and full density monocultures are needed, along with mixtures, to separate NE by species interactions. Given the lack of partial density monocultures in current biodiversity experiments, calculating competitive growth responses from density-size relationships is an alternative, but not the preferred one.

      Similar to additive partitioning, our model can (and should) be applied to all developmental stages of an experiment to examine how interactions evolve through time.   

      (5) There are many reasons why the ideal case of only resource competition playing a role is unrealistic. This excludes enemies but also differential conversion factors of resources into biomass and antagonistic or facilitative effects. Because there are so many potential reasons for deviations from the null model of only resource competition, a deviation from the null model does not allow conclusions about underlying mechanisms.

      The competitive expectation is only a hypothesis, just as the null expectation is. The difference between the competitive and null expectations represents a competitive effect resulting from species differences in growth and competitive ability. The deviation of observed yields from the competitive expectation indicates a positive or negative effect (see lines 201-219).
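      The distinction drawn here splits each species' net effect exactly into two differences. The Python sketch below assumes the competitive expectation has already been obtained with the manuscript's procedure and takes it as an input rather than recomputing it. All yields are hypothetical. The sign reading of the residual (> 0 positive, < 0 negative) follows the description in this response.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Deviations of one species (or a whole community) from two hypotheses."""

    net_effect: float          # observed - null expectation
    competitive_effect: float  # competitive expectation - null expectation
    interaction_effect: float  # observed - competitive expectation (>0 positive, <0 negative)


def partition(observed, null_expectation, competitive_expectation):
    """Split observed - null into a competitive part and a residual interaction part."""
    return Partition(
        net_effect=observed - null_expectation,
        competitive_effect=competitive_expectation - null_expectation,
        interaction_effect=observed - competitive_expectation,
    )


# Hypothetical three-species mixture (arbitrary yield units).
observed = [160, 75, 15]
null_expectations = [100, 80, 40]
competitive_expectations = [150, 70, 20]

species = [
    partition(o, n, c)
    for o, n, c in zip(observed, null_expectations, competitive_expectations)
]
# Community level: sum the species-level yields before partitioning.
community = partition(sum(observed), sum(null_expectations), sum(competitive_expectations))

for i, p in enumerate(species, start=1):
    print(f"species {i}: {p}")
print(f"community: {community}")
```

      Because both pieces are plain differences, they always add up to the net effect. What carries the biology is how the competitive expectation itself is constructed, not this bookkeeping step.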

      Furthermore, this is not a systematically developed partitioning, but some rather empirical ad hoc formulation of a first term that is thought to approximate competitive effects as understood by the authors (but again, there already are problems here). The second residual term is not investigated. For a proper partitioning approach, one would have to decompose overyielding into two (or more) terms and demonstrate (algebraically) that under some reasonable definitions of competitive and non-competitive interactions, these end up driving the respective terms.

      The first term represents the null expectation, which assumes equal interspecific and intraspecific interactions, i.e., the absence of positive, negative, and competition effects. The second, residual term represents the competition effect due to species differences in growth and competitive ability. The meaning of this residual term is clear and does not need to be further partitioned or investigated.

      In fact, our competitive partitioning also has several components, although they differ from those of the additive partitioning. For species-level assessment, the components are the null expectation, the competitive growth response, and the observed yield, plus partial density monocultures. For community-level assessment, they are the null expectations, competitive expectations, and observed yields.

      (6) Using a simplistic simulation to test the method is insufficient. For example, I do not see how the simulation includes a mechanism that could create CE in additive partitioning if all species would have the same monoculture yield. Similarly, they do not include mechanisms of enemies or antagonistic interactions (e.g. allelopathy).

      The simulation model we used was developed from real-world data and can only represent the species and growth conditions available in the model. We cannot go beyond the limitations of the data. The model is empirical and has been shown to accurately estimate yield under aspen-spruce forest conditions. We also note that we use experimental data (Table 2).

      (7) The authors do not cite relevant literature regarding density x biodiversity experiments, competition experiments, replacement-series experiments, density-yield experiments, additive partitioning, facilitation, and so on.

      We cited literature relevant to biodiversity partitioning, since we do not aim to cover everything. The reviewer may not be aware that most of the research areas listed are actually included in our work, such as additive and replacement series experimental designs, additive partitioning, facilitation, competition studies, and density-yield relationships. Our competitive partitioning model is based on biological principles, while the additive partitioning model is based only on a mathematical equation.

      Overall, this manuscript does not lead further from what we have already elaborated in the broad field of BEF and competition studies and rather blurs our understanding of the topic.

      Results of competition studies based on additive series designs are generally not used in the broad field of BEF, which relies on replacement series designs. The effects of competitive interactions on BEF have never been clearly defined using the results of competition studies. Our work fills that gap.

      Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members. SE measures whether the "relative advantage" species have in the mixture is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).
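      The two AP terms described here can be written as CE = N·mean(ΔRY)·mean(M) and SE = N·cov(ΔRY, M). ΔRY_i is the observed minus the expected relative yield, M_i is monoculture yield, and cov is the population covariance (divisor N). Their sum equals the net effect. The Python sketch below illustrates these standard Loreau & Hector (2001) formulas; it is not code from the manuscript. It uses two hypothetical two-species scenarios with equal NE. In the first, both species gain equally in relative yield. In the second, the more productive species gains what the less productive species loses.

```python
from statistics import fmean


def additive_partition(mixture_yields, monoculture_yields, planted_proportions):
    """Additive partitioning after Loreau & Hector (2001).

    Returns (NE, CE, SE) with
      NE = sum_i (Y_i - p_i * M_i)     net biodiversity effect
      CE = N * mean(dRY) * mean(M)     complementarity effect
      SE = N * cov(dRY, M)             selection effect
    where dRY_i = Y_i / M_i - p_i and cov is the population covariance
    (divisor N, not the N - 1 sample covariance).
    """
    n = len(monoculture_yields)
    d_ry = [
        y / m - p
        for y, m, p in zip(mixture_yields, monoculture_yields, planted_proportions)
    ]
    mean_d_ry = fmean(d_ry)
    mean_m = fmean(monoculture_yields)
    cov = fmean(
        (d - mean_d_ry) * (m - mean_m) for d, m in zip(d_ry, monoculture_yields)
    )
    ne = sum(
        y - p * m
        for y, m, p in zip(mixture_yields, monoculture_yields, planted_proportions)
    )
    return ne, n * mean_d_ry * mean_m, n * cov


# Hypothetical two-species mixture planted 50:50.
monocultures = [300.0, 100.0]
proportions = [0.5, 0.5]

scenarios = {
    # A: both species raise their relative yield by 0.125.
    "A": [187.5, 62.5],
    # B: the productive species gains 0.25 in relative yield,
    #    the less productive species loses 0.25.
    "B": [225.0, 25.0],
}
results = {
    label: additive_partition(yields, monocultures, proportions)
    for label, yields in scenarios.items()
}
for label, (ne, ce, se) in results.items():
    print(f"Scenario {label}: NE={ne:.1f}  CE={ce:.1f}  SE={se:.1f}")
```

      Both scenarios have the same NE but a different CE/SE split. This shows that CE and SE describe the statistical structure of the relative-yield changes. The calculation itself says nothing about which biological interactions produced them.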

      The reviewer should note that these ideas rest on the same logic: positive CE represents the effects of positive interactions and positive SE represents the effects of competitive interactions. However, CE>0 or SE>0 can result from many different scenarios of species interactions, not necessarily those in which “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. CE>0 and SE>0 can occur alone or together. As this reviewer suggests later, we simply cannot infer the underlying mechanisms of overyielding from mathematical calculations (CE and SE).

      The reviewer criticizes us while using the same logic themselves.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      The reviewer actually supports our point. However, CE and SE have been widely used as indicators of biological mechanisms: positive CE as the result of complementary interactions and positive SE as the result of competitive interactions (see Loreau and Hector, 2001).

      We have no problem with the "statistical structure" of AP; it is simply a covariance equation. It is important to recognize that CE and SE provide no more information than NE about the underlying mechanisms of species interactions. Any attempt to investigate the mechanisms of overyielding with CE or SE can easily go wrong.

      Our competitive partitioning model incorporates the effects of competitive interactions into the conventional null expectation and allows the different effects of species interactions to be separated. In comparison, the additive partitioning model does not have this capacity and was not even designed for this purpose, as this and other reviewers note.

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      Correct.

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction; we only want to separate the effect of competition from those of other species interactions.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.
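      Read literally, this paraphrase gives MG_i = (Y_partial,i / Y_full,i) / d_i. Here d_i is the partial monoculture's planting density relative to the full-density monoculture. The Python sketch below follows that reading only. It is an assumption for illustration, not the manuscript's own equation, and the yields are hypothetical.

```python
def competitive_growth_response(partial_yield, full_yield, relative_density):
    """Per-individual yield at partial density relative to full density.

    partial_yield    -- monoculture yield at the reduced (partial) density
    full_yield       -- monoculture yield at full planting density
    relative_density -- partial density / full density, e.g. 1/S for an
                        S-species replacement series (assumed here)

    MG = 1 means no extra growth when intraspecific competition is reduced;
    MG = 1 / relative_density is the constant-final-yield ceiling, where the
    partial-density stand reaches the full-density yield.
    """
    if not 0 < relative_density <= 1:
        raise ValueError("relative_density must be in (0, 1]")
    if full_yield <= 0:
        raise ValueError("full_yield must be positive")
    return (partial_yield / full_yield) / relative_density


full = 400.0  # hypothetical full-density monoculture yield
for partial in (200.0, 300.0, 400.0):  # hypothetical half-density yields
    mg = competitive_growth_response(partial, full, relative_density=0.5)
    print(f"partial-density yield {partial:.0f}: MG = {mg:.2f}")
```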

      Correct.

      We added text at lines 377-386 discussing options for determining MG in some uncommon scenarios of species mixtures.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      First, the "competitive effect" focuses on resource competition; other forms of competition (presumably interference competition) are included in the negative interactions.

      Second, competitive growth response varies over time and with density, and so do NE, CE, SE, and interspecific interactions.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.
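      Taken at face value, this description gives RC_i = (M_i - mean of M_j over j ≠ i) / max_j M_j. The Python sketch below encodes that reading with hypothetical monoculture yields. It follows the reviewer's paraphrase, not necessarily the exact equation in the manuscript.

```python
def relative_competitive_ability(monoculture_yields):
    """RC_i = (M_i - mean of the other species' M_j) / max_j M_j.

    Follows the reviewer's paraphrase (an assumption, not the manuscript's
    verified formula). Requires at least two species.
    """
    s = len(monoculture_yields)
    if s < 2:
        raise ValueError("need at least two species")
    total = sum(monoculture_yields)
    max_m = max(monoculture_yields)
    # (total - m) / (s - 1) is the mean monoculture yield of the other species.
    return [(m - (total - m) / (s - 1)) / max_m for m in monoculture_yields]


print(relative_competitive_ability([400.0, 250.0, 100.0]))  # [0.5625, 0.0, -0.5625]
```

      Every RC_i shares the divisor max_j M_j. Adding or removing the most productive species therefore rescales all values, which is the dependence on a single value raised in the reviewer's second concern.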

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      First, growth conditions are controlled in biodiversity experiments, i.e., monocultures and mixtures share the same resource space. Species have no opportunity to exploit resources outside the experimental area. For example, species that are less productive on normal soils may outperform more competitive species on saline/alkaline soils. In that case, these “less productive species” are considered “more productive”.

      Second, as discussed in our paper (lines 367-376; Figure 1), more research is needed to determine the relationships between species traits (biomass or height) and relative competitive ability. Once these relationships are known, scaling by the maximum would not be needed. There has been considerable research on such relationships; we leave it to subject experts to determine what is most appropriate for the species studied.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Correct: if species differ substantially in competitive ability, the more competitive species in the mixture would grow like its partial density monoculture. This extra growth should not be treated as a source of positive biodiversity effects, simply because it does not result from positive species interactions.

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather "ad hoc". It also is much less a partitioning method than the AP method because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties. When does complementarity occur? Would it capture complementarity and facilitation?

      AP is by no means systematic. It is based on the covariance equation (or Price equation), which has nothing to do with species interactions apart from its appealing mathematical form (Bourrat et al., 2023). Ecological meanings were subjectively assigned to CE and SE. Therefore, CE and SE reflect what we call them, not what they really mean.

      The remainder measures deviations from the null expectation that are due only to the competition effect, and it cannot be partitioned any further. The remainder would be positive for more competitive species and negative for less competitive species in the mixture, relative to their full density monocultures. The deviation of observed yields from competitive expectations indicates the dominance of positive or negative species interactions. All of this is outlined at lines 201-221.

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      We do not see why not.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      To help readers understand how the competitive growth response varies with relative competitive ability, the x-axis of Figure 1 is labelled with null expectation, competitive expectation, and competitive exclusion. These labels run from the minimum to the maximum deviation of competitive ability from the community average.

      We have followed terms used in biodiversity partitioning and changing terms can be confusing.  

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      There are some differences in meaning, but that is how CE and SE have generally been used. Using different terms can be confusing and does not help in understanding the problems with AP.

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but it is clear that an interaction can have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning" and "facilitation", and "species interference" as the authors do. Also, these categories of mechanisms are still simple. For example, how do biotic interactions with enemies classify, see above?

      We explain the effects of competitive interactions on species yield and, ultimately, on community yield, which can be linked to “resource partitioning”, “facilitation”, and “species interference”.

      More specific species interactions require detailed biological investigation and cannot be determined through partitioning of biomass production.  

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      As suggested, “absence of interspecific interactions” was replaced with “equal interspecific and intraspecific interactions”.

      We have removed lines 155-162 to reduce duplication. However, our method is based on the null expectation, which needs to be introduced even though it is part of AP.

      Other points:

      - line 66: community productivity, not ecosystem productivity.

      Both community productivity and ecosystem productivity are used in biodiversity research, although their meanings can differ slightly. Of the two, ecosystem productivity is the more common term.

      - line 68: community average responses are with respect to relative yields - this is important!

      - line 64: what are "species effects of species interactions"?

      We searched and did not find “species effects of species interactions”.

      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.

      It does, in fact, refer to yield changes. For example, less productive species in an active growth phase are more responsive to changes in competition. More productive species in an inactive growth phase (i.e., aging) are less responsive.

      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

      The phrase was modified to “positive contribution of competitive dominance to ecosystem productivity based on the null expectation”.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      Strengths:

      I can find a lot of value in endeavouring to improve our understanding of how biodiversity-ecosystem functioning relationships arise. I agree with the authors that competition is not well integrated into the complementarity and selection effect and interrogating this is important.

      Weaknesses:

      (1) The authors start the introduction very narrowly and do not make clear why it is so important to understand the underlying mechanisms driving biodiversity-ecosystem functioning relationships until the end of the discussion.

      There are different ways to start an introduction. We believe that starting with the problems of the current approach is the most effective way to outline the study’s objective.

      (2) The authors criticize the existing framework for only incorporating positive interactions but this is an oversimplification of the existing framework in several ways:

      We did not criticize the existing framework for only incorporating positive interactions. We criticize the existing framework, because it is not based on mechanisms of species interactions, but is extensively used to determine underlying mechanisms driving biodiversity-ecosystem functioning relationships.

      a. The existing partitioning scheme incorporates resource partitioning which is an effect of competition.

      Resource partitioning means that species utilize resources differently, while competition means that species use the same resources. The statement that “resource partitioning is an effect of competition” does not hold in biodiversity experiments, which are often short in duration and conducted under controlled conditions.

      b. The authors neglect the potential that negative feedback from species-specific pests and pathogens can also drive positive BEF and complementarity effects but is not a positive interaction, necessarily. This is discussed in Schnitzer et al. 2011, Maron et al. 2011, Hendriks et al. 2013, Barry et al. 2019, etc.

      We did not. The feedback effect will be reflected in the differences between observed yields and competitive expectations if species in mixtures have different pests and pathogens relative to monocultures. The additive partitioning does not identify these feedback effects either.

      c. Hector and Loreau (and many of the other citations listed) do not limit competition to SE because resource partitioning is a byproduct of competition.

      Positive SE has been widely interpreted as the result of competition, including by Loreau and Hector (2001) and many others. It needs to be clear that neither of the additive components can be linked to specific mechanisms of species interactions.

      Does “resource partitioning is a byproduct of competition” mean that species change their niches to avoid competition? If this is what the reviewer means, it may occur through long-term evolution, but not in short-term biodiversity experiments. Loreau and Hector (2001) clearly indicated that their complementarity effect includes both resource partitioning and facilitation.

      (3) It is unclear how this new measure relates to the selection effect, in particular. I would suggest that the authors add a conceptual figure that shows some scenarios in which this metric would give a different answer than the traditional additive partition. In the authors' example, a dominant species gains more biomass than the subdominant species loses when it is outcompeted. This is a general example often used for a selection effect. When exactly would you see a difference between the two?

      a. Just a note: I do think you should see a difference between the two if the species suffers from strong intraspecific competition and therefore has low monoculture biomass. In practice, however, this would also tend to be a very low-density monoculture. There would then potentially be little difference between a low-density and a high-density monoculture, because the individuals in a high-density monoculture would die anyway. So I am not sure that in practice you would really see this difference even if partial density plots were incorporated.

      Linking the new measure to SE or CE would be difficult (see the many comparisons in the Tables and Figures of our manuscript). SE and CE are derived from a mathematical equation and do not represent specific mechanisms of species interactions (Hector and Loreau 2012; Bourrat et al., 2023).

      (4) One of the tricky things about these endeavors is that they often pull on theory from two different subfields and use similar terminology to refer to different things. For example - in competition theory, facilitation often refers to a positive relative interaction index (this seems to be how the authors are interpreting this) while in the BEF world facilitation often refers to a set of concrete physical mechanisms like microclimate amelioration. The truth is that both of these subfields use net effects. The relative interaction index is also a net outcome as is the complementarity effect even if it is only a piece of the net biodiversity effect. Trying to combine these two subfields to come up with a new partitioning mechanism requires interrogating the underlying assumptions of both subfields which I do not see in this paper.

      We agree. Microclimate amelioration is also part of the positive effect and will be reflected in the difference between observed yield and competitive expectation. We cannot separate these two mechanisms of positive species interactions without investigating the influence of microclimate on growth and yield.

      (5) The partial density treatment does not isolate competition in the way that the authors indicate. All of the interactions that the authors discuss are density-dependent including the mechanism that is not discussed (negative feedback from species-specific pests and pathogens). These partial density treatment effects therefore cannot simply be equated to competition as the authors indicate.:

      We use partial density monocultures to determine the maximum competitive growth response, i.e., the effect of density-dependent intraspecific interactions. We use species competitive ability to determine how much of that maximum response a species can achieve in mixtures. Species-specific pests and pathogens may change from partial to full density monocultures, and such changes would be captured in the competitive growth responses of individuals. We added text at lines 186-188 indicating that the estimated maximum competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.

      a. Additionally - the authors use mixture biomass as a stand-in for competitive ability in some cases but mixture biomass could also be determined by the degree to which a plant is facilitated in the mixture (for example).

      We used monoculture biomass, not mixture biomass, to assess competitive ability.

      (6) I found the literature citation to be a bit loose. For example, the authors state that the additive partition is used to separate positive interactions from competition (lines 70-76) and cite many papers but several of these (e.g. Barry et al. 2019) explicitly do not say this.

      Barry et al. (2019) defined CE as overproduction relative to monocultures, an effect of positive interactions.

      (7) The natural take-home message from this study is that it would be valuable for biodiversity experiments to include partial density treatments but I have a hard time seeing this as a valuable addition to the field for two reasons:

      a. In practice - adding in partial density treatments would not be feasible for the vast majority of experiments which are already often unfeasibly large to maintain.

      The reviewer suggests that quantity is more important than quality. Without partial density monocultures, no one can separate the different effects of species interactions. Loreau and Hector, the reviewers, and many others have noted that these effects cannot be clearly differentiated with a replacement series design. Unreliable scientific findings are not valuable.

      b. The density effect would likely only be valuable during the establishment phase of the experiment because species that are strongly limited by intraspecific competition will die in the full-density plots resulting in low-density monocultures. You can see this in many biodiversity experiments after the first years. Even though they are seeded (or rarely planted) at a certain density, the density after several years in many monocultures is quite low.

      True. Whether density is high or low also depends on individual size; if individuals do not get enough resources, density is effectively high. Therefore, the density effect can remain strong even as density drops substantially from initial levels.

      Reviewer #4 (Public Review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript’s null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      To be clear, we use two hypotheses: the null hypothesis currently used with AP, and the competitive hypothesis newly introduced in this manuscript. The null hypothesis helps determine changes in ecosystem productivity from all species interactions. The competitive hypothesis helps partition these changes by mechanism of species interaction, i.e., positive, negative, or competitive interactions.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning. The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them. Finally, it is unclear to me whether rejecting the ‘new’ null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. I will elaborate on each of these points below.

      First, there are many biodiversity experiments, but those with partial-density monocultures are rare; we found only one, a greenhouse experiment. We therefore used simulations to illustrate different scenarios of species interactions, to demonstrate how our approach works, and to show how it differs from the AP.

      Because of the different methods used, the results of the long history of competition research (generally based on additive series designs) cannot be used to define the effects of competitive interactions in biodiversity research (generally based on replacement series designs). This may be the reason that few competition researchers were cited in Loreau and Hector (2001).

      Our approach requires two hypotheses, null and competitive, and the meanings of deviations from these hypotheses are outlined at lines 201-221 for both individual-species and community-level assessments. Distinguishing changes in ecosystem productivity by species interaction would be of great interest to “ecologists, agronomists, conservationists, or others”.

      The critiques of biodiversity experiments and existing additive partitioning methods are overstated, as is the extent to which this new approach addresses its limitations. For example, the critique that current biodiversity experiments cannot reveal the effects of species interactions (e.g., lines 37-39) isn't generally true, but it could be true if stated more specifically. That is, this statement is incorrect as written because comparisons of mixtures, where there are interspecific and intraspecific interactions, with monocultures, where there are only intraspecific interactions, certainly provide information about the effects of species interactions (interspecific interactions). These biodiversity experiments and existing additive partitioning approaches have limits, of course, for identifying the specific types of interactions (e.g., whether mediated by exploitative resource competition, apparent competition, or other types of interactions). However, the approach proposed in this manuscript gets no closer to identifying these specific mechanisms of species interactions. It has no ability to distinguish between resource and apparent competition, for example. Thus, the motivation and framing of the manuscript do not match what it provides. I believe the entire Introduction would need to be rewritten to clarify what gap in knowledge this proposed approach is addressing and what would be gained by filling this knowledge gap.

      Our approach helps determine the underlying mechanisms of species interactions, i.e., positive (resource partitioning or facilitation), negative, or competitive interactions. I am not sure how much further we need to go in identifying more specific mechanisms. If resource and apparent competition refer to resource and interference competition, our approach can tease them apart.

      I recommend that the Introduction instead clarify how this study builds on and goes beyond many decades of literature considering how competition and biodiversity effects depend on density. This large literature is insufficiently addressed in this manuscript. This fails to give credit to previous studies considering these ideas and makes it unclear how this manuscript goes beyond the many previous related studies. For example, see papers and books written by de Wit, Harper, Vandermeer, Connolly, Schmid, and many others. Also, note that many biodiversity experiments have crossed diversity treatments with a density treatment and found no significant effects of density or interactions between density and diversity (e.g., Finn et al. 2013 Journal of Applied Ecology). Thus, claiming that these considerations of density are novel, without giving credit to the enormous number of previous studies considering this, is insufficient.

      There is a misunderstanding here: our approach is not designed to test density effects. The same density is held across full-density monocultures and mixtures. We use partial-density monocultures to determine what species may achieve competitively in a full-density mixture in the absence of positive or negative interspecific interactions.

      Replacement series designs emerged as a consensus for biodiversity experiments because they directly test a relevant null hypothesis. This is not to say that there are no other interesting null hypotheses or study designs, but one must acknowledge that many designs and analyses of biodiversity experiments have already been considered. For example, Schmid et al. reviewed these designs and analyses two decades ago (2002, chapter 6 in Loreau et al. 2002 OUP book) and the overwhelming consensus in recent decades has been to use a replacement series and test the corresponding null hypothesis.

      There are some wrong impressions here. We are not trying to supplant “replacement series” with “additive series”; we use “additive series” designs to supplement the “replacement series” design in order to partition changes in ecosystem productivity by mechanism of species interaction, which would not be possible with a “replacement series” design alone, as suggested by many, including the reviewers.

      It is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. Most biodiversity experiments and additive partitions have tested and quantified diversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions. If there was no less competition and no more facilitation in mixtures than in monocultures, then there would be no positive diversity effects. Rejecting this null hypothesis is relevant when considering coexistence in ecology, overyielding in agronomy, and the consequences of biodiversity loss in conservation (e.g., Vandermeer 1981 Bioscience, Loreau 2010 Princeton Monograph). This manuscript proposes a different null hypothesis and it is not yet clear to me how it would be relevant to any of these ongoing discussions of changes in biodiversity.

      Our method begins with the null expectation: that intraspecific and interspecific interactions are equivalent. We then propose the competitive hypothesis as a second, non-exclusive hypothesis that tests the dominance of positive or negative interspecific interactions. As its name indicates, the additive partitioning model has been advocated for partitioning biodiversity effects by ecological mechanism (complementarity effect, CE, and selection effect, SE). The ecological meanings of deviations from the two hypotheses are outlined at lines 201-221 for both individual-species and community-level assessments.

      The claim that all previous methods 'are not capable of quantifying changes in ecosystem productivity by species interactions and species or community level' is incorrect. As noted above, all approaches that compare mixtures, where there are interspecific interactions, to monocultures, where there are no species interactions, do this to some extent. By overstating the limitations of previous approaches, the manuscript fails to clearly identify what unique contribution it is offering, and how this builds on and goes beyond previous work.

      The reviewer implies that a partial truth equals the whole truth. The same argument could also be applied to the additive partitioning, since the relative yield total or the response ratio likewise provides a kind of comparison between mixtures and monocultures. Our statement is correct in the sense that previous approaches are not designed to separate changes in ecosystem productivity by species interaction, as other reviewers have indicated. The additive partitioning is built on the Price equation (a covariance equation), whose biological relevance to biodiversity partitioning has never been demonstrated (Bourrat et al., 2023).

      We have made clear that our work builds on, and goes beyond, the null expectation through the addition of the competitive expectation.

      The manuscript relies on simulations because it claims that current experiments are unable to test this, given that they have replacement series designs (lines 128-131). There are, however, dozens of experiments where the replacement series was repeated at multiple densities, which would allow a direct test of these ideas. In fact, these ideas have already been tested in these experiments and density effects were found to be nonsignificant (e.g., Finn et al. 2013).

      This is beside the point. Again, we are not testing density effects. Partial density is used to determine the competitive growth responses that species may achieve in mixture based on their relative competitive ability. We used simulations because partial-density monocultures have been used in only one experimental study, which has been included in our study.

      It seems that the authors are primarily interested in trees planted at a fixed density, with no opportunity for changes in density, and thus only changes in the size of individuals (e.g., Fig. 1). In natural and experimental systems, realized density differs from the initial planted density, and survivorship of seedlings can depend on both intraspecific and interspecific interactions. Thus, the constrained conditions under which these ideas are explored in this manuscript seem narrow and far from the more complex reality where density is not fixed.

      We use fixed density only for convenience. In biodiversity experiments, density can increase or decrease over time from its initial level. However, initial density is generally used when evaluating species interactions. If the interest is community productivity, density changes do not need to be considered. Again, we are not testing density effects.

      Additional detailed comments:

      It is unclear to me which 'effects' are referred to on line 36. For example, are these diversity effects or just effects of competition? What is the response variable?

      It refers to the effect of competitive interactions on productivity, which should be clear from the preceding sentences.

      The usefulness of the approach is overstated on line 52. All partitioning approaches, including the new one proposed here, give the net result of many types of species interactions and thus cannot 'disentangle underlying mechanisms of species interactions.'

      We are not sure how many types of species interactions the reviewer is referring to. If mechanisms of species interactions are grouped into three categories (positive, negative, and competitive), as is common in biodiversity research, our approach can tease them apart.

      The weaknesses of previous approaches are overstated throughout the manuscript, including in lines 60-61. All approaches provide some, but not all insights. Sweeping statements that previous approaches are not effective, without clarifying what they can and can't do, is unhelpful and incorrect. Also, these statements imply that the approach proposed here addresses the limitations of these previous approaches. I don't yet see how it does so.

      The weaknesses of previous approaches are not overstated in terms of separating changes in ecosystem productivity by species interaction. As pointed out by other reviewers, none of the previous approaches is designed to quantify changes in ecosystem productivity by species interaction.

      The definitions given for the CE and SE on line 71 are incorrect. Competition affects both terms and CE can be negative or have nothing to do with positive interactions, as noted in many of the papers cited.

      We are not trying to define CE and SE, but only to point out how CE and SE have generally been used in biodiversity research (see the recent publication by Feng et al., 2022).
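      For readers less familiar with the partition under discussion, the following is a minimal sketch of the Loreau and Hector (2001) decomposition of the net biodiversity effect, ΔY = N·mean(ΔRY)·mean(M) + N·cov(ΔRY, M), where M_i is the monoculture yield of species i, ΔRY_i is its observed minus expected relative yield in mixture, and the two terms are CE and SE. The function name and yield values are hypothetical illustrations, not data from the manuscript; the covariance uses the population (divide-by-N) form, which makes CE and SE sum exactly to the net effect.

```python
import numpy as np


def additive_partition(mono, mix, planted):
    """Loreau-Hector (2001) additive partition of the net biodiversity effect.

    mono: yield of each species in full-density monoculture (M_i)
    mix: observed yield of each species in the mixture (Y_i)
    planted: planted proportion of each species in the mixture (p_i)
    """
    mono = np.asarray(mono, dtype=float)
    mix = np.asarray(mix, dtype=float)
    planted = np.asarray(planted, dtype=float)
    n = mono.size

    # Observed relative yield minus the null (replacement-series) expectation p_i.
    delta_ry = mix / mono - planted
    # Net effect: observed mixture yield minus the null expectation sum_i p_i * M_i.
    net = mix.sum() - (planted * mono).sum()
    # Complementarity effect: N * mean(dRY) * mean(M).
    ce = n * delta_ry.mean() * mono.mean()
    # Selection effect: N * cov(dRY, M), using the population covariance.
    se = n * np.mean((delta_ry - delta_ry.mean()) * (mono - mono.mean()))
    return net, ce, se


# Hypothetical two-species, 50:50 mixture (arbitrary yield units).
net, ce, se = additive_partition(mono=[10.0, 4.0], mix=[7.0, 1.0], planted=[0.5, 0.5])
print(f"net={net:.2f} CE={ce:.2f} SE={se:.2f}")  # net=1.00 CE=-0.35 SE=1.35
```

      In this hypothetical case the positive net effect comes from a positive SE offset by a small negative CE.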

      The proposed approach does not address the limitations noted on lines 73 and 74.

      It does, in terms of identifying the sources of the net biodiversity effect, i.e., positive, negative, or competitive interactions.

      The definition of positive interactions in lines 77 and 78 seems inconsistent with much of the literature, which instead focuses on facilitation or mutualism, rather than competition when describing positive interactions.

      Much of the literature supports our definition (see Loreau and Hector, 2001). In biodiversity research, positive interactions include resource partitioning and facilitation. What we are trying to point out is that competition affects species and community level assessments based on the null expectation and needs to be separated.

      Throughout the manuscript, competition is often used interchangeably with resource competition (e.g., line 82) and complementarity is often attributed to resource partitioning (e.g., line 77). This ignores apparent competition and partitioning enemy-free niche space, which has been found to contribute to biodiversity effects in many studies.

      If apparent competition refers to interference competition, it is included in negative interactions. Changes in species-specific pests and pathogens in mixture will be captured as positive or negative effects through facilitation or interference.

      In what sense are competitive interactions positive for competitive species (lines 82-83)? By definition, competition is an interaction that has a negative effect. Do you mean that interspecific competition is less than intraspecific competition? I am having a very difficult time following the logic.

      I am glad the reviewer raised this question, which may confuse many others and has never been clearly discussed. It all depends on how the comparison is made. If species performance in mixture is compared with that in partial-density monocultures, as in competition research, the competition effect is negative for all species. If the comparison is made between mixtures and full-density monocultures, as in biodiversity research, the competition effect should be positive for more competitive species and negative for less competitive species, with resources flowing from less to more competitive species in mixture relative to full-density monocultures.

      Therefore, the definitions of competitive interactions based on the additive series design in competition research cannot be used to describe competitive interactions based on the replacement series design in biodiversity research. In biodiversity research, the effects of competitive interactions have never been clearly defined at the species or community level and are mixed up with those of other species interactions.
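      The dependence on the reference point described above can be illustrated with a toy calculation. The sketch below assumes a two-species, 50:50 mixture with made-up yields: full-density monoculture yield M_i, partial-density monoculture yield P_i (the species grown alone at its planted proportion p_i of full density), and mixture yield Y_i. It only contrasts the two baselines named in the text (P_i versus the null expectation p_i·M_i); it is not the authors' competitive-expectation calculation, whose formula is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Species:
    name: str
    full_mono: float     # M_i: yield in full-density monoculture
    partial_mono: float  # P_i: yield alone at planted proportion p_i of full density
    mixture: float       # Y_i: observed yield in the mixture
    planted: float       # p_i: planted proportion in the mixture


def compare_baselines(sp: Species) -> tuple[float, float]:
    """Return (Y_i - P_i, Y_i - p_i * M_i) for one species."""
    vs_partial = sp.mixture - sp.partial_mono         # competition-research baseline
    vs_null = sp.mixture - sp.planted * sp.full_mono  # biodiversity-research (null) baseline
    return vs_partial, vs_null


# Hypothetical numbers: total mixture yield equals the null total (5 + 5 = 10),
# but species A captures resources that species B would otherwise use.
community = [
    Species("A (more competitive)", full_mono=10.0, partial_mono=8.0, mixture=7.0, planted=0.5),
    Species("B (less competitive)", full_mono=10.0, partial_mono=8.0, mixture=3.0, planted=0.5),
]
for sp in community:
    vs_partial, vs_null = compare_baselines(sp)
    print(f"{sp.name}: Y-P = {vs_partial:+.1f}, Y-pM = {vs_null:+.1f}")
# A (more competitive): Y-P = -1.0, Y-pM = +2.0
# B (less competitive): Y-P = -5.0, Y-pM = -2.0
```

      Against the partial-density baseline both species lose yield, whereas against the null expectation the gain of A mirrors the loss of B, so the community-level net effect is zero even though interspecific competition has redistributed yield.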

      Results are asserted on lines 93-95, but I cannot find the methods that produced these results. I am unable to evaluate the work without a repeatable description of the methods.

      We have added references on sources of these data.

      The description of the null hypothesis in the common additive partitioning approach on lines 145-146 is incorrect. In the null case, it does not assume that there are no interspecific interactions, but rather that interspecific and intraspecific interactions are equivalent.

      Correct, changes have been made as suggested.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I recommend to:

      - re-organize the presentation of the material (see my concerns in the public review section). The manuscript is very difficult to read.

      Changes have been made to help readers understand our approach. Figure 1 was modified to show how the competitive growth response varies with relative competitive ability, from the minimum (null expectation) to the maximum (competitive exclusion).

      - explore the mathematical form of the remainder term. It seems important to understand that the remainder captures terms unrelated to competition as defined in the present scope.

      The remainder measures deviations from the null expectation due to species differences in growth and competitive ability (i.e., the competition effect). The term has a clear meaning, positive for more competitive species and negative for less competitive species (lines 202-204), and does not need to be further explored or partitioned. The deviations of observed yields from competitive expectations are outlined in lines 205-221.

      Reviewer #4 (Recommendations For The Authors):

      The authors should be sure to include reproducible methods and share any data and code.

      Both simulation and experimental data are shared through supplementary tables. Calculations are included in Excel spreadsheets and do not require programming.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors present a mean-field model that describes the interplay between (protein) aggregation and phase separation. Different classes of interaction complexity and aggregate dimensionality are considered, both in calculations concerning (equilibrium) phase behavior and kinetics of assembly formation.

      Strengths:

      The present work is, although purely theoretical, of high interest for understanding biological processes that occur as a result of a coupling between protein aggregation and phase separation. Of course, such processes are abundant, in the living cell as well as in in-vitro experiments. I appreciate the consideration of aggregates of various dimensionality, as well as the categorization into different “interaction classes”, together with the mention of experimental observations from biology. The model is convincing and underlines the complexity associated with the distribution of proteins across phases and aggregates in the living cell.

      Weaknesses:

      There are a few minor weaknesses.

      Reviewer 2 (Public Review):

      This work deals with a very difficult physical problem: relating the assembly of building blocks on a molecular scale to the appearance of large, macroscopic assemblies. This problem is particularly difficult to treat because of the large number of units involved and the complex way in which these units (monomers) interact with each other and with the solvent. To make the problem tractable, the authors resort to a number of approximations. Among these is the assumption that the system is spatially homogeneous, i.e., its features are the same in all regions of space. The homogeneity assumption may not hold in biologically relevant systems such as cells, where the behavior close to the cell membrane may differ strongly from that in the bulk. As a result, this hypothesis calls for cautious consideration and interpretation of the results of this work. Another notable simplification introduced by the authors is the assumption that the system can follow only two possible behaviors. In the first, each monomer interacts equally with the solvent, no matter the size of the cluster of which it is part. In the second, monomers in the bulk of a cluster and monomers at the assembly boundary interact with the solvent differently. These two cases are considered not only because they simplify the problem, but also because they are inspired by biologically relevant proteins.

      With these simplifications, the authors trace the phase diagram of the system, characterizing its phases for different fractions of the volume occupied by the monomers and solvent, and for different values of the temperature. The results qualitatively reproduce some features observed in recent experiments, such as an anomalous distribution of cluster sizes below the system saturation threshold, and the gelation of condensed phases above such threshold.

      Reviewer 3 (Public Review):

      Summary:

      The authors combine classical theories of phase separation and self-assembly to establish a framework for explaining the coupling between the two phenomena in the context of protein assemblies and condensates. By starting from a mean-field free energy for monomers and assemblies immersed in solvent and imposing conditions of equilibrium, the authors derive phase diagrams indicating how assemblies partition into different condensed phases as temperature and the total volume fraction of proteins are varied. They find that phase separation can promote assembly within the protein-rich phase, providing a potential mechanism for spatial control of assembly. They extend their theory to account for the possibility of gelation. They also create a theory for the kinetics of self-assembly within phase separated systems, predicting how assembly size distributions change with time within the different phases as well as how the volumes of the different phases change with time.

      Strengths:

      The theoretical framework that the authors present is an interesting marriage of classic theories of phase separation and self-assembly. Its simplicity should make it a powerful general tool for understanding the thermodynamics of assembly coupled to phase separation, and it should provide a useful framework for analyzing experiments on assembly within biomolecular condensates.

      The key advance over previous work is that the authors now account for how self-assembly can change the boundaries of the phase diagram.

      A second interesting point is the explicit theoretical consideration for the possibility that gelation (i.e. self-assembly into a macroscopic aggregate) could account for widely observed solidification of condensates. While this concept has been broadly discussed, to date I have yet to see a rigorous theoretical analysis of the possibility.

      The kinetic theory in sections 5 and 6 is also interesting as it extends on previous work by considering the kinetics of phase separation as well as those of self-assembly.

      Weaknesses:

      A key point the authors make about their theory is that, as opposed to previous research, it allows non-dilute limits to be studied. It is true that they consider gelation when the 3D assemblies become macroscopic. However, dilute-solution assumptions seem to be embedded in many aspects of their theory, and it is not always clear where else the non-dilute limits are considered. Is it in the inter-species interaction χ_ij? Why then do they never explore cases for which χ_ij is nonzero in their analysis?

      We explicitly consider that monomers and aggregates are non-dilute with respect to the solvent. This is evident in our accounting for the mixing entropy of all components, including the solvent. Moreover, we account for interactions of the monomers and the different aggregates with the solvent. We consider the case where each monomeric unit, independently of the aggregate it is part of, interacts in the same way with the solvent. Please note that this case corresponds to a non-dilute scenario in which interactions indeed drive phase separation.

      The connection between this theory and biological systems is described in the introduction but lost along the main text. It would be very helpful to point out, for instance, that the presence of phase separation might induce aggregation of proteins. This point is described formally at the end of Section 3, but a more qualitative connection to biological systems would be very useful here.

      We thank the referee for the useful comment, we now mention this in the introduction (line 80) and point out the biological relevance of assembly formation and localization via the presence of phase separation (lines 268 and 283).

      Building on the previous point, it would be helpful to give an intuitive sense of where the equations derived in the Appendices and presented in the main text come from and to spell out clear physical interpretations of the results. For example, it would be helpful to point out that Eq. 4 is a form of the law of mass action, familiar from introductory chemistry. It would be useful to better explain how the current work extends on existing previous work from these authors as well as others. Along these lines, closely related work by W. Jacobs and B. Rogers [O. Hedge et al. 2023, https://arxiv.org/abs/2301.06134; T. Li et al. 2023, https://arxiv.org/abs/2306.13198] should be cited in the introduction. The results discussed in the first paragraph of Section 3 on assembly size distributions in a homogeneous system are well-known from classic theories of self-assembly. This should be acknowledged and appropriate references should be added; see for instance, Rev. Mod. Phys. 93, 025008 and Statistical Thermodynamics Of Surfaces, Interfaces, And Membranes by Sam Safran. Equation 14 for the kinetic of volume fractions is given with reference to Bauermann et al. 2022, but it should be accompanied by a better intuitive interpretation of its terms in the main text. In particular, how should one understand the third term in this equation? Why does the change in volume impact the change of volume fraction in this way?

      We thank the referee for the suggestions. We have included the missing references, with a particular emphasis on DNA nanostars that inhibit phase separation in DNA liquids in the definition of class II. We added intuitive explanations of the main equations, such as Eqs. (4),(8),(14), (17), and (18). Notice that, according to Mysels, Karol J., J. Chem. Educ., 33, 178 (1956) (https://pubs-acs-org.sire.ub.edu/doi/epdf/10.1021/ed033p178) we refer to (18) as the law of mass action.
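      As a point of reference for the classic self-assembly result the reviewer mentions, the sketch below computes an equilibrium size distribution for linear (1D) assemblies in the ideal, dilute limit, using the textbook law-of-mass-action form φ_i = i·φ_1^i·e^{(i−1)α}, where α is an assumed bond free energy in units of k_BT. Sizes are truncated at M, as in the manuscript, and the monomer volume fraction φ_1 is found by bisection so that Σφ_i equals the total protein volume fraction. This is a generic illustration under these assumptions, not the manuscript's Eq. (4), which also includes solvent interactions; the function names are hypothetical.

```python
import numpy as np


def size_distribution(phi1: float, alpha: float, M: int) -> np.ndarray:
    """phi_i = i * phi1**i * exp((i - 1) * alpha) for i = 1..M (ideal 1D assemblies)."""
    i = np.arange(1, M + 1)
    # Work in log space to avoid overflow for large alpha or M.
    log_phi = np.log(i) + i * np.log(phi1) + (i - 1) * alpha
    return np.exp(log_phi)


def solve_monomer_fraction(phi_tot: float, alpha: float, M: int, tol: float = 1e-13) -> float:
    """Find phi1 such that sum_i phi_i equals phi_tot (bisection; the sum increases with phi1)."""
    lo, hi = 0.0, phi_tot  # phi1 can never exceed the total volume fraction
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size_distribution(mid, alpha, M).sum() > phi_tot:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


for alpha in (0.0, 4.0, 8.0):
    phi1 = solve_monomer_fraction(phi_tot=0.05, alpha=alpha, M=15)
    phi = size_distribution(phi1, alpha, M=15)
    mean_size = phi.sum() / (phi / np.arange(1, 16)).sum()  # number-averaged size
    print(f"alpha={alpha}: phi1={phi1:.4f}, mean size={mean_size:.2f}")
```

      At fixed total volume fraction, a larger α lowers the free monomer fraction and shifts mass into larger assemblies.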

      The discussion in the last paragraph of Section 6 should be clarified. How can the total amount of protein in both phases decrease? This would necessarily violate either mass or volume conservation. Also, the discussion of why the volume is non-monotonic in time is not clear.

      A decrease in the total amount of protein in both phases does not violate mass conservation if the volumes of the phases vary accordingly. In particular, the volume of the denser phase should grow. Given this, in the case presented, the total protein amount in the dense phase decreases, while that in the dilute phase increases. For this reason, we revised the paragraph and now explain the results in more detail (see lines starting from 407). The non-monotonic volume change is indeed a puzzling finding that, as we now state in the manuscript, requires further investigation. Given the lack of analytical approaches available to tackle the complex kinetics in the presence of coexisting phases, we believe that this analysis goes beyond the scope of the present paper.
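      The conservation argument can be checked with simple bookkeeping. Assuming a closed system of fixed total volume V split into a dense phase (I) and a dilute phase (II), total protein is conserved as φ^I·V^I + φ^II·V^II = φ_tot·V. The sketch below interprets "amount" as protein volume fraction in each phase (an assumption) and, with hypothetical numbers, solves this lever rule for the dense-phase volume when both phase volume fractions drop; the dense phase must then grow.

```python
def dense_phase_volume(phi_tot: float, phi_dense: float, phi_dilute: float,
                       volume: float = 1.0) -> float:
    """Dense-phase volume V_I from phi_I*V_I + phi_II*(V - V_I) = phi_tot*V (lever rule)."""
    if not phi_dilute < phi_tot < phi_dense:
        raise ValueError("phi_tot must lie between the two phase volume fractions")
    return volume * (phi_tot - phi_dilute) / (phi_dense - phi_dilute)


phi_tot = 0.24  # conserved total protein volume fraction (hypothetical)
before = dense_phase_volume(phi_tot, phi_dense=0.8, phi_dilute=0.1)
after = dense_phase_volume(phi_tot, phi_dense=0.7, phi_dilute=0.05)
print(f"dense-phase volume: {before:.3f} -> {after:.3f}")  # 0.200 -> 0.292
```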

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Line 96: I feel that a definition, explanation, and perhaps some discussion of the parameter M (limiting aggregate size) would have been in order when introducing Equation (1). Furthermore, in the usual interpretation, Flory interaction parameters (symbolized χ) are dimensionless, as, classically, they represent an exchange energy (normalized by kT) defined on a monomeric basis. Here they seem to carry the dimension of energy.

      We thank the reviewer for the observation. We have included a brief comment on M and mentioned that we use χ parameters that carry the dimension of energy, such that, by varying k_BT, we simultaneously scale the term containing the interaction propensities (χ) and the one containing the internal energies (e_int). See the comment on line 127.

      Line 150: The choice of ρ_i = i physically implies that a single protein is assumed to have the same volume as a solvent molecule. This may be a bit of a stretch. This assumption leads to an overestimation of the translational entropy of the aggregates (first term in Equation (1)). Acknowledging that ρ_1 ≫ ρ_s would give a pronounced desymmetrization of the phase diagram (I suspect).

      Indeed, in the case of monomers only, the assumption leads to a symmetric phase diagram, which may be unrealistic. Once assemblies form, however, the phase diagram becomes asymmetric; for this reason we decided to assume ρ_i = i, simplifying the theoretical analysis. We have added a clarifying sentence in the manuscript, see line 163.
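      To make the desymmetrization point concrete, the sketch below uses the textbook binary Flory–Huggins free energy per lattice site, f/k_BT = (φ/N) ln φ + (1 − φ) ln(1 − φ) + χ φ(1 − φ), for a solute occupying N solvent-sized sites, with a dimensionless χ. This is a generic two-component model, not the manuscript's multicomponent Eq. (1). Its critical point, φ_c = 1/(1 + √N) and χ_c = (1 + 1/√N)²/2, lies at φ_c = 1/2 (a symmetric phase diagram) only for N = 1, i.e., the equal-volume case discussed above.

```python
import math


def critical_point(N: float) -> tuple[float, float]:
    """Critical (phi_c, chi_c) of a binary Flory-Huggins mixture with solute size N."""
    sqrt_n = math.sqrt(N)
    return 1.0 / (1.0 + sqrt_n), 0.5 * (1.0 + 1.0 / sqrt_n) ** 2


def spinodal_condition(phi: float, chi: float, N: float) -> float:
    """Second derivative of f/kT; zero on the spinodal, negative inside it."""
    return 1.0 / (N * phi) + 1.0 / (1.0 - phi) - 2.0 * chi


for N in (1, 4, 10):
    phi_c, chi_c = critical_point(N)
    print(f"N={N:>2}: phi_c={phi_c:.3f}, chi_c={chi_c:.3f}")
# N= 1: phi_c=0.500, chi_c=2.000
# N= 4: phi_c=0.333, chi_c=1.125
# N=10: phi_c=0.240, chi_c=0.866
```

      As the solute-to-solvent volume ratio N grows, the critical point moves toward low φ, which is the asymmetry the reviewer anticipates for ρ_1 ≫ ρ_s.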

      Furthermore, the pictures in Figure 1a-c suggest the presence of a disordered residue, the degree of swelling of which might affect binding strength (see for instance: https://doi.org/10.3389/fnmol.2022.962526).

      We added a comment on the possible coupling between internal free energies and interaction propensities, such as the swelling mechanism that affects binding sites, and included the reference above (line 215).

      Lines 154-156: It is unclear what is meant by “an internal bond that keeps each assembly together”. How should this be interpreted on an intuitive physical level?

      We apologise for being unclear. We meant the internal bonds that lead to the formation of assemblies. We have now rephrased this sentence in the main text (lines starting from 169).

      Line 254: The fact that ϕ_sg is defined below does not mean it does not fall out of the air here. The same holds for the consideration of the limit M → ∞. Ideally, the main text should stand on its own, in particular with respect to physical intuitiveness, as well as the necessity and interest of discussion topics. Technical details, derivations and additional information can be in an appendix.

      We agree with the referee and have added some physical insight about the limit. We now also state clearly in the main text (line 298) that ϕ_sg is affected by temperature and the free energy of internal bonds.

      Line 257: “Since we do not explicitly include the solvent in assembly formation we will consider the gel as a phase without solvent and thus ϕ_tot = 1”. I am not sure I can agree with this. I would say a gel, certainly in a biological context, almost by definition contains a large fraction of solvent, i.e. here water. The situation “ϕ_tot = 1” would rather be a solid precipitate. Is gelation properly captured by this model?

      We thank the referee for this very relevant observation. We now state in the main text that the model predicts a macroscopic assembly, which we call ‘the gel phase’, in agreement with previous literature. To clarify, we added the sentence “Please note that, since we do not explicitly include the solvent in assembly formation (see reaction scheme in Fig. 1a), in our model the gel corresponds to a phase without solvent, ϕ_tot = 1. To account for biological gels that can be rich in water, our theory can be straightforwardly extended by incorporating the solvent into the reaction scheme.”, see main text line 300.

      Line 268: Shouldn’t “solvent” be “solution”? If f_sol is given by Equation (1), surely not only the solvent is considered.

      Indeed, this is a typo, and we now use the term ‘solution’ instead of ‘solvent’.

      Line 273: At this stage, the only information provided in the main text is that ω∞ is “a constant that does not affect chemical nor phase equilibrium, except in the limit M → ∞” (see lines 153-154). This is a little bit too abstract for me. Again, the main text should stand on its own, meaning the reader should not have to rely on an appendix to at least have an intuitive physical understanding of any modeling or input parameter discussed in the main text.

      We thank the reviewer for pointing this out. We now comment on the physical interpretation of ω∞ in the main text, see lines from 320 on.

      Figure 4. appears in Equation (39) but it is not defined.

      We thank the reviewer for pointing this out. We have reshaped appendix 6A, making use of chemical activities and clarified the origin of the rate .

      Line 317. I don’t fully understand the intention of the remark on the model being adaptable for “primary and secondary nucleation”. How/in what way is this different from association and dissociation? For instance, classical nucleation theory is based on association and dissociation of monomeric units to and from clusters.

      We agree that the kinetic rate coefficients k_ij (appearing in the association and dissociation rates Δr_ij, Eq. 17) in our manuscript already depend on assembly length; see Appendix 6 B, where we now clarify their definition. Please note, however, that secondary nucleation is a special kind of association, for which the kinetic rate coefficients corresponding to associations of small assemblies, i.e. k_ij with i, j ≪ M, explicitly depend on the presence of large assemblies with sizes l ≫ 1. In our manuscript, we have not accounted for such a dependence. We now make this aspect clear in the manuscript, see Appendix 6 B.

      Line 321. Why is ∆rij called the “monomer exchange rate”? In line 318 the same parameter is defined as the “reaction rate for the formation of a (i+j)-mer”. Why should these be the same?

      We thank the reviewer for spotting this typo.

      Line 323. Why do these calculations use M = 15?

      The exploration of a 15-dimensional phase space is already numerically challenging. We are currently working on a generalization of the numerical scheme to work with larger values of M but, to discuss the fundamental physical principles, we kept M = 15.

      Reviewer 2 (Recommendations For The Authors):

      The manuscript presents several issues, at both the scientific and the presentational level, which need to be carefully addressed. Please find below a list of the points that need to be addressed by the authors, divided into major and minor points.

      Major issues:

      • A general, major concern about the results in the paper is the homogeneity assumption. I do understand that repeating the whole analysis presented in the manuscript by allowing for spatial inhomogeneities partially goes beyond the scope of this paper. However, the authors should at least discuss how such inhomogeneities may alter the results in a qualitative way, and explicitly treat the presence of inhomogeneity in one prototypical case considered in the manuscript. Namely, what happens if the volume fractions and relative molecular volumes in the free energy (1) depend on space, e.g., ϕi → ϕi(x)?

      We would like to stress that, in the present paper, we do account for spatial inhomogeneities. Indeed, in the case of phase separation, we consider systems which are divided into two phases, characterized by different values of the assemblies’ volume fractions ϕi. We do, however, consider the system to be homogeneous inside the phases, implying a jump in the value of the volume fraction at the interface between the two phases. In this sense, the analysis we carry out is valid in the thermodynamic limit, where gradients of the volume fractions ϕi(x) within the phases can be neglected. On the other hand, considering the full spatial problem, i.e. solving the equations for M = 15 spatially varying fields, would be numerically extremely challenging.

      • The authors’ results relate molecular assembly (a phenomenon at the molecular scale) to phase separation (a mesoscopic or macroscopic phenomenon). The authors should stress the conceptual importance of this connection between scales, and present their results from the perspective of a multi-scale model.

      We thank the reviewer for pointing this out. We now emphasize the multi-scale feature of our model in the introduction (line 80).

      • Starting from Section 1, the reader is not well guided through the sections that follow. The authors should provide an outline of the line of thought that they are going to follow in the following sections, and logically connect each section to the next one with a short paragraph at the end of each section. This paragraph should summarize what has been addressed in the current section, and the connection with the topic that will be addressed in the next one.

      We agree with the reviewer and have added a transitional sentence at the end of each paragraph.

      • ’We focus on linear assemblies (d = 1)’: Given the striking differences between the results for d = 1 and d > 1 shown above, the authors should discuss what happens for d > 1 as well.

      • ’In figure Fig. 5a, we show the initial and final equilibrium binodals (black and coloured curve, respectively), for the case of linear assemblies (d = 1) belonging to class 1’: Again, show what happens for d > 1.

      We agree with the reviewer; the kinetics in d > 1 would definitely be interesting. However, in this case, one assembly can become macroscopic (i.e. M must be set to ∞). This requires some substantial modification of the kinetic scheme, such as introducing an absorbing boundary condition for monomers ’sucked into’ the gel. We prefer to leave this for future work, and now state it explicitly in the manuscript (line 383).

      • ’This difference arises because, within class 2, monomers in the bulk of an assembly have reduced interaction propensity with respect to the boundary ones. As a consequence, the formation of large clusters shifts the onset of phase separation to higher ϕtot values.’: To prove this argument, the authors should show Fig. 2g and h for d > 1. In fact, by varying d, the effect of the boundary vs. bulk also varies.

      We prefer to discuss the thermodynamics of d > 1 in section 4 on gelation. There we present only a single phase diagram so as not to blow up the discussion on equilibrium too much.

      • ’referring for simplicity to systems belonging to Class 1’: The authors should do the same analysis for Class 2.

      We agree with the reviewer. However, again not to blow up the discussion on equilibrium, we leave it for future work.

      • ’other, implying that the corresponding Flory-Huggins parameter χij vanishes’: Why?

      The explanation based on a lattice model is reported in Appendix 2, and is now more clearly referenced (line 185).

      Minor issues:

      • Eq. (10): Here the authors should explain in the main text, possibly in a simple and intuitive way, why the number of monomers i and the space dimension d enter the right-hand side of this equation in this particular way.

      We thank the reviewer for pointing this out. We added the physical origin of the scaling with dimension in Eq. (10) and in Eq. (8), as pointed out by reviewer 3.

      • ’The second and fifth terms of fsol characterize the internal free energies’: What do you mean by ’characterize the internal free energies’? Please clarify.

      As we now state more clearly (lines 114-120), these two contributions include the internal free energies ωs and ωi, stemming from the free energy of internal bonds that lead to assembly formation.

      • ’depend on the scaling form of the’: Scaling with respect to what? Please clarify.

      We have now clarified that the scaling is with respect to the assembly size i.

      • Figure 2 is way too dense: it should be split into two figures, and the legend of each of the two figures should be expanded to properly guide the reader to understand the figures.

      We understand the reviewer’s point of view. To avoid altering the present flow, we decided not to split the figure, but we have included shaded boxes to better guide the reader.

      • ’this is a consequence of the gelation transition’: Please clarify

      • ’and this limitation can be dealt with by introducing explicitly the infinite-sized gel in the free energy’: Why? Please clarify.

      We have now rephrased these sentences, hopefully in a clearer way. We now state: ’We know that this divergence is physical, and is caused by the gelation transition. This limitation can be dealt with by introducing explicitly a term in the free energy that accounts for an infinite-sized assembly (the gel)’, see lines 320-322.

      • Figure 4: Add plots of panels d, e, h and i with a logarithmic y axis to make any exponential behavior explicit, and revise the text accordingly

      Not to further complicate Figure 4, we preferred to display the logarithmic plots of the equilibrium distribution in the appendix, see Figure A3-1.

      • ’... an equilibrium distribution which monotonously decreases with assembly size’: It is not the distribution that decreases but the cluster volume fraction; please rephrase.

      We thank the reviewer for pointing this out and have now rephrased this sentence (line 394).

      Reviewer 3 (Recommendations For The Authors):

      I could not obtain the exact form of Eq. 29 in App. 3; can the authors elaborate on this calculation? App. 3: What does it mean that the binodal agrees well with ϕsg? And doesn’t ϕsg depend on temperature through ϕ̃? What temperature is this result for?

      We apologise for the unclear explanation. We now state in detail that Eq. (29) is obtained by plugging the expression of ϕi given in Eq. (24) into Eq. (1) in the main text. The dependence of ϕ1 on ϕtot is expressed in Eq. (26), and we have omitted terms linear in ϕtot, since they do not affect phase equilibrium (see lines 802-809). Moreover, ϕsg indeed depends on kBT. We refer to the comparison between the full curve ϕsg in the kBT–ϕtot plane and the branch of the binodal between the triple point (now indicated with a cross) and ϕtot = 1. The two curves are close, as expected, since both correspond to the boundary between homogeneous mixtures and the gel state, obtained with different methods.

      The references to Figures in the appendices are confusing. Please make it clear whether Figures in the main text or the appendices are being referenced. On a related note, the Appendix figures seem to be placed in appendices whose text describes something else - Appendix 2, Figure 1 should be moved to Appendix 3; Appendix 3, Figure 1 should be moved to Appendix 4; etc.

      We revised the appendix, corrected the figure positions and clarified their references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research offers an in-depth exploration and quantification of social vocalization within three families of Mongolian gerbils. In an enlarged, semi-natural environment, the study continuously monitored two parent gerbils and their four pups from P14 to P34. Through dimensionality reduction and clustering, a diverse range of gerbil call types was identified. Interestingly, distinct sets of vocalizations were used by different families in their daily interactions, with unique transition structures exhibited across these families. The primary results of this study are compelling, although some elements could benefit from clarification.

      Strengths:

      Three elements of this study warrant emphasis. Firstly, it bridges the gap between laboratory and natural environments. This approach offers the opportunity to examine natural social behavior within a controlled setting (such as specified family composition, diet, and life stages), maintaining the social relevance of the behavior. Secondly, it seeks to understand short-timescale behaviors, like vocalizations, within the broader context of daily and life-stage timescales. Lastly, the use of unsupervised learning precludes the injection of human bias, such as pre-defined call categories, allowing the discovery of the diversity of vocal outputs.

      Weaknesses:

      (1) While the notable differences in vocal clusters across families are convincing, the drivers of these differences remain unclear. Are they attributable to "dialect," call usage, or specific vocalizing individuals (e.g., adults vs. pups)? Further investigation, via a literature review or additional observation, into acoustic differences between adult and pup calls is recommended. Moreover, a consistent post-weaning decrease in the bottom-left cluster (Fig. S3) invites interpretation: could this reflect drops in pup vocalization?

      Thank you for bringing up this point of clarification. Without knowledge of individual vocalizers, we are unable to rigorously assess pronunciation differences between individuals; however, we can get a clear proxy for dialect by observing usage differences between families. We’ve added the following text (blue) in the Discussion to help clarify:

      “To address whether gerbils also exhibit family specific vocal features, we compared GMM-labeled vocal cluster usages across the three recorded families and showed differences in vocal type usage (Figure 3). The differences in this study align with the definition of human vocal dialect, which is a regional or social variety of language that can differ in pronunciation, grammatical, semantic and/or language use differences (Henry et al., 2015). This definition of dialect is inclusive of both pronunciation differences (e.g. a Bostonian’s characteristic pronunciation of “car” as “cah”) and usage differences (e.g. a Bostonian’s preferential usage of the words “Go Red Sox” vs. a New Yorker’s preferential usage of the words “Go Yankees”). In our case, vocal clusters can be rarely observed in some families yet highly over-expressed in others (e.g. analogous to language usage differences in humans), or highly expressed in both families, but contain subtle spectrotemporal variations (Figure 3D, Family 1 cluster 11 vs. Family 3 clusters 2, 18, 30; e.g. analogous to pronunciation differences in humans).”

      Indeed, our recordings obtained after pup removal could suggest that adults may use fewer low frequency calls (bottom left cluster in UMAP). However, this dataset does not permit a proper assessment of post-weaning pup calls. In fact, our results and the literature show that adults are likely to use low frequency calls, but only during social interactions with pups or other adults. For example, Furuyama et al. 2022 describe a number of low frequency call types used by adults in agonistic social interactions, which look similar to a low frequency call type used by pups described in Silberstein et al. 2023. Similarly, Ter-Mikaelian et al. 2012 (their Figure 6) recorded several types of sonic vocalizations during adult social interaction. To our knowledge, it has not been shown whether gerbil pups and adults produce distinct call types. It is a challenging problem to solve, as animals placed in isolation (i.e. an experimental condition for which the identity of the vocalizer is known) vocalize infrequently, and the few calls they do emit do not span the full range of vocalizations described in the literature (RP personal observations). To properly address this question, one would need to elicit full use of the vocal repertoire through free social interaction, then attribute calls to individual vocalizers via sound source localization and/or head-mounted microphones; we are currently pursuing both of these technical challenges, but this is outside the scope of this manuscript.

      Although the literature reflects the limitations discussed above, we have added a brief paragraph to the Discussion (limitations section) that addresses the reviewer’s question about the development of vocalizations:

      “Although we were not able to attribute vocalizations to individual family members, we did seek to determine the importance of family structure by comparing audio recordings before and after removal of the pups at P30. The results show a clear effect of family integrity, and the sudden reduction of sonic calls following pup removal (Figure S3) could suggest that these vocalizations are produced selectively by pups.

      However, there is ample evidence that adult gerbils also produce sonic vocalizations. For example, a number of low frequency call types are used by adults during a range of social interactions (Ter-Mikaelian et al., 2012; Furuyama et al., 2022), some of which are similar to a low frequency call type used by pups (Silberstein et al., 2023). Vocalization patterns of developing gerbils depend on whether they are recorded in isolation or during staged interactions. Thus, when gerbil pups are recorded during isolation, ultrasonic vocalization rate declines and sonic vocalizations increase for animals that are in a high arousal state (De Ghett 1974, Silberstein et al., 2023). As gerbils progress from juvenile to adolescent development (P17-55), a significant increase in ultrasonic vocalization rate is observed during dyadic social encounters, with a distinct change in usage pattern that depends upon the sex of each animal (Holman & Seale 1991, Holman et al. 1995). The development of vocalization types has been assessed in another member of the Gerbillinae subfamily, called fat-tailed gerbils (Pachyuromys duprasi), during isolation and handling. Here, the number of ultrasonic vocalization syllable types increases from neonatal to adult animals (Zaytseva et al. 2019), while some very low frequency sonic call types were rarely observed after P20 (Zaytseva et al. 2020). By comparison, mouse syllable usage changes during development, but pups produced 10 of the 11 syllable types produced by adults (Grimsley et al. 2011). In summary, our understanding of the maturation of vocalization usage remains limited by our inability to obtain longitudinal data from individual animals within their natural social setting. For example, when recorded in their natural environment, chimpanzees display a prolonged maturation of vocalization complexity, such as the probability of a unique utterance in a sequence, with the greatest changes occurring when animals begin to experience non-kin social interactions (Bortolato et al. 2023).”

      (2) Developmental progression, particularly during pre-weaning periods when pup vocal output remains unstable, might be another factor influencing cross-family vocal differences. Representing data from this non-stationary process as an overall density map could result in the loss of time-dependent information. For instance, were dominating call types consistently present throughout the recording period, or were they prominent only at specific times? Displaying the evolution of the density map would enhance understanding of this aspect.

      This is a great suggestion. Thank you for bringing it up. To address this, we have added an additional figure (Figure 4) to the main text (Note that the former Figure 4 is now Figure 5). New text associated with this new figure was added to the Results and Discussion sections:

      Results

      “Vocal usage differences remain stable across days of development

      It is possible that the observed vocal usage differences could result from varying developmental progression of vocal behavior or overexpression of certain vocal types during specific periods within the recording. To assess the potential effect of daily variation on family specific vocal usage, we visualized density maps of vocal usage across days for each of the families (Figure 4A). There are two noteworthy trends: 1.) the density map remains coarsely stable across days (rows) and 2.) the maps look distinct across families on any given day (columns). This is a qualitative approximation for the repertoire’s stability, but does not take into account variation of call type usage (as defined by GMM clustering of the latent space). Figure 4B shows the normalized usage of each cluster type over development for each family. Cluster usages during the period of “full family, shared recording days” (postnatal days beneath the purple bars) are stable across days within families, as is apparent from the horizontal striations in the plot, though each family maintains this stability by using a unique set of call types. This is addressed empirically in Figure 4C, which shows clearly separable PCA projections of the cluster usages shown in Figure 4B (purple days). Finally, we computed the pairwise Maximum Mean Discrepancy (MMD) between latent distributions of vocalizations from individual recording days for each of the families (Figure 4D). This shows that across-family repertoire differences are substantially larger than within-family differences. This is visualized in a multidimensional scaling projection of the MMD matrix in Figure 4E.”

      Discussion

      “The described family differences collapse data from multiple days into a single comparison; however, it is possible that factors such as vocal development and/or high usage of particular vocal types during specific periods of the recording could explain family differences. Therefore, we took advantage of the longitudinal nature of our dataset to assess whether repertoire differences remain stable across time. First, we visualized vocal repertoire usage across days as either UMAP probability density maps (Figure 4A) or daily GMM cluster usages (Figure 4B). Though qualitative, one can appreciate that family repertoire usage remains stable across days and appears to differ on a consistent daily basis across families. To formally quantify this, we first projected GMM cluster usages from Figure 4B into PC space and showed that family GMM cluster usage patterns are highly separable, regardless of postnatal day (Figure 4C). If families had used a more overlapping set of call types, then the projections would have appeared intermixed. Next, we performed a cluster-free analysis by computing the pairwise MMD distance between VAE latent distributions of vocalizations from each family and day (Figure 4D). This analysis shows very low MMD values across days within a family (i.e. the repertoire is highly consistent with itself), and high MMD values across families/days (greater than would be expected by chance; see shuffle control in Figure S2D). The relative differences in this matrix are made clear in Figure 4E, which provides additional evidence that family vocal repertoires remain stable across days and are consistently different from other families. Taken together, we believe that this is compelling evidence that differences in vocal repertoires between families are not driven by dominating call types during specific phases in the recording period; rather, families consistently emit characteristic sets of call types across days. This opens up the possibility of assessing repertoire differences over much shorter time periods (e.g. 24 hours) in future studies.”
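      For readers less familiar with MMD, the following is a minimal illustrative sketch of the cluster-free comparison described above, not the authors’ code. It assumes a Gaussian (RBF) kernel with a fixed bandwidth and the biased (V-statistic) estimator of squared MMD; the kernel, the bandwidth, and the function names `gaussian_kernel`, `mmd2` and `pairwise_mmd` are assumptions for illustration. Each input array stands in for the VAE latent vectors of one family/day, and the example uses synthetic data only.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x (n, d) and y (m, d).

    The kernel choice and bandwidth are illustrative assumptions.
    """
    sq = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def pairwise_mmd(samples, bandwidth=1.0):
    """Symmetric matrix of squared MMD between every pair of sample sets."""
    n = len(samples)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = mmd2(samples[i], samples[j], bandwidth)
    return out


# Synthetic stand-ins for latent vectors (not real data).
rng = np.random.default_rng(0)
day_a = rng.normal(size=(200, 2))           # "family 1, day 1"
day_b = rng.normal(size=(200, 2))           # same distribution, new draw
other = rng.normal(loc=3.0, size=(200, 2))  # shifted, "another family"
print(pairwise_mmd([day_a, day_b, other]).round(3))
```

      Two samples from the same distribution give a squared MMD close to zero, while the shifted sample gives a much larger value; this is the same low-within, high-across pattern that the text describes for the family/day matrix.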

      (3) Family-specific vocalizations were credited to the transition structure, a finding that may seem obvious if the 1-gram (i.e., the proportion of call types) already differs. This result lacks depth unless it can be demonstrated that, firstly, the transition matrix provides a robust description of the data, and secondly, different families arrange the same set of syllables into unique sequences.

      Thank you for these important suggestions. We agree that it is true that the 2-gram transition structure must vary based on the 1-gram structure. To determine whether this influences the interpretation of the finding, we have added Figure S5 and the following text in the Results section:

      “To determine whether differences in 1-gram structure contribute to differences in the transition (2-gram) structure, we performed a number of controls. Although subtle, vertical streaks are clearly present in shuffled transition matrices that correspond to 1-gram usages (Figure S5A-B). Given the shuffled data structure, we sought to determine whether the observed transition probabilities differed significantly from chance levels. We randomly shuffled label sequences 1000 times independently for each family to generate a null transition matrix distribution. Using these null distributions and the observed transition probabilities, we computed a p-value for each transition using a one-sample t-test and created a binary transition matrix indicating which transitions happen above chance levels (Figure S5C, black pixels, p <= 0.05 after post hoc Benjamini-Hochberg multiple comparisons correction). As is made clear in Figure S5C, most transitions for each family occur significantly above chance levels, despite the inherent 1-gram structure. Moreover, by looking at transitions from a frequently used cluster type that is used in roughly the same proportion across families (cluster 12), we show that families arrange the same sets of vocal clusters into unique sequences (Figure S5D). We believe that this provides compelling evidence that the 1-gram structure does not change the interpretation of the main claim that transition structure varies by family.”
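      The shuffle control above can be sketched as follows. This is an illustrative sketch, not the authors’ pipeline: it builds row-normalized bigram transition matrices, permutes each label sequence 1000 times to obtain a null that keeps 1-gram usage fixed, and applies Benjamini-Hochberg correction. One substitution is stated plainly: in place of the one-sample t-test used in the manuscript, it computes a one-sided empirical permutation p-value. The function names (`transition_matrix`, `shuffle_null`, `benjamini_hochberg`, `above_chance`) are hypothetical.

```python
import numpy as np


def transition_matrix(labels, n_types):
    """Row-normalized bigram (first-order) transition probabilities."""
    counts = np.zeros((n_types, n_types))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


def shuffle_null(labels, n_types, n_shuffles=1000, seed=0):
    """Transition matrices of randomly permuted label sequences.

    Permuting keeps 1-gram usage fixed but destroys sequential order.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    return np.stack([transition_matrix(rng.permutation(labels), n_types)
                     for _ in range(n_shuffles)])


def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of p-values rejected under Benjamini-Hochberg FDR control."""
    p = np.asarray(p, dtype=float).ravel()
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def above_chance(labels, n_types, n_shuffles=1000, alpha=0.05, seed=0):
    """Binary matrix of transitions occurring above chance after BH correction."""
    observed = transition_matrix(labels, n_types)
    null = shuffle_null(labels, n_types, n_shuffles, seed)
    # One-sided empirical p-value with +1 correction (swapped in for the t-test).
    p = (1 + (null >= observed).sum(axis=0)) / (n_shuffles + 1)
    return benjamini_hochberg(p, alpha).reshape(n_types, n_types)


# Toy sequence with a fixed cycle 0 -> 1 -> 2 -> 0 (synthetic, not real data).
print(above_chance([0, 1, 2] * 100, n_types=3))
```

      Because a full permutation preserves how often each cluster is used, every row of a shuffled matrix approaches the overall usage vector, which is why the null matrices show vertical 1-gram streaks while carrying no sequential dependence. In the toy example, only the three cycle transitions survive correction.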

      To address your second point, we inspected frequent transitions from individual syllables to all other syllables using bigram transition probability graphs. This revealed a common trend: across all families, both shared and unshared transitions existed, suggesting that families use the same sets of syllables to make unique transition patterns. Figure S5D shows a single syllable example of the phenomenon, with red lines indicating the shared transition types between families and black lines showing transition patterns not shared between families (i.e. unique family-specific transitions, or lack thereof).

      Reviewer #2 (Public Review):

      Peterson et al. perform a series of behavioral experiments to study the repertoire and variance of Mongolian gerbil vocalizations across social groups (families). A key strength of the study is the use of a behavioral paradigm which allows for long term audio recordings under naturalistic conditions. This experimental set-up results in the identification of additional vocalization types. In combination with state of the art methods for vocalization analysis, the authors demonstrate that the distribution of sound types and the transitions between these sound types across three gerbil families is different. This is a highly compelling finding which suggests that individual families may develop distinct vocal repertoires. One potential limitation of the study lies in the cluster analysis used for identifying distinct vocalization types. The authors use a Gaussian Mixture Model (GMM) trained on variational autoencoder (VAE)-derived latent representations of vocalizations to classify recorded sounds into clusters. Through the analysis the authors identify 70 distinct clusters and demonstrate a differential usage of these sound clusters across families. While the authors acknowledge the inherent challenges in cluster analysis and provide additional analyses (i.e. maximum mean discrepancy, MMD), additional analysis would increase the strength of the conclusions. In particular, analysis with different cluster sizes would be valuable. An additional limitation of the study is that due to the methodology that is used, the authors cannot provide any information about the bioacoustic features that contribute to differences in sound types across families, which limits interpretations about how the animals may perceive and react to these sounds in an ethologically relevant manner.

      The conclusions of this paper are well supported by data, but certain parts of the data analysis should be expanded and more fully explained.

      • Can the authors comment on the potential biological significance of the 70 sound clusters? Does each cluster represent a single sound type? How many vocal clusters can be attributed to a single individual? Similarly, can the authors comment on the intra-individual and inter-individual variability of the sound types within and across families?

      Previous work documenting the Mongolian gerbil repertoire (Ter-Mikaelian 2012, Kobayasi 2012) has revealed ~12 vocalization types that vary with social context. Our thinking is that we are capturing these ~12 (plus a few more, as illustrated in Figure 2C) as well as individual or family-specific variations of some call types. Although the number of discrete call types is likely less than 70, it’s plausible that variation due to vocalizer identity pushes some calls into unique clusters. This idea is supported by the fact that both naked mole rats and Mongolian gerbils have been shown to exhibit individual-specific variation in vocalizations, though only in single call types (Barker 2021, Figure 1; Nishiyama 2011, Table I). The current study is not ideal to test this prediction, as we cannot attribute each vocalization to individual family members. Using our 4-mic array, we attempted to apply established sound source localization techniques to assign vocalizations to individuals (Neunuebel 2015), but the technique failed, presumably due to high amounts of reverberation in the arena. We are currently developing a custom deep learning based sound localization algorithm, and had hoped to extract individual animal vocalizations from our data set (part of the reason why this manuscript has taken longer than expected to return!), but the performance is not yet satisfactory for large groups of animals. We have added text to the Methods sections with the context outlined above to further justify the use of ~70 clusters.

      • As a main conclusion of the paper rests on the different distribution of sound clusters across families, it is important to validate the robustness of these differences across different cluster parameters. Specifically, the authors state that "we selected 70 clusters as the most parsimonious fit". Could the authors provide more details about how this was fit? Specifically, could the authors expand upon what is meant by "prior domain knowledge about the number of vocal types...". If the authors chose a range of cluster values (i.e. 10, 30, 50, 90) does the significance of the results still hold?

      Thank you for the suggestion, this is an important point that we have addressed with new analyses in the revision (see GMM clustering methods and new Figure S4). The prior domain knowledge referenced is with respect to the information known about the Mongolian gerbil vocal types provided in the response above. We have made this more clear in the discussion.

      We mainly based our selection of the number of clusters on the elbow method applied to the GMM held-out log likelihood (Figure S2C). Around 70 clusters is where the likelihood begins to plateau, though it’s clear that there are a number of reasonable cluster sizes. To assess whether cluster size has an effect on interpretation of the family differences result, we added Figure S4, where we varied the number of GMM clusters used and compared cluster usage differences across families (Figure S4A). We quantified pairwise family differences in cluster usage by computing the sum of the absolute value of differential cluster usages, for each GMM cluster value (Figure S4B). We find that relative usage differences remain unchanged across the range of cluster values used, indicating that GMM cluster size does not bias the finding.
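      The pairwise family difference described above amounts to an L1 distance between normalized cluster-usage vectors. A minimal sketch, assuming that “usage” means the proportion of a family’s vocalizations assigned to each GMM cluster; the function names `usage` and `usage_difference` are hypothetical.

```python
import numpy as np


def usage(labels, n_clusters):
    """Proportion of vocalizations assigned to each cluster (sums to 1)."""
    counts = np.bincount(np.asarray(labels), minlength=n_clusters)
    return counts / counts.sum()


def usage_difference(labels_a, labels_b, n_clusters):
    """Sum of absolute differences in cluster usage between two families.

    Ranges from 0 (identical usage) to 2 (completely disjoint usage).
    """
    return float(np.abs(usage(labels_a, n_clusters)
                        - usage(labels_b, n_clusters)).sum())


# Toy labels, not real data.
print(usage_difference([0, 1, 2, 2], [0, 0, 1, 2], n_clusters=3))
```

      Repeating this calculation for each GMM solution (one label vector per family and cluster count) yields one difference value per family pair and cluster count, which can then be compared across the cluster counts tested.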

      • While VAEs are powerful tools for analyzing complex datasets in this case they are restricted to analysis of spectrogram images. Have the authors identified any acoustic differences (i.e. in pitch, frequency, and other sound components) across families?

      Though it’s true that this VAE is limited to spectrograms, the VAE latent space has been shown to correspond to real acoustic features such as frequency and duration, and contain a higher representational capacity than traditional acoustic features (Goffinet 2021, Figure 2). Therefore, clustering of the latent space necessarily means that vocalizations with similar acoustic features are clustered together regardless of their family identity.

      Despite this, your point is well taken that there could be systematic differences in certain acoustic features for specific call types. We are not able to ascertain this with the current dataset. This is addressed in Barker 2021 by recording a single call type (soft chirp) from individuals within and across families. Mongolian gerbils have been shown to exhibit individual differences in the initial, terminal, minimum, and maximum frequency of the ultrasonic up-frequency modulated call type (Figure 2, top right green; Nishiyama 2011, Figure 1A). Therefore, it is possible that family-specific differences exist for that particular call type. To assess whether other call types show family or individual differences, it’s necessary to either 1.) elicit all call types from an animal in isolation or 2.) determine vocalizer identity in social-vocal interactions. The problem with the former idea is that gerbils only produce up-frequency modulated USVs in isolation and there is no known way to elicit the full vocal repertoire in single animals. The latter idea would allow for full use of the vocal repertoire, but requires invasive techniques (e.g., skull-implanted microphones, or awake-behaving laryngeal nerve recordings) that permit assignment of vocalizations to individuals during a natural social interaction. We are actively exploring solutions to both problems.

      It’s likely that future studies will look deeper into acoustic differences between individuals and families. Therefore, we have added acoustic feature quantification of vocalizations in each of the GMM clusters as a reference (Figure S6).

      Reviewer #3 (Public Review):

      Summary:

      In this study, Peterson et al. longitudinally record and document the vocal repertoires of three Mongolian gerbil families. Using unsupervised learning techniques, they map the variability across these groups, finding that while overall statistics of, e.g., vocal emission rates and bout lengths are similar, families differed markedly in their distributions of syllable types and the transitions between these types within bouts. In addition, the large and rich data are likely to be valuable to others in the field.

      Strengths:

      - Extensive data collection across multiple days in multiple family groups.

      - Thoughtful application of modern analysis techniques for analyzing vocal repertoires.

      - Careful examination of the statistical structure of vocal behavior, with indications that these gerbils, like naked mole rats, may differ in repertoire across families.

      Weaknesses:

      - The work is largely descriptive, documenting behavior rather than testing a specific hypothesis.

      - The number of families (N=3) is somewhat limited.

      We agree that the number of families is relatively small. However, our new analysis of vocal repertoire by postnatal day (Figure 4) demonstrates that the finding is quite robust. A high sample-size study was outside the scope of this initial observational study given the difficulty of obtaining and processing longitudinal data of this scale. In light of new analyses in Figure 4, we are confident that future studies will not need so much data to characterize family-specific differences. A single 24-hour recording should be sufficient, making comparison of many more families relatively straightforward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Several minor concerns:

      (1) The three thresholds used for vocalization segmentation lack explanation.

      Figure 1C's first vocal event appears to define the first gap via the gray threshold (th_2, as the trace does not cross the black line) and the second gap via the black threshold (th_1 or th_3), which is not addressed in the Methods section.

      Thank you for bringing this to our attention. We agree, this is presented in an unnecessarily complicated way. We have updated the methods section describing the thresholding procedure.

      “Sound onsets are detected when the amplitude exceeds 'th_3' (black dashed line, Figure 1C), and a sound offset is marked at the first subsequent local minimum with amplitude below 'th_2' (gray dashed line, Figure 1C) or when the amplitude drops below 'th_1' (black dashed line, Figure 1C), whichever comes first. In this specific use case, th_2 (5) will always be reached before th_1 (2), so the gray dashed line always determines the offset. A subsequent onset is marked when the sound amplitude crosses th_2 or th_3, whichever comes first. For example, the first sound event detected in Figure 1C shows the sound amplitude rising above the black dashed line (th_3), which marks an onset. Subsequently, the amplitude trace falls below the gray dashed line (th_2) and an offset is marked. Finally, the amplitude rises above th_2 without dipping below th_3, and an onset for a new sound event is marked. Had the amplitude dipped below th_3, a new sound event onset would have been marked when the amplitude trace subsequently exceeded th_3 (e.g., between sound events 2 and 3, Figure 1C). The maximum and minimum syllable durations were selected based on published duration ranges of gerbil vocalizations (Ter-Mikaelian et al. 2012, Kobayasi & Riquimaroux, 2012).”
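      The onset/offset rule quoted above behaves like a small hysteresis state machine. The following is a minimal Python sketch of that rule as we read it, not the authors' implementation. The function name segment_amplitude, the assumed ordering th_1 <= th_3 <= th_2 (inferred from the worked example, where th_2 = 5 and th_1 = 2), and the threshold values and trace in the usage line are illustrative assumptions; the minimum/maximum duration filtering mentioned above is omitted.

```python
from typing import List, Optional, Sequence, Tuple


def segment_amplitude(
    amp: Sequence[float], th_1: float, th_2: float, th_3: float
) -> List[Tuple[int, int]]:
    """Return (onset, offset) sample indices of sound events in an amplitude trace.

    Hypothetical sketch of the quoted thresholding rule; assumes th_1 <= th_3 <= th_2.
    Minimum/maximum syllable-duration filtering is not modeled.
    """
    events: List[Tuple[int, int]] = []
    onset: Optional[int] = None  # onset index of the event in progress, if any
    rearm = th_3                 # level the trace must exceed to mark the next onset
    n = len(amp)
    for i, a in enumerate(amp):
        if onset is None:
            if a < th_3:
                rearm = th_3     # trace dipped below th_3: next onset requires th_3
            if a > rearm:
                onset = i
        else:
            # Local minimum: strictly lower than the previous sample, not higher than the next.
            local_min = 0 < i < n - 1 and amp[i - 1] > a <= amp[i + 1]
            if a < th_1 or (local_min and a < th_2):
                events.append((onset, i))
                onset = None
                # Offset at a local minimum still above th_3: next onset is marked at th_2.
                rearm = th_2 if a >= th_3 else th_3
    if onset is not None:  # close an event still open at the end of the trace
        events.append((onset, n - 1))
    return events


# Illustrative trace: the first two events are split at a local minimum (index 6).
trace = [0, 1, 3, 6, 8, 6, 4, 6, 8, 5, 1, 0, 3, 6, 1, 0]
print(segment_amplitude(trace, th_1=2, th_2=5, th_3=2.5))  # → [(2, 6), (7, 10), (12, 14)]
```

      In this example the first event ends at the local minimum at index 6 (amplitude 4, below th_2) and, because the trace stays above th_3, the next onset is marked only when it exceeds th_2 at index 7. The second event ends when the trace falls below th_1, so the third onset requires a fresh th_3 crossing.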

      (2) The determination of multi-syllabic calls could be explained further. In Figure 1C, for instance, do syllables separated by short gaps (e.g., the first syllable and the rest of the first group, and the third group in this example) belong to the same call or different calls?

      We have added an operational definition of mono vs. multisyllabic calls in the Results section:

      “Vocalizations occur as either single syllables bounded by silence (monosyllabic) or consist of combinations of single syllables without a silent interval (multisyllabic).”

      Under this definition, the examples you mentioned in Figure 1C are considered monosyllabic. One could reasonably expand the definition to include, for example, calls separated by less than X ms of silence; however, we chose not to do that in this study. A deeper understanding of the phonation mechanisms for different gerbil vocalization types would be helpful to more rigorously determine the distinction between mono- vs. multisyllabic vocalizations.

      (3) Labeling the calls shown in Fig. 3D in the latent feature space would help highlight within-family diversity and between-family similarities.

      Great suggestion. We have updated Figure 3 to include where in UMAP space each family’s preferred clusters are.

      (4) In the introduction, the statement, "Therefore, our study considers the possibility that there is a diversity of vocalizations within the gerbil family social group" doesn't naturally follow from the previous example. This could be rephrased.

      Agreed, thank you. We revised this section of the introduction to flow better.

      Reviewer #2 (Recommendations For The Authors):

      While outside the scope of the current study the authors may consider the following experiments and analysis for future studies:

      • Do vocal repertoires retain their family signatures across subsequent generations of pups? (i.e., if vocalizations are continually monitored during second or third litters of the same parents).

      • Do the authors observe any long-term changes in family repertoires related to the developmental trajectory of the pups? Are there changes in individual pup vocal features or sound type usage throughout development?

      Thank you for these great suggestions. Given that naked mole rats learn vocalizations through cultural transmission, it would be interesting to see whether other subterranean species with complex social structures (gerbils, voles, rats) have similar abilities. A straightforward way to assess this possibility could be as you suggest: are latent distributions of vocalizations from multi-generational families closer together than those of different families? If true, this would provide compelling evidence to investigate further.

      We partially address your second suggestion in our response to Reviewer 1 and in Figure S4, which shows that the family repertoire remains stable throughout this particular period of development. This doesn’t rule out the possibility that there could be other phases of development that undergo more vocal change. Your final suggestion is an area that we are actively researching and eager to know the answer to. A follow-up question: could differences in pup vocal features contribute to differential care by parents?

      Reviewer #3 (Recommendations For The Authors):

      In all, I found the paper clearly written and the figures easy to follow. One small suggestion:

      Figure 1: I can't see the black and gray thresholds described in the caption very well. Perhaps a zoom-in to the first 0.15s or so of the normalized amplitude plot would better display these.

      Agreed, thank you. We added a zoom-in to Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Unckless and colleagues address the issue of the maintenance of genetic diversity of the gene diptericin A, which encodes an antimicrobial peptide in the model organism Drosophila melanogaster.

      Strengths:

      The data indicate that flies homozygous for the dptA S69 allele are better protected against some bacteria. By contrast, male flies homozygous for the R69 allele better resist starvation than flies homozygous for the S69 allele.

      Weaknesses:

      - I am surprised by the inconsistency between the data presented in Fig. 1A and Fig. S2A for the survival of male flies after infection with P. rettgeri. I am not convinced that the data presented support the claim that females have lower survival rates than males when infected with P. rettgeri (lines 176-182).

      The two figures are pasted above (1A left, S2A right). The reviewer is correct that the two experiments look different in terms of overall outcomes for males, though they are qualitatively similar. These two experiments were performed by different researchers, and as much as we attempt to infect consistently from researcher to researcher, some have heavier hands than others. It is true that the genotype with the largest sex effect is the arginine line (blue), where females (in this experiment) fare as poorly as the null allele and males are more intermediate. Also note that the experiments in S2A (male and female) were done in the same block, so they are the better comparison. We’ve reflected this in the manuscript.

      - The data in Fig. 2 do not seem to support the claim that female flies with either the dptA S69 or the R69 alleles have a longer lifespan than males (lines 211-215). A comment on the Δdpt line, which is one of the CRISPR-edited lines, would be welcome.

      We’ve reworded this section based on these comments.

      - The data in Fig. 2B show that male flies with the dptA S69 or R69 alleles have the same lifespan when poly-associated with L. plantarum and A. tropicalis, which contradicts the claim of the authors (lines 256-260).

      This is correct – the effect is only in females. It has been corrected.

      Reviewer #2 (Public Review):

      Summary: In this study, the authors delve into the mechanisms responsible for the maintenance of two diptericin alleles within Drosophila populations. Diptericin is a significant antimicrobial peptide that plays a dual role in fly defense against systemic bacterial infections and in shaping the gut bacterial community, contributing to gut homeostasis.

      Strengths: The study unquestionably demonstrates the distinct functions of these two diptericin alleles in responding to systemic infections caused by specific bacteria and in regulating gut homeostasis and fly physiology. Notably, these effects vary between male and female flies.

      Weaknesses: Although the findings are highly intriguing and shed light on crucial mechanisms contributing to the preservation of both diptericin alleles in fly populations, a more comprehensive investigation is warranted to dissect the selection mechanisms at play, particularly concerning diptericin's roles in systemic infection and gut homeostasis. Unfortunately, the results from the association study conducted on wild-caught flies lack conclusive evidence.

      It is true that the wild fly association study is mostly a negative result. We’ve toned down the claim about the Morganella association.

      Major Concerns:

      Lines 120-134: The second hypothesis is not adequately defined or articulated. Please revise it to provide more clarity. Additionally, it should be explicitly stated that the first part of the first hypothesis (pathogen specificity), i.e., the superior survival of the S allele in Providencia infections compared to the R allele, has been previously investigated and supported by the results in the Unckless et al. 2016 paper. The current study aims to additionally investigate the opposite scenario: whether the R allele exhibits better survival in a different infection. Please consider revising to emphasize this point.

      We’ve reworded this section and added references to both the Unckless et al. 2016 and Hanson et al. 2023 papers.

      Figures and statistical analyses: It is essential to present the results of significant differences from the statistical analyses within Figures 1B, 2B, and 3. Additionally, please include detailed descriptions of the statistical analysis methods in the figure legends. Specify whether the error bars represent standard error or standard deviation, particularly in Figure 3, where assays were conducted with as few as 3 flies.

      We have added statistical details as requested.

      Lines 317-318 (as well as 320-328): The data related to P. rettgeri appear somewhat incomplete, and the authors acknowledge that bacterial load varies significantly, and this bacterium establishes poorly in the gut. These data may introduce more noise than clarity to the study. Please consider revising these sections by either providing more data, refining the presentation, or possibly removing them altogether.

      The observation that P. rettgeri establishes poorly in the gut of wildtype flies is based on several unpublished experiments in the Lazzaro and Unckless labs. We don’t have this as a figure because it was not directly tested in these experiments. We’ve added a note that it is a personal observation, and we’ve reworked the discussion in the second section.

      Lines 335-387 and Figure 4: Although these results are intriguing and suggest interactions between functional diptericin and fly physiology, some mediated by the gut microbiome, they remain descriptive and do not significantly contribute to our understanding of the mechanism that maintains the diptericin alleles.

      While the reviewer is correct that these experiments do not elucidate mechanism, they do strongly suggest (based on the controlled nature of the experiments) that the physiological tradeoffs are due to Diptericin genotype. The disagreement concerns what level of “mechanism” is required. At the evolutionary level, the demonstration of a physiological cost of a protective immune allele is sufficient to explain the maintenance of alleles. However, we have not determined (and did not attempt to determine) why Diptericin genotype influences these traits. That will have to wait for future experiments.

      Lines 399-400: The contrast between this result and statement and the highly reproducible data presented in Figures 2-4 should be discussed.

      We’ve added some discussion to this section including a reference to the “inconstancy” of the Drosophila gut microbiome.

      Lines 422-429 and Figure 5D: The conclusion regarding an association between diptericin alleles and Morganellaceae bacteria is not clearly supported by Figure 5D and lacks statistical evidence.

      We’ve changed this to just be suggestive.

      Reviewer #3 (Public Review):

      Summary:

      This paper investigates the evolutionary aspects around a single amino acid polymorphism in an immune peptide (the antimicrobial peptide Diptericin A) of Drosophila melanogaster. This polymorphism was shown in an earlier population genetic study to be under long-term balancing selection. Using flies with different AA at this immune peptide it was found that one allelic form provides better survival of systemic infections by a bacterial pathogen, but that the alternative allele provides its carriers a longer lifespan under certain conditions (depending on the microbiota). It is suggested that these contrasting fitness effects of the two alleles contribute to balance their long-term evolutionary fate.

      Strengths:

      The approach taken and the results presented are interesting and show the way forward for studying such polymorphisms experimentally.

      Weaknesses:

      (1) A clear demonstration (in one experiment) of the antagonistic effect of the two isolated selection pressures is not provided.

      The study is overwhelming, with many experiments and countless statistical tests. The overall conclusion of the many experiments and tests suggests that "dptS69 flies survive systemic infection better, while dptS69R flies survive some opportunistic gut infections better." (lines 444-446). Given the number of results, different experiments, and hundreds of tests conducted, how can we make sure that the result is not just one of many possible combinations? I suggest testing this conclusion in a single experiment (one may call this the "killer experiment"), with the relevant treatments conducted at the same time, side by side, and analyzed with a statistical test for a treatment × genotype interaction effect.

      This is a nice idea but would not work in practice since the fly lines used are different (gnotobiotic vs conventional) and gnotobiotics have to be derived from axenic lines that need a few generations to recover from the bleaching treatment.

      (2) The implication that the two forms of selection acting on the immune peptide are maintained by balancing selection is not supported.

      The picture presented of how balancing selection works is rather simplistic and not convincing. In particular, no distinction is made between fluctuating selection (FL) and balancing selection (BL). BL is the result of negative frequency-dependent selection. It may act within populations (e.g., Red Queen-type processes, mating types) or between populations (local adaptation). FL is a process that is sometimes suggested to produce BL, but this is only the case when selection is negative frequency-dependent. In most cases, FL does not lead to BL.

      The presented study is introduced with a framework of BL, but the aspects investigated are all better described as FL (as the title says: "A suite of selective pressures ..."). The two models presented in the introduction (lines 62 to 69; two pathogens, cost of resistance) are both examples of FL, not of BL.

      We’ve added a discussion of how fluctuating selection and balancing selection relate at the end of the discussion.

      Finally, no evidence is presented that the different selection pressures suggested to select on the different allelic forms of the immune peptide are acting to produce a pattern of negative frequency dependence.

      We are not arguing for negative frequency-dependent selection. We assume throughout that the Dpt allele does not drive the overall frequency of P. rettgeri in populations, since it is a ubiquitous microbe. Evolution within D. melanogaster therefore has little to no effect on the density of the pathogen.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments:

      Line 31: Rewrite the sentence mentioning "homozygous serine" for improved clarity, especially since the S/R polymorphism of Diptericin has not been introduced yet.

      This has been changed to be vague in terms of specific alleles and just refers to “one allele” vs the other.

      Lines 87-94: Consider reorganizing this paragraph to maintain a logical flow of the discussion on the Drosophila immune system and the IMD pathway.

      We explored other orders, but we think that as is (IMD to AMPs in general to AMPs in Drosophila) makes the most sense here.

      Line 99: Provide an explanation of balancing selection for a broader readership, differentiating it from other modes of selection.

      We added a brief discussion but note that the intro has significant discussion of balancing selection.

      Lines 105-106: Please provide a proper reference. Additionally, ensure that the Unckless et al. 2016 paper is correctly referenced, both in lines 111 and 138-141.

      This has been added.

      Lines 138-141: It would be beneficial to state that the previous study by Unckless et al. 2016 did not control for genetic background, which is why the assay was redone with gene editing.

      This has been added.

      Lines 296-303: Clarify the source of the survival observations and consider incorporating this data into Figure 2 for improved visualization.

      We’ve clarified that this is Figure 2.

      Lines 390-394: Explain the distinctions between vials and cages, particularly in terms of food consumption, exposure to bacteria, etc., which can be relevant to gut homeostasis.

      We’ve added a discussion of why these two approaches are complementary.

      Reviewer #3 (Recommendations For The Authors):

      Statistics

      Statistical results are limited to the presentation of p-values (several hundred of them!). For a proper assessment of the statistical analyses, one would also want to see the models used and the test statistics obtained.

      The statistical tests done are often unclear. For example, in several experiments, pools of 3 trials (blocks) of multiple animals were tested. The blocks need to be included in the model. Likewise, it seems that multiple Δdpt fly genotypes were produced. Apparently, they were not distinguished later. Were they considered in the statistical analyses? By contrast, two lines of dptS69R flies were reported to show differences. What rationale was applied to test for line differences in some cases and not in others?

      In the same dataset (i.e., data resulting from one experiment), multiple tests seem to have been done in most cases. For example, in one case each treatment was contrasted with the dptS69 flies. It is generally not acceptable to break down one dataset into multiple subsets and conduct a test on each subset. A single model should be fitted for each experiment. This may then be followed by post-hoc tests to see which treatments differ from each other.

      We’ve attempted to clarify these statistical approaches throughout.

      Minor points

      In the legend of Figure 3 it says: "A) monoassociations where each plot represents a different experiment,". This is unclear to me. First, how many plots are there: 3 or 12? Second, what means "experiment"? Are these treatments, or entirely different experiments? How was this statistically taken into account?

      We’ve changed this to “different condition” which is clearer. We performed statistical analysis independently for each condition and we’ve now discussed that.

      Fig. 5D. It is suggested in the text ("Most intriguing", line 426) and the figure legend that the abundance of Morganellaceae in wild-caught flies differs among genotypes. This is not visible in the figure and not convincingly shown in the text. No stats are given.

      We’ve now added that these differences are not significant.

      Lines 458-461: This sentence is unclear.

      We’ve attempted to clarify.

      What is a "a traditional adaptive immune system"?

      We’ve reworded to “an adaptive immune system”.

      There are several typos in the manuscript. Please correct.

      We’ve attempted to fix typos throughout.

      Bold statements are often without references.

      We’ve attempted to add appropriate references throughout.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      R1-01 - Does ank-G-GFP label all isoforms (190, 270 and 480kDa) of ankG? From the images of the AIS and noR it appears that the large forms (270 and 480 kDa) are probably tagged with GFP. Did the authors check for puncta along dendrites and in dendritic spines, which are thought to be formed by the small (190 kDa) isoform? Perhaps a western blot to show that Ank-G-GFP labels all isoforms would be a useful addition to this study.

      We believe that AnkG-GFP indeed labels the major Ank3 transcripts in the brain, including the 190, 270, and 480 kDa isoforms, based both on known mRNA exon usage and on Western blot analysis (data not shown). Thus, theoretically, this model would be useful for examining the localization of 190 kDa ankyrin-G to dendritic spines. While we attempted to examine this in tissue sections, it was difficult to separate punctate ankyrin-G-GFP labeling from the background. However, these experiments were done in genetic crosses that would label most pyramidal neurons in a given area (i.e., CaMKIIa-Cre). Given the Cre-dependence of this model, future experiments could use sparse transduction with a Cre virus that also fills neurons with soluble fluorophores (e.g., mCherry or tdTomato) to mark isolated neurons and identify dendritic spines, as exemplified in Fig. 2D. This would allow examination of the subcellular localization of ankyrin-G within single pyramidal cells before and after induction of synaptic plasticity.

      R1-02 - In Figure 2, does all the native Ank-G get replaced by Ank-G-GFP? In Fig. 2E the GFP signal along the AIS of CamKII +ve neurons does not appear to be very homogeneous compared to the βIV-spectrin label. Have the authors carried out more experiments like those in 2F, using antibodies that label AnkG together with the GFP fluorescence of the labeled AnkG? It would also be informative to know if, as one might expect, the total levels of ankG-GFP correlate with the levels of ankG at the AIS.

      We agree that this is an important point and conducted additional experiments to address your concerns. Of course, we cannot exclude that some unmodified ankyrin-G remains in the AIS or other structures. We expect the turnover of the protein to be rather slow, and native ankyrin-G likely remains to some degree. However, our quantification demonstrates that the ankyrin-G-GFP labeling is sufficiently homogeneous to accurately represent AIS size, indicating that GFP levels are proportional to native ankyrin-G levels. Animals were crossed with a CaMKIIa-Cre driver line, and ex vivo slices were imaged live and after immunolabeling. We found a strong correlation between live ankyrin-G-GFP (patch clamp chamber), postfix ankyrin-G-GFP, postfix ankyrin-G, and βIV-spectrin immunosignals of the same AIS. Furthermore, our measurements of AIS length using the intrinsic GFP signal in combination with ankyrin-G or βIV-spectrin antibodies showed significant overlap (see R1-03). We have now included these graphs as Supplemental Fig. S2 in the manuscript (pp. 8-9, ll. 173-177).

      R1-03 - Does the length and position of the AIS change when Ank-G is tagged with GFP? This seems like important information that is needed to make sure that there are no structural differences in AIS morphology when compared to native Ank-G.

      This is a very important point. We used the βIV-spectrin signal to compare the length of AIS with and without GFP modification in acute slices after patch-clamp recordings (N = 3 animals, 27 GFP+ and 48 GFP- AIS). As a secondary control, we plotted the measurements of 160 AIS from a Thy1-GFP mouse line (N = 3 animals, 160 AIS). We found no significant difference in the length and position of the βIV-spectrin signal between GFP-positive and GFP-negative AIS (p=0.3364 unpaired t-test, p=0.6138 non-parametric Mann-Whitney test, respectively). We have now included this analysis as Supplemental Fig. S2A in the manuscript (pp. 8-9, ll. 173-177).

      R1-04 - How was node length measured in Figure 3? Was this done using the endogenous ank-G signal? In this figure, it would be informative to also quantify the number of noRs with a Nav1.6 stain. Perhaps even check if there are correlations between Ank-G-GFP and Nav1.6 levels. In this figure, it appears that comparisons are carried out between Ank-G-GFP +ve and -ve neurons in the same cryosections, from Ank-G-GFP mice crossed with CamKIIa-Cre. I worry that this may not be comparing the same types of axons. What cells do the CamKIIa -ve axons belong to? Also, the labels on the bar graph are confusing - perhaps GFP+ve and GFP-ve would be clearer?

      The reviewer raises an important point. We had not stated in the manuscript which signal was used to measure node length. We have corrected this error and now clearly state in the Fig. 3C legend that we used the ankyrin-G signal to quantify node length. Furthermore, CaMKIIa-Cre-mediated expression triggers ankyrin-G-GFP only in a genetically defined subset of neurons. Nodes that do not belong to this subgroup might very well have different node properties. Yet, we cannot assign potential differences in node length to the presence or absence of the GFP label, since we do not have an independent labeling technique for the very same subset of neurons. Since node lengths were similar and showed the same spread in our sample (Fig. 3C), we assume that the GFP label probably does not affect node length to a significant degree. We now discuss this limitation in the Results (p. 7, ll. 159-165) and Methods sections (p. 30, ll. 644-645) and provide Supplementary Fig. S1 for more clarity. As suggested by the reviewer, we measured mean fluorescence intensities of 91 GFP+ and 141 GFP- nodes using automated image processing in Imaris. The nodes were again defined by the ankyrin-G signal. We found no difference in length or ellipticity between the groups. We repeated this analysis and compared fluorescence intensities of Nav1.6 and ankyrin-G antibodies, and again found no statistical differences between the two groups. As suggested by the reviewer, we also investigated whether ankyrin-G-GFP interferes with the fluorescence intensities of sodium channels (Nav1.6) and ankyrin-G in general. While the GFP signal showed a strong correlation with ankyrin-G, we found no interdependence with the Nav1.6 signal, indicating that the GFP label does not interfere with the general molecular composition of the nodes. We included these new analyses in Supplemental Fig. S1 (p. 7, ll. 159-165).

      R1-05 - In Figure 4 it would also be important to show the distribution of AIS molecules along the AIS, compared to the GFP signal, to establish whether this spatial arrangement of AIS-specific molecules remains intact. For example, Nav1.6 has been described as a more distally-located channel. As the authors point out, the example in A appears to show precisely this feature, but there is no quantification. The same applies to Kv1.2. This would also allow the authors to provide some quantification across multiple AISs, rather than just example images.

      We agree that quantifying and comparing AIS-associated proteins would be informative. We measured the intensity profiles of Nav1.6 and Kv2.1 in neighboring AIS and found no preferences for either end of the AIS, neither of GFP-positive nor GFP-negative AIS. We want to note that not all neurons exhibit a distal localization of Nav1.6 and hypothesize that our samples (neocortex layer II) also fall into this group. We included this new graph as Supplemental Fig. S2D and E in the manuscript (p. 9, ll. 180-184).

      R1-08 - In Figure 4, did the +Cre condition result in all cells showing a GFP-labelled AIS? If not, were the autocorrelations for +Cre-treated neurons done specifically on cells that expressed AnkG-GFP?

      We assume the reviewer refers to the autocorrelation in Figure 6. In this in vitro paradigm, we used virus-induced Cre expression which triggered ankyrin-G-GFP in almost all neurons. The orange boxplots describe the autocorrelation of all ankyrin-G, using a C-terminal antibody as in Fig.6C, but in neurons that also express ankyrin-G-GFP. The green samples use the GFP signal of ankyrin-GFP. We clarified this in the graph and legend of Fig. 6C (pages 14-15).

      R1-09 - As mentioned above in Figure 3, the comparisons in Figure 5 (GFP +ve and -ve neurons) may not be comparing like-for-like neurons. I imagine that many of the CamKII-ve cells in the cortex and hippocampus will be GABAergic interneurons, whereas presumably all of the CamKII+ve neurons will be pyramidal cells. Have the authors made sure that they are comparing across the same cell types? The fact that the number of axo-axonic synapses is similar across the two populations (Fig. 5B) does suggest that similar neuron types (presumably pyramidal cells) were compared in the hippocampus, but some other way of making sure would be a nice addition.

      We agree with the reviewer that the grey and green boxes are not sampled from the same subset of neurons, since only CaMKIIa-positive principal cells express ankyrin-G-GFP. However, we are confident that the selected AIS belong to pyramidal neurons in both cases. Principal neurons can be readily distinguished from interneurons not only by the size, shape, and position of their somata but also by the length and thickness of their AIS. We have previously studied the AIS of interneurons using genetic GAD and parvalbumin markers. Thus, we are confident that the plots in 5A and 5B are sampled from pyramidal neurons, though certainly from genetically different subsets. We now highlight and discuss this limitation in the Results section (p. 11, ll. 215-217) and modified the graphs in Fig. 5A and 5B for clarity.

      R1-10 - In Figure 6, what was the promoter for the DCre and Cre+ lentivirus? Was this also driven by CamKIIa? In culture it is not always easy to be sure of neuronal identity - did the authors try to bias their analysis to specific neuronal types?

      Indeed, the promoter was not stated in the legend or Methods section, which we have now corrected. We used lentiviral FUW-nGFP-Cre and FUW-nGFP-ΔCre constructs to trigger ankyrin-G-GFP expression. Both viruses use the CMV (cytomegalovirus) promoter, which drives constitutively high levels of gene expression in a wide range of cell types, including neurons. The majority of neurons in dissociated hippocampal cultures are excitatory, and we preferentially analyzed larger cells with larger AIS, which are most likely excitatory. Thus, we cannot claim that AIS nanostructure is intact in cultured interneurons, but the same is true for in vivo conditions in general. Since the mice did not show any obvious behavioral phenotypes, we are confident that interneuron functionality is preserved. We also note that the parallel expression of nuclear GFP in the infected neurons was undesired, but it did not affect STED imaging owing to that technique's high resolution.

      R1-11 - The ability to visualize the plasticity of the AIS in real-time is an important advance in the field. The loss of proximal Ank-G-GFP signal upon local application of 15 mM KCl is particularly interesting. The fact that neighboring AISs are not affected is surprising - do the authors know how local their KCl application was? Also, although the neighboring AISs are a nice control, the one control lacking here is the local application of normal solution (preferably 15 mM NaCl to account for osmolarity changes) to make sure that this does not affect the properties of the AIS.

      We used KCl puffs in previous, unrelated experiments, in which we observed that only cells directly in front of the pipette are visibly depolarized by an acute KCl puff (measured by patch-clamp). Due to technical limitations, patched and live-imaged neurons were generally located in the first 2-5 cell layers of the brain slice, which are well perfused by the constant flow of oxygenated ACSF. KCl is thus quickly diluted and carried away. We visualized the concentration gradients during puff application by puffing the fluorescent marker fluorescein under the same recording conditions. The cone of fluorescence was only visible in front of the pipette and vanished less than a second after pressure application. To verify that it is indeed KCl, and not mechanical stress, that led to the loss of proximal Ank-G-GFP, an ACSF puff control would indeed be needed, which we have performed in other studies. However, this is not the point we wanted to make. Rather than studying live single-cell AIS plasticity, we aimed to demonstrate that such investigations are possible in principle using the ankyrin-G-GFP line.

      Author response image 1.

      R1-12 - The ability to image AISs in vivo is another important finding. Were the authors able to image nodes of Ranvier (NoRs) as well?

      We believe that this is indeed the case. The panels in Figure 9C contain densely labeled puncta that also remain in position from week 1 to week 2. These are likely nodes of Ranvier, although we do not have the means to verify their presence at this time.

      Reviewer #2:

      R2-01 - Are there indeed different Ank-G-GFP isoforms expressed in this model and could they correspond to classical neuronal Ank-G isoforms?

      This is an important issue that was also raised by reviewer #1. Please consult the respective section R1-01 above for our response.

      R2-02 - What is the rationale of doing Ank-G co-labelling in the case of Ank-G-GFP expression, rather than Pan-Nav staining for example? The co-staining with Nav1.6 antibody, when present, is however convincing.

      We used the co-labeling to emphasize that the ankyrin-G-GFP construct allows reliable investigation of the whole AIS. We therefore wanted to demonstrate that the ankyrin-G-GFP signal overlaps with other AIS markers, as well as with total ankyrin-G (including any remaining native, unlabeled ankyrin-G). This point was also raised by Reviewer 1, which is why we provided additional graphs (see response R1-02). However, we agree that staining with another independent marker, such as Nav1.6 or βIV-spectrin, was necessary.

      R2-03 - Figure 2D and F: what is the rationale for not using betaIV-Spectrin staining as in the other panels of this figure? Furthermore, could betaIV-Spectrin localization be affected by Ank-G-GFP expression, as betaIV-Spectrin is known to depend on Ank-G for its AIS targeting? Are there any other AIS markers, whose localization is known to be independent of Ank-G, that could have been used?

      We have compiled this figure from a multitude of different experimental setups from different labs to showcase the reliability and robustness of the ankyrin-G-GFP label. This is why the type of staining is not consistent among panels. However, we provide some quantification on the possible impact of ankyrin-G-GFP expression on the βIV-spectrin signal and the composition of the AIS in general. The STED image verifies that the basic subcellular arrangement of the cytoskeleton, including βIV-spectrin, remains intact (Fig. 6). Most AIS markers are at least in some way dependent on ankyrin-G expression, but FGF14 and neurofascin may be the most independent candidates (Fig. 4).

      R2-04 - Did the authors measure the mean AIS length and distance from the cell soma in Ank-G-GFP-expressing neurons versus non-expressing ones (considering the same neuronal subtypes) to assess whether these were unaffected by Ank-G-GFP expression?

      This is an important point that was also raised by Reviewer 1 (see also our comments to R1-03). We have included this analysis now in the manuscript as Supplemental Fig. S2A (pp. 8-9, ll. 173-177).

      R2-05 - Figure 5C: the microglial staining and 3D reconstruction could have been clearer.

      We have modified the image and 3D rendering to make Figure 5C clearer to the reader. We hope that our changes suffice.

      R2-06 - Figure 8: do hippocampal neurons retain their electrophysiological properties after 20 DIV? It could strengthen this part of the work to have access to the electrophysiological data mentioned in the text. 

      This is an important issue. We did not perform any electrophysiological recordings in OTCs in the course of this study. Panel E uses acute hippocampal slices like in Fig. 7. We have performed patch-clamp experiments up to DIV 10 for an unrelated study (see graph for action potential firing, Author response image 2). There are not many studies performing electrophysiology in slice cultures due to the formation of a glial scar on top of the slices. However, multielectrode array (MEA) recordings demonstrated that hippocampal organotypic slice cultures remain viable and show electric activity past DIV 20 (though with decreased viability and activity). We kindly refer to the following publications on that matter:

      Author response image 2.

      Sample traces of action potentials triggered by current injections

      Gong W, Senčar J, Bakkum DJ, Jäckel D, Obien ME, Radivojevic M, Hierlemann AR. Multiple Single-Unit Long-Term Tracking on Organotypic Hippocampal Slices Using High-Density Microelectrode Arrays. Front Neurosci. 2016 Nov 22;10:537. doi: 10.3389/fnins.2016.00537. PMID: 27920665; PMCID: PMC5118563.

      Mohajerani MH, Cherubini E. Spontaneous recurrent network activity in organotypic rat hippocampal slices. Eur J Neurosci. 2005 Jul;22(1):107-18. doi: 10.1111/j.1460-9568.2005.04198.x. PMID: 16029200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript, the authors explore the mechanism by which Taenia solium larvae may contribute to human epilepsy. This is an extremely important question to address because T. solium is a significant cause of epilepsy and is extremely understudied. Advances in determining how T. solium may contribute to epilepsy could have a significant impact on this form of epilepsy. Excitingly, the authors convincingly show that Taenia larvae contain and release glutamate in amounts sufficient to depolarize neurons and induce recurrent excitation reminiscent of seizures. They use a combination of cutting-edge tools, including electrophysiology, calcium and glutamate imaging, and biochemical approaches, to demonstrate this important advance. They also show that this occurs in neurons from both mice and humans. This is relevant for the pathophysiology of chronic epilepsy development. This study does not rule out other aspects of T. solium that may also contribute to epilepsy, including immunological aspects, but demonstrates a clear potential role for glutamate.

      Strengths:

      - The authors examine not only T. solium homogenate, but also excretory/secretory products which suggests glutamate may play a role in multiple aspects of disease progression.

      - The authors confirm that the human relevant pathogen also causes neuronal depolarization in human brain tissue

      - There is very high clinical relevance. Glutamate receptor antagonists, or more active removal of glutamate, could offer a second treatment approach to prevent seizures and epileptogenesis, in addition to, or in place of, targeting the post-infection immune response.

      - Effects are consistent across multiple species (rat, mouse, human) and methodological assays (iGluSnFR imaging, current-clamp recordings and Ca2+ imaging).

      - The high K+ content of the larvae (comparable to levels used in high-K+ seizure models) could also have caused depolarization. The authors perform adequate experiments to exclude K+ and other suspected larval contents (e.g. substance P).

      Weaknesses:

      - The acute study is limited to studying depolarization in slices, and it is unclear what is necessary/sufficient for in vivo seizure generation or epileptogenesis in chronic epilepsy.

      - There is likely a significant role of the immune system that is not explored here. This issue is adequately addressed in the discussion, however, and the glutamate data are considered in this context.

      Discuss impact:

      - Interfering with peri-larval glutamate signaling may hold promise to prevent ictogenesis and chronic epileptogenesis as this is a very understudied cause of epilepsy with unknown mechanistic etiology.

      Additional context for interpreting significance:

      - High medical need, as this is the most common cause of adult-onset epilepsy in many parts of the world

      We thank Reviewer 1 for their positive and thorough assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments/analysis:

      - Fig 4a-c: Was the larva placed on the slice rather than next to it? The negative results may arise because its E/S products are simply washed away (assuming submerged recording chamber/conditions). The experiments and negative results described here do not seem conclusive. Should this at least be discussed?

      We agree with the reviewer and have added the following sentence to the relevant section of the Results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      Writing & presentation:

      - Data is not always reported consistently in text and figures, examples:

      - Results in text are reported varyingly without explanation:

      - Mean and/or median? SEM or SD and/or IQR? Stat info included in text or not? e.g. lines 130/131 vs. 160/161

      Results and data are now presented in a more uniform fashion. We report medians and IQRs, sample size, statistical test result, statistical test used in that order.

      - The larval release data interrupt the reading flow; lines 246-252 duplicate results presented in Fig 5F.

      This section has now been significantly abbreviated and reads as follows: ‘T. crassiceps larvae released a relatively constant median daily amount of glutamate, ranging from 41.59 – 60.15 µg/20 larvae, which showed no statistically significant difference across days one to six. Similarly, T. crassiceps larvae released a relatively constant median daily amount of aspartate, ranging from 9.431 – 14.18 µg/20 larvae, which showed no statistically significant difference across days one to six.’

      - Results in figures are reported in different styles:

      Results in figures are now reported uniformly as medians and IQRs, followed by sample size, test result (p value), statistical test used and figure number, in that order.

      - Fig 6: E/S glu concentration seems to be significantly higher in solium vs crassiceps (about 6-fold higher in solium). This should at least be discussed.

      Given the small sample size from T. solium (see response below), we do not draw attention to this difference and instead simply make the point that T. solium larvae contain and release glutamate.

      - In this context, N=1 may be sufficient for proof of principle (release) but seems too small a cohort to describe non-constant release of glu over days (Fig 6D). Is the pattern of initial release on day 1, no release, and recovery on the following days reproducible? The glu content of the E/S products is very high (15-fold higher than the solium homogenate AND 6-fold higher than the crassiceps homogenate and E/S content). Not sure if Fig 6D is adding relevant information, especially since it is based on n = 1.

      We agree that N=1 is only sufficient for proof of principle. However, it is worth noting that the measurements still reflect the cumulative release from 20 larvae. Nonetheless, the statement in the text has been simplified to say: ‘These results demonstrate that T. solium larvae continually release glutamate and aspartate into their immediate surroundings.’ This focuses on the point that the larvae release glutamate and aspartate continuously, without drawing conclusions about variability across days.

      Methods:

      - Human slices: please state which part of the cortex; patient data would be of interest, e.g. etiology of epilepsy, epilepsy duration

      In the Materials and Methods section “Brain slice preparation” we have now added a table with the requested information.

      - For Taenia solium: How were they acquired and used in these experiments?

      In the Materials and Methods section “Taenia maintenance and preparation of whole cyst homogenates and E/S products” we describe how Taenia solium larvae were acquired and used.

      - Was access resistance monitored? Add exclusion criteria for patch experiments

      Figure supplement tables containing the basic properties of each recorded cell have been added for each figure, and the following statements were added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (supplementary files 1, 2, 3, 4, 6).’ and ‘Cells were excluded from analyses if the Ra was greater than 80 MΩ or if the resting membrane potential was above –40 mV.’

      - I cannot see any reference to mouse slices in the Methods. Were mouse organotypic cultures used (for AAV?), or only acute slices from mice and organotypic hippocampal cultures from rats? It seems that both mouse and rat organotypic cultures were used, but this is not clear and should be clarified in the Methods.

      We have now added the following clarification to the methods: ‘For experiments using calcium and glutamate imaging mouse hippocampal organotypic brain slices were used. For all other experiments rat hippocampal organotypic brain slices were used. A subset of experiments used acute human cortical brain slices and are specified.’

      - How long after the wash-in phase was the wash-out phase data collected?

      For wash-in recordings drugs were washed in for 8 mins before recordings were made. Drugs were washed out for at least 8 mins before wash-out recordings were made. This information has been added to the Materials and Methods section.

      - In general, the M&M section seems to have been written hastily - author's internal remarks "supplier?" are still present.

      The M&M section has been thoroughly proofread for errors and internal remarks removed or corrected.

      - A little more information on the clinical subjects would be appreciated. I.e. duration of epilepsy? Localization? What cortex? Usual temporal lobe or other regions?

      We have now added a table with this information to the Materials and Methods section “Brain slice preparation”.

      Minor corrections text/figures:

      - e.g. Figs 3D, F, H and J show individual data points, which is great, but consider adding a mean/median marker (as results are reported this way in the text), as in Fig 4G, I and others

      Figures 3D,F,H & J have been revised to include median and IQR.

      - Only one patient mentioned in acknowledgements, but 2 in methods and text

      We apologize for this oversight and now acknowledge both patients in the acknowledgements.

      - Fig 1 B-F individual puffs are described as increasing - consistent with cellular effects (1st puff depolarizes, 2nd puff elicits 1 AP, 3rd puff elicits AP burst)  However, dilution ratio of homogenate or puff concentrations are not mentioned (or potentially longer than 20 ms puffs for 2nd and 3rd stimulus?) in text or figures. Seems to be enough space to indicate in figure as well (i.e. multiple or thicker arrows for subsequent puffs or label with homogenate dilution/concentration in figure).

      We state in the results section associated with Fig. 1 that increasing the amount of homogenate delivered was achieved by increasing the pressure applied to the ejection system. We now include this information in the figure legend.

      - Figure legend describes 30 ms puff for Ca imaging whereas ephys data (from text) is 20 ms puff. Was Ca imaging performed in acute mouse hippocampal slices (as figure text suggests) or were those organotypic hippocampal cultures from mice?

      Ca2+ imaging was performed in mouse hippocampal organotypic brain slice cultures. The figure text for Fig. 1 E) states “widefield fluorescence image of neurons in the dentate gyrus of a mouse hippocampal organotypic brain slice culture expressing the genetically encoded Ca2+ reporter GCaMP6s...”

      - 11.4 mM K is reported for homogenate in text only. How variable is that? How many n? No SD reported in text and no individual data points reported since this experiment is not represented as a figure.

      This has been clarified in the text by adding (N = 1, homogenate prepared from >100 larvae).

      - Same results (effect of 11.4 mM K on Vm) described twice in one paragraph, compare lines 126-131 with 131-136.

      The repetition has been removed.

      - Line 182 - example for consistency: decide IQR or SD/SEM

      To improve consistency, we have changed to median and IQR throughout.

      - Neuronal recordings are reported as hippocampal pyramidal neurons (i.e. line 222), but some recordings were made from dentate granule cells; please clarify which neurons were recorded in ephys, Ca imaging and iGluSnFR imaging

      For each experiment we describe which type of neurons were recorded from. For rodent recordings these were hippocampal pyramidal neurons except in the case of the Ca2+ imaging example where the widefield recording was over the dentate gyrus subfield.

      - Line 309: "should" seems to be an extra word

      We have removed the word ‘should’ and made the sentence shorter and clearer. It now reads: ‘Given our finding that cestode larvae contain and release significant quantities of glutamate, it is possible that homeostatic mechanisms for taking up and metabolizing glutamate fail to compensate for larval-derived glutamate in the extracellular space. Therefore, similar glutamate-dependent excitotoxic and epileptogenic processes that occur in stroke, traumatic brain injury and CNS tumors are likely to also occur in NCC.’

      Reviewer #2 (Public Review):

      Since neurocysticercosis is associated with epilepsy, the authors wish to establish how cestode larvae affect neurons. The underlying hypothesis is that the larvae may directly excite neurons and thus favor seizure genesis.

      To test this hypothesis, the authors collected biological materials from larvae (from either homogenates or excretory/secretory products), and applied them to hippocampal neurons (rats and mice) and human cortical neurons.

      This constitutes a major strength of the paper, providing a direct reading of larvae's biological effects. Another strength is the combination of methods, including patch clamp, Ca, and glutamate imaging.

      We thank Reviewer 2 for their review of the strengths and weaknesses of our manuscript. We respond to the identified weaknesses below.

      There are some weaknesses:

      (1) The main one relates to the statement: "Together, these results indicate that T. crassiceps larvae homogenate results not just in a transient depolarization of cells in the immediate vicinity of application, but can also trigger a wave of excitation that propagates through the brain slice in both space and time. This demonstrates that T. crassiceps homogenate can initiate seizure-like activity under suitable conditions."

      The only "evidence" of propagation is an image at two time points. It is one experiment, and there is no quantification. Either increase n's and perform a quantification, or remove such a statement.

      We acknowledge that the data is from one experiment, with the intention of demonstrating that it is plausible for intense depolarization of a subset of neurons to result in the initiation and propagation of seizure-like activity to nearby neurons under suitable conditions. However, we agree that it is prudent to remove this statement and have done so.

      Likewise, there is no evidence of seizure genesis. A single cell recording is shown. The presence of a seizure-like event should be evaluated with field recordings.

      In this experiment, the Ca2+ imaging demonstrates activity spreading from the site of the restricted homogenate puff to all surrounding neurons. Furthermore, the whole-cell recording is typical of a slice-wide seizure-like event.

      (2) Control puff experiments are lacking for Fig 1. Would puffing ACSF also produce a depolarization, and even firing, as suggested in Fig. 2D? This is needed for at least one species.

      We agree and have added this data for the rat and mouse neuron in a new Figure 1-figure supplement 1.

      (3) What is the rationale for using a Cs-based solution? Even in the presence of TTX and with K channels blocked, the depolarization may be sufficient to activate low-voltage-gated (LVG) Ca channels, which would further contribute to the depolarization. Why not perform voltage clamp recordings to directly measure the current?

      The intention of the Cs-based solution was to block K+ channels and reduce the effect of moderately raised K+ in the homogenate to isolate the contribution of other causative agents of depolarization (i.e. glutamate / aspartate). We agree that performing voltage clamp recordings would have been useful for directly recording the currents responsible for depolarization. 

      (4) Why did you use organotypic slices? Since you wish to model adult epilepsy, it would have been more relevant to use fresh slices from adult rats/mice. At least, discuss the caveat of using a network still in development in vitro.

      Recordings were performed 6–14 days post culture, which is equivalent to postnatal days (P) 12 to 22. Previous work has shown that neurons in organotypic hippocampal brain slices are relatively mature at this stage (Gähwiler et al., 1997). For example, they possess mature Cl- homeostasis mechanisms, as evidenced by their hyperpolarizing EGABA (Raimondo et al., 2012).

      (5) Please include both the number of slices and number of cells recorded in each condition. This is the standard (the number of cells is not enough).

      This has now been added to all relevant sections of the results text.  

      (6) Please provide a table with the basic properties of cells (Rin, Rs, etc.). This is standard to assess the quality of the recordings.

      Tables containing the basic properties for each cell recording have been created for each figure (as Figure supplements) and the following statement was added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (see Figure supplements).’

      (7) Please provide a table on patient's profile. This is standard when using human material. Were these TLE cases (and "control" cortex) or epileptogenic cortex?

      We have now added a basic table on the patient’s profiles to the Materials and Methods section.

      Globally, the authors achieved their aims. They show convincingly that larvae material can depolarize neurons, with glutamate (and aspartate) as the most likely candidates.

      This is important not only because it provides mechanistic insight but also potential therapeutic targets. The result is impactful, as the authors use quasi-naturalistic conditions, to assess what might happen in the human brain. The experimental design is appropriate to address the question. It can be replicated by any interested person.

      We thank Reviewer 2 for their enthusiastic and constructive assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #2 (Recommendations For The Authors):

      lines 132 and following are a repetition of those above

      These have been removed.

      line 151 Fig "2" missing

      This has been added.

      187, 190 should be E, F not C, D

      This has been changed in the text.  

      481, 482 supplier?

      This has been corrected and the correct suppliers described.

      Reviewer #3 (Public Review):

      This paper has high significance because it addresses a prevalent parasitic infection of the nervous system, neurocysticercosis (NCC). The infection is caused by larvae of the parasitic cestode Taenia solium. It is a leading cause of epilepsy in adults worldwide.

      To address the effects of cestode larvae, homogenates and excretory/secretory products of larvae were added to organotypic brain slice cultures of rodents or layer 2/3 of human cortical brain slices from patients with refractory epilepsy.

      We thank Reviewer 3 for their helpful comments and suggestions for improvement which we address below.

      A self-made pressure ejection system was used to puff larvae homogenate (20 ms puff) onto the soma of patched neurons. The mechanical force could have caused depolarization, so a vehicle control is critical. On line 150 they appear to have used saline in this regard, and clarification would be good. Were the controls here (and aCSF elsewhere) done with the low Mg2+o aCSF, like the larvae homogenates?

      We agree and have added examples in which aCSF alone was pressure-ejected onto the same rat and mouse neurons in a new Figure 1-figure supplement 1. In Figure 1, the same aCSF that was used to bathe the slices was used. In Figure 2D-G, either PBS (in which larval homogenates were prepared) or growth medium (which contained larval E/S products) was used as a comparative control.

      They found that neurons depolarized after larvae homogenate exposure and that the effect was mediated by glutamate but not by nicotinic receptors for acetylcholine (nAChRs), acid-sensing ion channels (ASICs) or substance P. To address nAChRs, they used 10 μM mecamylamine, and for ASICs 2 mM amiloride, which seems like a high concentration. Could the concentrations be confirmed for their selectivity?

      We did not independently verify the selectivity of the antagonist concentrations used in our study. However, the persistence of depolarizations despite the use of high concentrations of mecamylamine (10 μM) and amiloride (2 mM) provides strong evidence that neither nAChRs nor ASICs are primarily responsible for mediating these responses. The high concentrations used, while potentially raising concerns about specificity, actually strengthen our conclusion that these receptor types are not involved in the observed effect.

      Glutamate receptor antagonists, used in combination, were 10 μM CNQX, 50 μM DAP5, and 2 mM kynurenic acid. These concentrations are twice what most use. Please discuss.

      We intentionally used higher-than-typical concentrations of glutamate receptor antagonists in our experimental design. Our rationale for this approach was to ensure maximal blockade of glutamate receptors, thereby minimizing the possibility of residual receptor activity confounding our results.

      Also, it would be very interesting to know whether the glutamate receptor is AMPA, kainate, or NMDA. Were metabotropic antagonists ever tested? That would be logical because CNQX/DAP5/kynurenic acid did not block all of the depolarization.

      We appreciate the reviewer's interest in the specific glutamate receptor subtypes involved in our study. Our research primarily focused on ionotropic glutamate receptors as a group, without differentiating the individual contributions of AMPA, Kainate, and NMDA receptors. This approach, while broad, allowed us to establish the involvement of glutamatergic signalling in the observed effects. We acknowledge that we did not investigate metabotropic glutamate receptors in this study. Importantly, we demonstrate later in our manuscript that the larval products contain both glutamate and aspartate. Therefore the precise nature of the glutamate-dependent depolarization observed using a particular experimental preparation would depend on the specific types of neurons exposed to the homogenate and the expression profile of different glutamate receptor subtypes on these neurons.

      They also showed that the elevated K+ in the homogenate (~11 mM) could not account for the depolarization. However, was the experiment with K+ done in a low Mg2+o buffer? Please clarify.

      The experiment with 11.39 mM K+, as well as the experiment with T. crassiceps homogenate using a cesium-based internal solution and added TTX, were both done in standard 2 mM Mg2+-containing aCSF.

      They also confirmed that only small molecules led to the depolarization after filtering out very large molecules. That supports the conclusion that glutamate - which is quite small - could be responsible. It is logical to test substance P because the Intro points out prior work links the larvae and seizures by inflammation and implicates substance P. However, why focus on nAChRs and ASIC?

      These were chosen as they are ionotropic receptors which mediate depolarization and hence could conceivably be responsible for the homogenate-induced depolarization we observed.

      The depolarizations caused seizure-like events in slices. The slices were, however, exposed to a proconvulsant buffer (low Mg2+o). This buffer can cause spontaneous seizure-like events, so it is important to know what the buffer did alone.

      We agree that a low-Mg2+ buffer solution alone can elicit seizure-like events in organotypic slices. However, the timing of the onset of the seizure-like event in the example presented in Figure 1 strongly suggests that it was triggered by the T. crass. homogenate puff. Nonetheless, at the suggestion of the other reviewers, we have reduced the emphasis on our experimental evidence for the ability of T. crass. homogenate to elicit seizure-like events.

      They suggest the effects could underlie seizure generation in NCC. However, there is only one seizure-like event in the paper, and it is just an inset. Were others similar? How frequent were they? How long did they last?

      Please see the response above as well as our response to Reviewer 1 who raised a similar concern.

      Using Glutamate-sensing fluorescent reporters they found the larvae contain glutamate and can release it, a strength of the paper.

      Fig. 4. Could an inset be added to show the effects are very fast? That would support an effect of glutamate.

      We have not added an inset. However, given the scale bar (500 ms) for the trace provided, the response is very fast.  

      Why is aspartate relatively weak and glutamate relatively effective as an agonist?

      Glutamate generally has a higher affinity for glutamate receptors than aspartate. This is particularly true for AMPA and kainate receptors, for which glutamate is the primary endogenous agonist. Similarly, iGluSnFR is more sensitive to glutamate than to aspartate (Marvin et al., 2013).

      Could some of the variability in Fig 4G be due to choice of different cell types? That would be consistent with Fig 5B where only a fraction of cells in the culture showed a response to the larvae nearby. 

      Whilst differences in cell type could contribute to the variability in Fig. 4G, all the responses were recorded from hippocampal pyramidal neurons, and hence it is more likely that the variability reflects other sources of variation, including differences in iGluSnFR expression, the depth of the imaged cell, and the proximity of the puffer pipette. In Fig. 5B, we think the lack of response may be because glutamate released by the live larvae could not reach the iGluSnFR-expressing neurons at sufficient concentrations, owing to the nature of our submerged recording setup. We have added the following sentence to the Results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      On what basis was the ROI drawn in Fig. 5B?

      The ROI in Fig. 5B was drawn to include all iGluSnFR-expressing neurons in the brain slice that were captured in the field of view.

      Also in 5B, I don't see anything in the transmitted image. What should be seen exactly?

      We agree that it is difficult to resolve much in the transmitted image. However, both the brain slice on the left and a T. crass. larva on the right are visible, outlined with green and orange dashed lines, respectively.

      Human brain slices were from temporal cortex of patients with refractory epilepsy. Was the temporal cortex devoid of pathology and EEG abnormalities? This area may be quite involved in the epilepsy because refractory epilepsy that goes to surgery is often temporal lobe epilepsy. Please discuss the limitations of studying the temporal cortex of humans with epilepsy since it may be more susceptible to depolarizations of many kinds, not just larvae.

      We acknowledge the important limitations of using temporal cortex tissue from patients with refractory epilepsy. While we aimed to use visually normal tissue, we recognize that the tissue may have underlying pathology or functional abnormalities not visible to the naked eye. It may also be more susceptible to induced depolarizations due to epilepsy-related changes in neuronal excitability. Despite these limitations, we believe our human tissue data still provides valuable data that the larval homogenates can induce depolarization in human as well as rodent neurons.  

      Please discuss the limitations of the cultures - they are from very young animals and cultured for 6-14 days.

      We acknowledge the potential limitations of our experimental model using organotypic hippocampal slice cultures from young animals. The use of relatively immature tissue may not fully represent the adult nervous system due to developmental differences in receptor expression, synaptic connections, and network properties. The 6-14 day culture period, while allowing some maturation, may induce changes that differ from the in vivo environment, including alterations in cellular physiology and network reorganization. Despite these limitations, this model provides a valuable balance between preserved local circuitry and experimental accessibility. Future studies comparing results with acute adult slices and in vivo models would be beneficial to validate and extend our findings.

      References:

      Gähwiler, B.H. et al. (1997) ‘Organotypic slice cultures: a technique has come of age.’, Trends in Neurosciences, 20(10), pp. 471–7.

      Marvin, J.S. et al. (2013) ‘An optimized fluorescent probe for visualizing glutamate neurotransmission.’, Nature Methods, 10(2), pp. 162–70. Available at: https://doi.org/10.1038/nmeth.2333.

      Raimondo, J.V. et al. (2012) ‘Optogenetic silencing strategies differ in their effects on inhibitory synaptic transmission.’, Nature Neuroscience, 15(8), pp. 1102–4. Available at: https://doi.org/10.1038/nn.3143.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells and the 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, likely owing to the limited use of appropriate controls and the lack of validation experiments. I think the study requires better proteomic controls and some validation experiments for the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison; there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex, so it is going to produce a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what is different between those two repetitive elements. But the more realistic use case of this method would be "what is at a specific genomic element?" A purely nuclear-localized control would be needed for that, or a genomic element that has nothing interesting at it (I do not know of one). You can see this in the label-free work: non-specific, nuclear GO terms are enriched, likely due to the random plus non-random labeling in the nucleus. What would a Telo vs. general nucleus GSEA look like? (GSEA, not GO, should be used for quantitative data.) That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or CUT&RUN/CUT&Tag is used.

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being examined? Maybe compared to mitochondria, but it is hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are seeing mito vs. nucleus + Telo more than the stated comparison. For example, if there is no labeling in the nucleus in the control (Figures 1C and 2C), you cannot separate background labeling from specific labeling. The same applies to mito vs. nuc + Telo. It is not the proper control to say what is specifically at the Telo.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to sidestep the fact that this method is still (apparently) largely limited to repetitive elements. Other than GLoPro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So this isn't quite the advancement you are implying. In addition, the overlap with telomeric proteins identified in other studies should be addressed, although that will be challenging due to the controls used here, discussed above.

      We thank the Reviewer for their careful reading of the manuscript and constructive suggestions. We plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      We thank the Reviewer for their constructive feedback on our work. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci. Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We thank the Reviewer for providing detailed critiques of our manuscript. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a novel contribution that significantly advances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides insights into the intermediate steps of CMG formation. The study builds upon previously known structures of Sld3 and Cdc45 and offers new perspectives into how Cdc45 is loaded onto MCM DH through Sld3-Sld7. The most notable finding is the structural difference in Sld3CBD when bound to Cdc45, particularly the arrangement of the α8-helix, which is essential for Cdc45 binding and may also pertain to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a possible mechanism for its binding to MCM2NTD.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of experimental validation for the proposed Sld3-Sld7-Cdc45 model. Specifically, the claim that Sld3 binding to Cdc45-MCM does not inhibit GINS binding, a finding that contradicts previous research, is not sufficiently substantiated with experimental evidence. To strengthen their model, the authors must provide additional experimental data to support this mechanism. Also, the authors have not compared the recently published Cryo-EM structures of the metazoan CMG helicases with their predicted models to see if Sld3/Treslin does not cause any clash with the GINS when bound to the CMG. Still, the work holds great potential in its current form but requires further experiments to confirm the authors' conclusions.

      We appreciate the reviewer’s careful reading and comments.

      The structure of Sld3CBD-Cdc45 showed that the Sld3CBD-binding site on Cdc45 is distinct from the regions of Cdc45 that bind GINS and MCM, indicating that Sld3CBD, MCM, and GINS bind to separate sites of Cdc45 in the CMG complex. The SCMG-DNA model confirmed this binding arrangement but did not show whether the binding of Sld3 to Cdc45 affects the recruitment of GINS (by GINS-Dpb11-Sld2) for CMG formation. We will modify our manuscript to discuss this point. We will also compare the recently published cryo-EM structures of the metazoan CMG helicases with our predicted models to confirm our conclusions, and we will try to conduct the experiments as suggested.

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. However, the work remains incomplete as several main claims are only partially supported by experimental data, particularly the proposed model for Sld3 interaction with GINS on the CMG. Additionally, the single-stranded DNA binding data from different species do not convincingly advance the manuscript's central arguments.

      Strengths

      (1) The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      (2) The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      (3) The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      (4) The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      (5) The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      Weaknesses

      (1) The proposed model for Sld3 interacting with GINS on the CMG needs more experimental validation and conflicts with published findings. These discrepancies need more detailed discussion and exploration.

      (2) The section on the binding of Sld3 complexes to origin single-stranded DNA needs significant improvement. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      (3) The authors' model proposing the release of Sld3 from CMG based on its binding to single-stranded DNA is unclear and needs more elaboration.

      We appreciate your positive comments. As suggested, we will try to improve the experiments and the manuscript and will discuss several points in more detail, including the interaction between Sld3 and GINS on the CMG, the ssDNA-binding section, why we used proteins from different species for the comparison, and the proposed mechanism of Sld3 release.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be evaluated more carefully and more quantitatively to strengthen the authors' conclusion.

      We thank the reviewer for the positive assessment. We will provide more quantitative information and try to quantify the experiments as suggested.

    1. Author response:

      Reviewer 1:

      (1) I think the article is a little too immature in its current form. I'd recommend that the authors work on their writing. For example, the objectives of the article are not completely clear to me after reading the manuscript, composed of parts where the authors seem to focus on SGCs, and others where they study "engram" neurons without differentiating the neuronal type (Figure 5). The next version of the manuscript should clearly establish the objectives and sub-aims.

      Our overarching focus was to identify whether the intrinsic physiology and circuit connectivity of SGCs contribute to their unique overrepresentation among neurons labeled as part of a behaviorally relevant dentate engram. Since our systematic analysis of “engram SGCs” did not support the proposal that engram SGCs drive robust feedforward excitation of engram GCs or feedback inhibition of non-engram GCs, we examined an alternative hypothesis: that inputs drive the recruitment of neurons, regardless of subtype (Figure 5). These are sparsely labeled neurons, with mixed populations of GCs and SGCs undergoing paired recordings. Since the focus of the experiment was the input correlation between two simultaneously recorded neurons, we did not report the individual cell types. We regret that this caused confusion and will clarify this issue in the revised manuscript.

      (2) In addition, some results are not entirely novel (e.g., the disproportionate recruitment as well as the distinctive physiological properties of SGCs), and/or based on correlations that do not fully support the conclusions of the article. In addition to re-writing, I believe that the article would benefit from being enriched with further analyses or even additional experiments before being resubmitted in a more definitive form.

      We would like to note that while we and others have previously reported the distinctive SGC physiology, this study is the first to compare the physiological properties of SGCs labeled as part of an engram with those of unlabeled SGCs. That was the thrust of the data presented, which may have been missed and will be emphasized in the revision. Similarly, while others have shown higher SGC recruitment in dentate engrams, we had to validate this in the dentate-dependent behaviors that we adopted in this study. We also note that the proportional SGC recruitment in our study, based on morphometric classification, differs from what was reported previously. These aspects of the study, which were considered confirmatory, represent the validation needed to proceed with the novel cell-type-specific paired recordings and optogenetic analyses of engram neurons presented in subsequent sections of the manuscript. We will emphasize these considerations in the revised manuscript.

      Reviewer 2:

      (1) The authors conclude that SGCs are disproportionately recruited into cfos assemblies during the enriched environment and Barnes maze task given that their classifier identifies about 30% of labelled cells as SGCs in both cases and that another study using a different method (Save et al., 2019) identified less than 5% of an unbiased sample of granule cells as SGCs. To make matters worse, the classifier deployed here was itself established on a biased sample of GCs patched in the molecular layer and granule cell layer, respectively, at even numbers (Gupta et al., 2020). The first thing the authors would need to show to make the claim that SGCs are disproportionately recruited into memory ensembles is that the fraction of GCs identified as SGCs with their own classifier is significantly lower than 30% using their own method on a random sample of GCs (e.g. through sparse viral labelling). As the authors correctly state in their discussion, morphological samples from patch-clamp studies are problematic for this purpose because of inherent technical issues (i.e. easier access to scattered GCs in the molecular layer).

      We regret that there seems to be some confusion about the use of a classifier. We did NOT use any automated classifier in this study. All cell type classifications in the study were conducted by experienced investigators examining cell morphology and classifying cells based on established morphometric criteria. In our prior study (Gupta et al., 2020), we conducted an automated cluster analysis that was able to classify GCs and SGCs as different cell types. The principal components underlying the automated clustering in Gupta et al. (2020) were consistent with the major criteria identified in prior morphology-based analyses by us and others (including Williams et al., 2010 and Save et al., 2019). To date, in the absence of a validated molecular marker, morphometry of recorded and filled cells or of sparsely labeled neurons is the only established method to classify SGCs. This was the approach we adopted, and this will be further clarified in the revision.

      (2) The authors claim that recurrent excitation from SGCs onto GCs or other SGCs is irrelevant because they did not find any connections in 32 simultaneous recordings (plus 63 in the next experiment). Without a demonstration that other connections from SGCs (e.g. onto mossy cells or interneurons) are preserved in their preparation and if so at what rates, it is unclear whether this experiment is indicative of the underlying biology or the quality of the preparation. The argument that spontaneous EPSCs are observed is not very convincing as these could equally well arise from severed axons (in fact we would expect that the vast majority of inputs are not from local excitatory cells). The argument on line 418 that SGCs have compact axons isn't particularly convincing either given that the morphologies from which they were derived were also obtained in slice preparations and would be subject to the same likelihood of severing the axon. Finally, even in paired slice recordings from CA3 pyramidal cells the experimentally detected connectivity rates are only around 1% (Guzman et al., 2016). The authors would need to record from a lot more than 32 pairs (and show convincing positive controls regarding other connections) to make the claim that connectivity is too low to be relevant.
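      To make the sampling argument concrete, the minimal sketch below estimates the chance of finding no connections at all among the recorded pairs. It assumes, as a simplification, that each pair is an independent trial with the same connection probability p, and it uses the ~1% rate quoted above for CA3 pyramidal cells (Guzman et al., 2016) only as a hypothetical stand-in for SGC-to-GC connectivity. The function names are illustrative and not from either the reviewers or the authors.

```python
import math


def prob_no_connections(p: float, n_pairs: int) -> float:
    """Probability of observing zero connections among n_pairs pairs,
    assuming each pair is independently connected with probability p."""
    return (1.0 - p) ** n_pairs


def pairs_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest number of pairs for which P(at least one connection) >= confidence,
    i.e. the smallest integer n with (1 - p)^n <= 1 - confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.01  # hypothetical connection probability (~1%, as quoted for CA3 pairs)
    for n in (32, 32 + 63):  # pairs from the first experiment, then pooled
        print(f"{n} pairs: P(no connection) = {prob_no_connections(p, n):.3f}")
    print("pairs for 95% chance of detecting >= 1 connection:", pairs_needed(p))
```

      Under these assumptions, zero connections in 32 pairs has a probability of about 0.725 (about 0.385 if the 63 additional pairs are pooled), and roughly 299 pairs would be needed for a 95% chance of detecting at least one connection. Real connectivity is unlikely to be uniform across pairs, so these figures are illustrative rather than a power analysis of the actual dataset.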

      As noted in our discussion, we are fully cognizant that potential SGC-to-GC connections may have been missed owing to the nature of slice physiology experiments, and we made every effort to limit this possibility. As noted in the manuscript, we only analyzed GC/SGC pairs in which the hilar axon collaterals of the neurons were recovered. We do not claim that SGC-to-GC/SGC connections are irrelevant; rather, we indicate that these connections, if present, are sparse and unlikely to drive engram refinement. Interestingly, wide-field optical stimulation, designed to activate multiple labeled engram neurons and axon terminals, including those of SGCs whose somata were outside the slice, did not lead to EPSCs in other unlabeled GCs or SGCs, suggesting a lack of robust SGC-to-GC/SGC synaptic connectivity. While we have previously published paired recordings from interneurons to GCs (Proddutur et al., 2023), we agree that recordings demonstrating the presence of SGC/GC-to-hilar neuron synapses would serve as an added control in the revised manuscript.

      (3) Another troubling sign is the fact that optogenetic GC stimulation rarely ever evokes feedback inhibition onto other cells, which contrasts with both other in vitro (e.g. Braganza et al., 2020) and in vivo (Stefanelli et al., 2016) studies. Without a convincing demonstration that monosynaptic connections between SGCs/GCs and interneurons in both directions are preserved, at least at the rates previously described in other slice studies (e.g. Geiger et al., 1997, Neuron; Hainmueller et al., 2014, PNAS; Savanthrapadian et al., 2014, J. Neurosci.), the notion that this setting could be closer to naturalistic memory processing than the in vivo experiments in Stefanelli et al. (e.g. lines 443-444) strikes me as odd. In any case, the discussion should clearly state that compromised connectivity in the slice preparation is likely a significant confound when comparing these results.

      We would like to note that our data are consistent with the Braganza 2020 study, as we explain below. Moreover, we would like to point out that the demonstration of “feedback inhibition” in the Stefanelli study was NOT in engram or behaviorally labeled neurons, nor was it in vivo. As we explain below, the physiological assay in Stefanelli was performed in slices and in a cohort of GCs with virally driven ChR2 expression. Thus, we are fully confident that our experimental paradigm better reflects a behavioral engram. As noted in response (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al., 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC-to-hilar neuron synapses, or the recruitment of feedback inhibition by focal activation of GCs, would serve to allay concerns regarding the slice preparation. We also submit that we already discuss the potential concerns regarding compromised connectivity in slice preparations.

      Regarding the lack of optically evoked feedback inhibition, we would like to point out that the Braganza 2020 study examined focal optogenetic activation of GCs, in which a high density of GCs was labeled using a Prox-cre line. They reported that about 2-4% of these densely labeled cells need to be recruited to evoke feedback IPSCs. In our experimental condition, ChR2 was expressed only in behaviorally labeled neurons, which yields sparse labeling, well below the 4% of focally activated cells needed to evoke IPSCs in the Braganza study. We do not claim that feedback inhibition cannot be recruited by focal activation of a cohort of GCs, and we even show an example of a paired recording with feedback GC inhibition of an SGC. Our conclusion is that the few sparsely labeled neurons active during a behavioral episode do not support the robust feedback inhibition proposed to mediate engram refinement. We submit that our findings are fully consistent with the sparse GC-driven feedback inhibition, and the need to focally activate a cohort of GCs to recruit feedback inhibition, reported in Braganza 2020.

      Regarding the Stefanelli study, we maintain that our behaviorally relevant in vivo labeling approach is more naturalistic than the DREADD- and channelrhodopsin-driven artificial “engrams” generated in the Stefanelli study. Of note, we used cFOS-driven TRAP mice to label, in vivo, neurons active during a behavior and then undertook slice physiology studies in these mice a week later. In contrast, the slice physiology data demonstrating putative feedback inhibition in the Stefanelli study (Fig 5) used wildtype mice injected with AAV CAMKII-cre and AAV-DIO-ChR2. Thus, unlike our study, the physiological data demonstrating feedback inhibition in the Stefanelli study were not obtained in a behaviorally labeled engram. Apart from the one set of histological experiments using AAV-SARE-GFP to demonstrate increased GFP labeling of SST neurons in behavior, all other data presented in the Stefanelli study are based on artificially generated engrams, in which optogenetic activation or silencing of granule cells was used to manipulate the number of neurons active during a task, followed by histological analysis of cFOS staining or behavior. Thus, the physiological experiments in Stefanelli et al. (2016), which used wide-field activation of a large cohort of GCs labeled by focal, virally driven ChR2 expression, were similar to the wide-field optical stimulation studies in Braganza 2020 and were NOT conducted in a behavioral engram. The strength of our study lies in the use of behaviorally tagged engram neurons for analysis, and our findings in sparsely labeled neurons are consistent with the reports in Braganza 2020. We will further clarify in our discussion that the data presented in the Stefanelli study do NOT represent an engram generated by natural behavior.

      (4) Probably the most convincing finding in this study is the higher zero-time-lag correlation of spontaneous EPSCs in labelled vs. unlabelled pairs. Unfortunately, the fact that the authors use spontaneous EPSCs to begin with, which likely represent a mixture of spontaneous release from severed axons, minis, and coordinated discharge from intact axon segments or entire neurons, makes it very hard to determine the meaning and relevance of this finding. At the bare minimum, the authors need to show if and how strongly differences in baseline spontaneous EPSC rates between different cells and slices contribute to this phenomenon. I would encourage the authors to use low-intensity extracellular stimulation at multiple foci to determine whether labelled pairs really share more input from common presynaptic axons or cells than unlabelled pairs, as they claim. I would also suggest the authors use conventional cross-correlograms (CCGs; see e.g. English et al., 2017, Neuron; Senzai and Buzsaki, 2017, Neuron) instead of their somewhat convoluted interval-selective correlation analysis to illustrate co-dependencies between the event time series. The references above also illustrate a more robust approach to determining whether peaks in the CCGs exceed chance levels.

      We appreciate the comment and can provide additional data on the EPSC frequency in individual labeled and unlabeled cells in the revised manuscript. As indicated in the manuscript, we constrained our analysis to cell pairs with comparable EPSC frequency in order to avoid additional confounds in the analysis. We have additional experiments showing that over 50% of the sEPSCs represent action potential-driven events, which we will include in the revised manuscript. We thank the reviewer for the suggestion to explore alternative methods of analysis, including CCGs, to further strengthen our findings.
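      As a rough illustration of the CCG approach the reviewer recommends, the Python sketch below (assuming NumPy, with event times in milliseconds) bins all pairwise lags between two event trains and compares the zero-lag bin against a null distribution built by uniformly jittering one train. The function names, the 1 ms bin, the ±50 ms window and the ±10 ms jitter are illustrative assumptions, not parameters from this study, and the cited studies use more refined chance-level estimates.

```python
import numpy as np


def cross_correlogram(t_ref, t_tgt, bin_ms=1.0, window_ms=50.0):
    """Count lags (t_tgt - t_ref) in bins centred on multiples of bin_ms."""
    n = int(round(window_ms / bin_ms))
    # Edges sit half a bin either side of each centre, so lag 0 has its own bin.
    edges = (np.arange(-n, n + 2) - 0.5) * bin_ms
    # All pairwise lags; np.histogram ignores lags outside the window.
    lags = (t_tgt[None, :] - t_ref[:, None]).ravel()
    counts, _ = np.histogram(lags, bins=edges)
    return counts, np.arange(-n, n + 1) * bin_ms


def jitter_null(t_ref, t_tgt, n_jitter=500, jitter_ms=10.0, seed=0, **ccg_kwargs):
    """CCGs after uniformly jittering each target event by up to +/- jitter_ms."""
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_jitter):
        jittered = t_tgt + rng.uniform(-jitter_ms, jitter_ms, size=t_tgt.size)
        null.append(cross_correlogram(t_ref, jittered, **ccg_kwargs)[0])
    return np.array(null)  # shape (n_jitter, n_bins)


# Toy data: half of the reference events are shared (within 0.4 ms) by the target.
rng = np.random.default_rng(1)
t_a = np.sort(rng.uniform(0, 100_000, 500))
t_b = np.sort(np.concatenate([t_a[:250] + rng.uniform(-0.4, 0.4, 250),
                              rng.uniform(0, 100_000, 500)]))
counts, centres = cross_correlogram(t_a, t_b)
null = jitter_null(t_a, t_b, n_jitter=200)
zero = int(np.flatnonzero(centres == 0)[0])
print(counts[zero], np.percentile(null[:, zero], 99))
```

      The all-pairs lag matrix keeps the sketch short but is memory-hungry for long recordings; a sorted-search implementation would scale better.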

      (5) Finally, one of the biggest caveats of the study is that the ensemble is labelled a full week before the slice experiment and thereby represents a latent state of a memory rather than encoding, consolidation, or recall processes. The authors acknowledge this in the discussion, but they should also be mindful of it when discussing other (especially in vivo) studies and comparing their results to these. For instance, Pignatelli et al. (2018) show drastic changes in GC engram activity and features driven by behavioral memory recall, so the results of the current study may be very different if slices were cut immediately after memory acquisition (if that were possible with a different labelling strategy), or if animals were re-exposed to the enriched environment right before sacrifice.

      We fully acknowledge the reviewer's concern that slices prepared a week after labeling may not reflect ongoing encoding. Although our data show that labeled cells are reactivated in a higher proportion during recall, we have discussed this caveat and will include alternative experimental strategies in the discussion.

      Reviewer 3:

      (1) Engram cells are (i) activated by a learning experience, (ii) physically or chemically modified by the learning experience, and (iii) reactivated by subsequent presentation of the stimuli present at the learning experience (or some portion thereof), resulting in memory retrieval. The authors show that exposure to the Barnes maze and the enriched environment activated semilunar granule cells and granule cells preferentially in the superior blade of the dentate gyrus, and that a significant fraction were reactivated on re-exposure. However, physical or chemical modification by experience was not tested. Experience modifies engram cells, and a common modification is Hebbian potentiation of excitatory synapses. The authors recorded EPSCs from labeled and unlabeled GCs and SGCs. Was there a difference in the amplitude or frequency of EPSCs recorded from labeled and unlabeled cells?

      We agree that we did not examine the physical or chemical modifications by experience. Although we constrained our sEPSC analysis to cell pairs with comparable sEPSC frequency, we will include data on sEPSC parameters in labeled and unlabeled cells in the revised manuscript.

      (2) The authors studied five sequential sections, each 250 μm apart across the septotemporal axis, which were immunostained for c-Fos and analyzed for quantification. Is this an adequate sample? Also, it would help to report the dorso-ventral gradient since more engram cells are in the dorsal hippocampus. Slices shown in the figures appear to be from the dorsal hippocampus.

      We thank the reviewer for the comment. We analyzed sections along the dorso-ventral gradient. As explained in the methods, there is considerable animal-to-animal variability in the number of labeled cells, which is why we had to use matched littermate pairs in our experiments. This variability could make it difficult to tease apart dorsoventral differences.

      (3) The authors investigated the role of surround inhibition in establishing memory engram SGCs and GCs. Surprisingly, they found no evidence of lateral inhibition in the slice preparation. Interneurons, e.g., PV interneurons, have large axonal arbors that may be cut during slicing. Similarly, the authors point out that some excitatory connections may be lost in slices. This is a limitation of slice electrophysiology.

      We agree that slice physiology has limitations and discuss this caveat. As noted in our response to point (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al., 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses, or recruitment of feedback inhibition by focal activation of GCs, would serve to allay concerns regarding the slice preparation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study by Chikermane and colleagues investigates the functional, structural, and dopaminergic network substrates of cortical beta oscillations (13-30 Hz). The major strength of the work lies in the methodology taken by the authors, namely a multimodal lesion network mapping. First, using invasive electrophysiological recordings from healthy cortical territories of epileptic patients, they identify regions with the highest beta power. Next, they leverage open-access MRI data and PET atlases and use the identified high-beta regions as seeds to find (1) the whole-brain functional and structural maps of regions that form the putative underlying network of high-beta regions and (2) the spatial distribution of dopaminergic receptors that show correlation with nodal connectivity of the identified networks. These steps are achieved by generating aggregate functional, structural, and dopaminergic network maps using the Lead-DBS toolbox, and by contrasting the results with those obtained from high-alpha regions.

      The main findings are:

      (1) Beta power is strongest across frontal, cingulate, and insular regions in invasive electrophysiological data, and these regions map onto a shared functional and structural network.

      (2) The shared functional and structural networks show significant positive correlations with dopamine receptors across the cortex and basal ganglia (which is not the case for alpha, where correlations are found with GABA).

      Nevertheless, a few clarifications regarding the choice of high-power electrodes and distributions of functional connectivity maps (i.e., strength and sign across cortex and sub-cortex) can help with understanding the results.

      We thank the reviewer for this critical expert assessment. 

      Reviewer #1 (Recommendations For The Authors):

      To potentially enhance the quality of the manuscript in the current version, I kindly ask the authors to address the following points:

      Major:

      (A) Power analysis of electrophysiological data

      (1) How were significant peaks identified exactly? I understand that the authors used FOOOF methodology to estimate periodic components of brain activity.

      Thank you for pointing out this lack of clarity. Applying FOOOF consists of fitting a one-over-f curve that delineates the aperiodic component, followed by fitting Gaussians to the periodic activity. This allows extraction of periodic peak power estimates that are corrected for the offset and exponent of the one-over-f, or non-oscillatory aperiodic, component of the spectrum (further information can be found here https://fooof-tools.github.io/fooof/auto_tutorials/plot_02-FOOOF.html). We included all peaks that could be fitted using this process.

      How about aperiodic components (Figure 1, PSD plots)? 

      We share the reviewer's interest in aperiodic activity. However, given that the primary aim of this study was the description of beta oscillations, and that the methodology and presentation of results are already very complex, we did not include an analysis of aperiodic activity in this manuscript. This could be done in the future, and it would surely be interesting to visualize the whole-brain connectomic fingerprints of the aperiodic exponent and offset. With regard to the purely anatomical description of non-oscillatory aperiodic activity, we would like to refer to Figure 8 in Frauscher et al. Brain 2018 (https://doi.org/10.1093/brain/awy035), where this is described. We have decided not to include additional information on this matter because a) we felt that this would further convolute the results and discussion without directly addressing any of the hypotheses and aims that we set out to tackle, and b) the interpretation of aperiodic activity is still a matter of intense research with conflicting results, which warrants careful consideration of many aspects that would again go beyond the scope of this paper.

      In addition, to what degree would the results change if one identified the peaks relative to sites with no peak, similar to Frauscher et al.?

      Beta activity, the oscillation of interest in our analysis, is ubiquitous in the brain. In fact, of 1772 channels, only 21 did not exhibit a beta peak detectable with FOOOF. Thus, a comparison of 1751 against 21 channels would not yield meaningful results. We therefore decided to focus on the channels in which beta activity is the strongest and dominant observable oscillation.

      If the FOOOF approach has some advantages, these should be pointed out or discussed.

      FOOOF indeed has the advantage that it provides an objective and reproducible estimation of peak oscillatory activity that accounts for differences in aperiodic activity. To the best of our knowledge, there is no other approach that is nearly as well documented, validated and computationally reproducible. 

      Changes in manuscript: We have now further clarified the definition of peak amplitudes in the results and methods section and have discussed the use of alternative measures in the limitations section of our manuscript.

      Results: “The frequency band with the highest peak amplitude was identified using the extracted peak parameter (pw) for each channel and depicted as the dominant rhythm for the respective localisation (Figure 1).”

      Methods: “Peak height was extracted using the pw parameter, which depicts peak amplitude after subtraction of any aperiodic activity.”

      Discussion: “Alternative approaches could yield different results, e.g. reusing channels for each peak that is observable and contrasting them to channels where such peak was not present. However, in our study the majority of channels exhibited beta activity, even if peaks were of low amplitude, which we believe would have led to less interpretable results.”
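      To make the idea of an aperiodic-corrected peak height concrete, here is a simplified Python sketch (assuming NumPy) of the "fixed" aperiodic model, log10 P(f) = offset - exponent · log10 f. A line is fit in log-log space, refit on the points with the lowest residuals so that peaks do not bias it, and subtracted to give a flattened spectrum whose maximum within a band approximates the peak height. This is not the fooof library itself, which additionally fits Gaussians to the flattened spectrum and uses its own fitting thresholds; the 25th-percentile cut-off and the function names are hypothetical.

```python
import numpy as np


def flatten_spectrum(freqs, power, keep_pct=25):
    """Subtract a 'fixed' aperiodic fit, log10 P = offset - exponent * log10 f.

    Returns (flattened log10 spectrum, offset, exponent).
    """
    x, y = np.log10(freqs), np.log10(power)
    slope, offset = np.polyfit(x, y, 1)             # initial fit, biased by peaks
    resid = y - (offset + slope * x)
    keep = resid <= np.percentile(resid, keep_pct)  # points least affected by peaks
    slope, offset = np.polyfit(x[keep], y[keep], 1)
    return y - (offset + slope * x), offset, -slope


def peak_in_band(freqs, flat, lo, hi):
    """Frequency and height of the largest flattened value within [lo, hi] Hz."""
    in_band = (freqs >= lo) & (freqs <= hi)
    i = np.argmax(flat[in_band])
    return freqs[in_band][i], flat[in_band][i]


# Toy spectrum: 1/f^2 background plus a beta peak of height 0.8 (log10 units) at 20 Hz.
freqs = np.arange(3.0, 40.5, 0.5)
log_power = 1.0 - 2.0 * np.log10(freqs) + 0.8 * np.exp(-(freqs - 20.0) ** 2 / 8.0)
flat, offset, exponent = flatten_spectrum(freqs, 10 ** log_power)
cf, pw = peak_in_band(freqs, flat, 13, 30)
print(cf, round(pw, 2), round(exponent, 2))
```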

      (2) How exactly do the authors deal with channels with more than one peak? Some elaboration on this and how this could potentially impact the results would be appreciated. Sorry if I have missed it.

      Indeed, a description of this was lacking, so we are very thankful that the reviewer pointed this out. The maximum peak amplitude method was a winner-takes-all approach: in the case of multiple peaks, the peak with the highest amplitude was chosen. This method of course has drawbacks in the form of lost or disregarded peaks and remains a limitation of this study.

      Changes in manuscript: We have now clarified this in the methods and results sections, which now read: 

      Methods: “In case of multiple peaks within the same region, we used only the highest peak amplitude.”

      Results: “In case of multiple peaks within the same frequency band, we focused the analysis on the peak with the highest amplitude.”

      And added the following to the Limitations section of the discussion: 

      “Another limitation in our study is the fact that the statistical approach for the comparison of beta and alpha networks and even for multiple peaks within the same frequency band follows a winner takes all logic that is, by definition, a simplification, as most areas will contribute to more than one spatiospectrally distinct oscillatory network. Specifically, while multiple peaks within or across frequency bands could be present in each channel, we decided to allocate this channel to only the frequency band containing the highest peak amplitude.” 
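      The winner-takes-all assignment described above can be sketched in plain Python as follows. Each peak is assumed to be a (center frequency, power, bandwidth) tuple, matching the column order of FOOOF's fitted peak parameters; the beta range follows the 13-30 Hz definition used in the study, while the other band edges and the function name are assumptions for illustration.

```python
# Band edges in Hz; beta follows the study's 13-30 Hz, the others are assumptions.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0),
         "beta": (13.0, 30.0), "gamma": (30.0, 90.0)}


def dominant_band(peaks, bands=BANDS):
    """Return the band holding the single highest-power peak (winner takes all).

    peaks: iterable of (center_freq_hz, power, bandwidth) tuples.
    Peaks outside every band are ignored; returns None if no peak qualifies.
    """
    best_band, best_power = None, float("-inf")
    for center, power, _bandwidth in peaks:
        for name, (lo, hi) in bands.items():
            if lo <= center < hi and power > best_power:
                best_band, best_power = name, power
    return best_band


# Two peaks in one channel: the beta peak is higher, so the channel counts as beta.
print(dominant_band([(10.0, 0.4, 2.0), (20.0, 0.6, 3.0)]))  # beta
```

      Any lower peaks, whether in the same band or another one, are simply discarded, which is the simplification the limitation above refers to.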

      (B) Network mapping

      (1) Because the fMRI data are preprocessed by regressing out the global signal, there are negative correlations across the functional networks. Unfortunately, the distribution, sign, and strength of the correlations are not quantitatively shown in any of the plots. Thus, it is unclear whether, e.g., cortico-cortical vs. subcortico-cortical correlations differ in strength and/or sign. I think this additional information is important for better understanding the up/down-regulation of beta, e.g., by DA signaling. Some additional discussion of this point would be insightful, I think.

      The referee is touching upon a very important and difficult point, which we have considered very carefully. Global signal regression is a controversial topic, and the neurophysiological basis of negative correlations remains to be elucidated. We can justify our use of this approach based on an expert consensus described in Murphy & Fox 2017 (https://doi.org/10.1016%2Fj.neuroimage.2016.11.052), which highlights that global signal regression can improve the specificity of positive correlations and improve their correspondence to anatomical connectivity. In practice, however, we relied on it because it is the more commonly used and validated approach in lesion network and DBS connectivity mapping and is implemented in the Lead Mapper pipeline. All connectivity estimates are shown in Supplementary figure 3. We remain hesitant to shift the focus to these points because of the uncertain underlying neural correlates. However, when looking at the values, it is interesting to note that most key regions of interest exhibit positive connectivity values.

      Changes in manuscript: We now point to the supplement containing all connectivity values in the results section more prominently: “All connectivity values including their sign are shown in figures as brain region averages parcellated with the automatic anatomical labelling atlas in supplementary figures 2&3.”
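      For readers less familiar with why global signal regression introduces negative correlations, the Python sketch below (assuming NumPy; function names hypothetical) regresses the mean time course out of every voxel and then computes seed-based Pearson correlations. In a toy example where two networks share a common global fluctuation, their correlation is positive before regression and becomes negative afterwards. This is a generic illustration, not the Lead Mapper preprocessing pipeline.

```python
import numpy as np


def regress_global_signal(ts):
    """Regress the global mean time course (plus intercept) out of every voxel.

    ts: array of shape (n_timepoints, n_voxels).
    """
    g = ts.mean(axis=1, keepdims=True)
    design = np.hstack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ coef


def seed_correlation(ts, seed_voxels):
    """Pearson correlation of every voxel with the mean time course of the seed."""
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    seed = z[:, seed_voxels].mean(axis=1)
    seed = (seed - seed.mean()) / seed.std()
    return z.T @ seed / len(seed)


# Toy data: voxels 0-9 follow network A, 10-19 network B, all share a global signal.
rng = np.random.default_rng(0)
T = 500
g, a, b = rng.normal(size=(3, T))
ts = g[:, None] + rng.normal(size=(T, 20))
ts[:, :10] += a[:, None]
ts[:, 10:] += b[:, None]
raw_r = seed_correlation(ts, [0, 1])
gsr_r = seed_correlation(regress_global_signal(ts), [0, 1])
print(raw_r[10:].mean().round(2), gsr_r[10:].mean().round(2))
```

      Because the regressed time courses sum to zero across voxels at every time point, the covariances of any seed with all voxels also sum to zero, so some negative values are guaranteed to appear; whether they reflect genuine anticorrelation is the debated part.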

      (2) I assume no thresholding is applied to the functional connectivity maps (in a graph-theoretical sense). Please clarify (this is also related to the comment above, in particular, the strength of correlations).

      Indeed, we demonstrate SPM maps using family wise error corrected stats in figure 2, but all further analyses were performed on unthresholded maps as correctly pointed out by the referee. 

      Changes in manuscript: 

      Results: “Specifically, we analysed to what degree the spatial uptake patterns of dopamine, as measurable with fluorodopa (FDOPA; cohort average of 12 healthy subjects) and other dopamine signalling related tracers that bind D1/D2 receptors (average of N=17/44 respectively healthy subjects) or the dopamine transporter (DAT; cohort average of N=180 healthy subjects) were correlated with the unthresholded MRI connectivity maps.”

      Methods: “This parcellation was applied to both PET and unthresholded structural and functional connectivity maps using SPM and custom code.”
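      The parcel-wise comparison between an unthresholded connectivity map and a PET map can be sketched as follows in Python with NumPy. The sketch averages each map within atlas parcels and computes a rank (Spearman) correlation across parcels; the choice of Spearman's rho, the tie-free ranking, the label convention (0 = background) and the function names are assumptions, since the quoted text does not specify them, and significance testing is not shown.

```python
import numpy as np


def parcel_means(voxel_map, labels):
    """Average a flattened voxel map within each parcel; label 0 is background."""
    ids = np.unique(labels[labels > 0])
    return np.array([voxel_map[labels == i].mean() for i in ids])


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


# Toy example: 5 parcels of 10 voxels plus 5 background voxels.
labels = np.concatenate([np.zeros(5, dtype=int), np.repeat(np.arange(1, 6), 10)])
rng = np.random.default_rng(0)
connectivity = 0.1 * labels + rng.normal(0, 0.01, labels.size)
pet_uptake = np.exp(labels.astype(float))
rho = spearman(parcel_means(connectivity, labels), parcel_means(pet_uptake, labels))
print(round(rho, 3))
```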

      Minor

      (1) Methods, Connectivity analysis: The description of (mass-univariate) GLM analysis is confusing. The maps underwent preprocessing? Which preprocessing steps are meant here? What is the dependent variable and what are the predictors exactly?

      We thank the reviewer for catching this error in our methods and apologise for the confusion. Indeed, we used t-tests without further preprocessing instead of a GLM.

      Changes in manuscript: The respective section has been removed from the methods section and intermediate steps have been clarified. The section now reads: “To investigate differences between beta dominant and alpha dominant functional connectivity networks, a two sample t-test was calculated for the condition where beta was greater than alpha and vice versa using SPM. Here, the connectivity maps from each dominant channel (1005 beta functional connectivity maps and 397 alpha connectivity maps) were compared. Estimation of model parameters yielded t-values for each voxel, indicating the strength and direction of differences between the two contrasts (beta > alpha, alpha > beta). To address the issue of multiple comparisons, we applied Family-Wise Error (FWE) correction, adjusting significance thresholds such that only voxels with p < 0.05 would be included.”
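      A minimal voxelwise version of the beta > alpha contrast can be sketched in Python with NumPy. SPM controls the family-wise error rate with random field theory; as a stand-in, this sketch uses a permutation max-t threshold, which also controls FWE but is a different method. The array layout (maps × voxels), the pooled-variance t statistic and the function names are assumptions.

```python
import numpy as np


def two_sample_t(a, b):
    """Pooled-variance t for mean(a) > mean(b); a, b have shape (n_maps, n_voxels)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))


def max_t_fwe(a, b, n_perm=1000, alpha=0.05, seed=0):
    """One-sided voxelwise test with a permutation max-t FWE threshold."""
    rng = np.random.default_rng(seed)
    t_obs = two_sample_t(a, b)
    pooled = np.vstack([a, b])
    max_null = np.empty(n_perm)
    for k in range(n_perm):
        order = rng.permutation(len(pooled))      # shuffle group labels
        max_null[k] = two_sample_t(pooled[order[:len(a)]], pooled[order[len(a):]]).max()
    threshold = np.quantile(max_null, 1 - alpha)  # FWE-corrected critical t
    return t_obs, threshold, t_obs > threshold


# Toy data: 40 "beta" vs 30 "alpha" maps over 200 voxels; voxels 0-9 differ.
rng = np.random.default_rng(42)
beta_maps = rng.normal(size=(40, 200))
beta_maps[:, :10] += 2.0
alpha_maps = rng.normal(size=(30, 200))
t_obs, threshold, significant = max_t_fwe(beta_maps, alpha_maps, n_perm=500)
print(round(threshold, 2), significant[:10].all(), int(significant[10:].sum()))
```

      The alpha > beta contrast is obtained by swapping the two arguments.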

      (2) I encourage the authors to find a better (visual) way of reporting Table 1, to make the main observations easier to grasp and compare (maybe a two-dimensional bar plot? Or color-coding the cells?)

      Reply: Thank you for your suggestion to improve the table; the new table has been adjusted according to the recommended changes to make it more readable.

      Reviewer #2 (Public Review):

      Summary:

      This is a very interesting paper that leveraged several publicly available datasets: invasive cortical recordings in epilepsy patients, functional and structural connectomic data, and PET data related to dopaminergic and GABAergic synapses. These were combined to create a unified hypothesis of beta band oscillatory activity in the human brain. They show that beta frequency activity is ubiquitous, not just in sensorimotor areas, and that cortical regions where beta predominated had high connectivity to regions high in dopamine re-uptake.

      Strengths:

      The authors leverage and integrate three publicly available human brain datasets in a creative way. While these public datasets are powerful tools for human neuroscience, it is innovative to combine these three types of data into a common brain space to generate novel findings and hypotheses. Findings are nicely controlled by separately examining cortical regions where alpha predominates (which have a different connectivity pattern). GABA uptake from PET studies is used as a control for the specificity of the relationship between beta activity and dopamine uptake. There is much interest in synchronized oscillatory activity as a mechanism of brain function and dysfunction, but the field is short on unifying hypotheses of why particular rhythms predominate in particular regions. This paper contributes nicely to that gap. It is ambitious in generating hypotheses, particularly that modulation of beta activity may be used as a "proxy" for modulating phasic dopamine release.

      Weaknesses:

      As the authors point out, the use of normative data is excellent for exploring hypotheses but does not address or explore individual variations which could lead to other insights. It is also biased to resting state activity; maps of task-related activity (if they were available) might show different findings.

      The figures, results, introduction, and methods are admirably clear and succinct but the discussion could be both shorter and more convincing.

      Reviewer #2 (Recommendations For The Authors):

      The tone of the discussion is excessively lofty and abstract, and hard to follow in places. Specific examples in comments to authors below.

      We thank the reviewer for their positive assessment and constructive feedback on the discussion. Also in light of the other reviewers' comments, we have made a sincere effort to shorten, restructure and improve the discussion. Additionally, we have addressed all of the reviewer's specific comments below, appending each change to the manuscript where appropriate, and have addressed all comments in the main text. That said, we see this paper and its discussion as providing our most up-to-date and personal perspective on a generalizable concept of the interplay between beta oscillations and dopamine. Providing such a generalizable concept is very challenging, and so far very few authors have even attempted this. One notable exception is the “status quo” concept by Fries & Engel. While we will do our very best to address the comments, we have decided not to deviate from our initial ambition to provide a discussion of a generalizable concept. Naturally, such a concept must be very complex and will therefore be hard to understand in parts. Through the revision, we hope that readability and comprehensibility have improved, while the discussion still provides an in-depth perspective and hypothesis on how beta oscillations, dopamine and their brain circuits may facilitate brain function. Nevertheless, we want to express our honest gratitude for the thoroughness with which the reviewer has read and scrutinized our paper. The review clearly shows that the reviewer had the ambition to follow and understand what we were trying to convey, which can be rare nowadays. We are truly thankful for this.

      The first sentence is not quite true, as invasive neurophysiology was not, and cannot be, done in healthy humans. "The present study combined three openly available datasets of invasive neurophysiology, MRI connectomics, and molecular neuroimaging in healthy humans to characterise the spatial distribution of brain regions exhibiting resting beta activity, their shared circuit architecture, and its correlation with molecular markers of dopamine signaling in the human brain."

      Changes in manuscript: We have now removed “healthy” from the respective sentence.

      "Our results motivate to conceptualise the capacity to generate...." This is not clear.

      Changes in manuscript: “Our results suggest that one common denominator of brain regions that generate beta activity, is their affiliation with beta oscillations as a feature that arises from a largescale global brain network that is modulated by dopamine.”

      "Similarly, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson's disease is long known" - the association between movement-related cortical beta desynchronization and Parkinson's motor signs is not well described - could the authors specify and reference this?

      We thank the reviewer for pointing out this lack of clarity. We meant that beta is independently known for “movement” and for “movement disorders”, not for “movement in movement disorders”. That said, some studies suggest that beta ERD is altered in PD (e.g. https://doi.org/10.1093/cercor/bht121), but saying that this is “long known” would be an overstatement and was not our intention. We rephrased this sentence accordingly.

      Changes in manuscript: The sentence now reads: “Moreover, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson’s disease is long known.”

      "...first fast-cyclic voltammetry experiments that allowed for combined measurement of dopamine release with invasive neurophysiology have provided first evidence that beta band oscillations in healthy non-human primates can differentially link dopamine release, beta oscillations and reward and motor control, depending on the contextual information and striatal domain" - This is not very clear - not sure what "differentially link" signifies.

      I think the fact that this is not easy to understand reflects the complexity that we, and the authors of the cited paper from Ann Graybiel’s lab, aimed to communicate. In fact, we stayed very close to the phrasing used in their paper to try to avoid confusion (title: “Dopamine and beta-band oscillations differentially link to striatal value and motor control”, https://doi.org/10.1126/sciadv.abb9226). The specific results go beyond the scope of the discussion but are very interesting, so I would be happy if our paper inspired readers to look it up.

      Changes in manuscript: We have now adapted the sentence to “In line with this more complex picture, direct measurement of dopamine concentration in non-human primates revealed specific interactions between dopamine release, beta oscillations, reward value and motor control, depending on contextual information and striatal domain. This shows that the relationship of dopamine and beta activity is not solely associated with either reward or movement and depends on where in the striatum beta activity is recorded.”

      "In fact, one could argue that it can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories" - this is not clear - for example what is a neural trajectory? What is meant by "re-entrance and refinement"?

      A neural trajectory refers to the path that the activity of a neural population takes through a high-dimensional space over time. It can be obtained through multivariate analysis of population activity with dimensionality reduction techniques, such as PCA. The concept of low-dimensional representations of high-dimensional neural activity has gained a lot of attention in computational neuroscience ever since high-channel-count recordings of neural population activity became available (an early and prominent example is Churchland et al., 2012 Nature https://doi.org/10.1038/nature11129, while a more recent example is Safaie et al., Nature 2023 https://doi.org/10.1038/s41586-023-06714-0). The review we refer to by Rui Costa and colleagues (Athalye, V. R., Carmena, J. M. & Costa, R. M. Neural reinforcement: re-entering and refining neural dynamics leading to desirable outcomes. Curr Opin Neurobiol 60, 145–154 (2020) https://doi.org/10.1016/j.conb.2019.11.023) suggests that dopamine may serve to modulate the likelihood of a specific pattern emerging and re-entering the cortex-basal ganglia loop, for the “reliable production of neural trajectories driving skillful behavior on-demand”. We believe that this concept could be revolutionary for our understanding of dopaminergic modulation and disorders, and together with our colleague Alessia Cavallo we have written an invited perspective on this topic (https://doi.org/10.1111/ejn.16222), which may help further clarify it.

      Changes in manuscript: We realize that this aspect may sound a bit unclear or far away from the data in this manuscript. However, given that we have spent more than a decade thinking about beta oscillations and how they can be conceptualized, we would prefer not to entirely change our points and rather bet on the possibility that the concepts become more widely accepted and well-known. Nevertheless, we have now adapted the text to make this a bit more clear:

      “We hypothesise that, this “status quo” hypothesis could be equally or maybe even more adequately posed on the neural level. Namely, it could provide insights to what degree a certain activity pattern or synaptic connection is to be strengthened or weakened, in light of neural learning. We propose that this putative function can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories.”
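      To illustrate the notion of a neural trajectory described above, the Python sketch below (assuming NumPy; names hypothetical) projects mean-centred population activity, arranged as time points × neurons, onto its leading principal components via a singular value decomposition. The sequence of projected points over time is the low-dimensional trajectory; the toy data are a rotating two-dimensional latent state embedded in 50 simulated neurons, not recorded data.

```python
import numpy as np


def neural_trajectory(rates, n_components=3):
    """Project mean-centred activity (time x neurons) onto its top principal components.

    Returns the trajectory (time x n_components) and the fraction of variance
    explained by each retained component.
    """
    centred = rates - rates.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return centred @ vt[:n_components].T, explained[:n_components]


# Toy data: a 2-D rotating latent state read out by 50 noisy simulated neurons.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
latent = np.column_stack([np.cos(t), np.sin(t)])
rates = latent @ rng.normal(size=(2, 50)) + 0.05 * rng.normal(size=(200, 50))
trajectory, explained = neural_trajectory(rates)
print(trajectory.shape, explained.round(3))
```

      Here almost all variance falls on the first two components, so the 50-neuron activity traces out a loop in a two-dimensional plane.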

      "....after which it was quickly translated to first experimental studies using cortical or subcortical beta signals in human patients44." - reference 44 only deals with the use of subcortical beta, not cortical, in adaptive control.

      The reviewer is right; in fact, there is no study using motor cortex beta for adaptive DBS yet, but various studies have used different markers (especially gamma) since then.

      Changes in manuscript: We have rephrased and added citations accordingly: “This approach, also termed adaptive DBS, was first demonstrated based on cortical beta activity that was used to adapt pallidal DBS in the MPTP non-human primate model of PD43. It was quickly translated to first experimental studies using subcortical beta signals in human patients44, followed by further research using more complex cortical and subcortical sensing setups and biomarker combinations45,46.”
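      For readers unfamiliar with adaptive DBS, its core logic can be reduced to a threshold rule on a smoothed beta-amplitude signal, sketched below in Python with NumPy. This is a deliberately minimal illustration: the moving-average smoothing, the fixed threshold and the function name are hypothetical, and the cited studies used their own filtering, thresholding and stimulation-ramping schemes, which are not described here.

```python
import numpy as np


def adaptive_stimulation(beta_amplitude, threshold, smooth_n=5):
    """Boolean stimulation trace: on while the smoothed beta amplitude exceeds threshold.

    smooth_n is the length of a centred moving-average window, in samples.
    """
    kernel = np.ones(smooth_n) / smooth_n
    smoothed = np.convolve(beta_amplitude, kernel, mode="same")
    return smoothed > threshold


# Toy beta-amplitude trace: low, a 50-sample burst, then low again.
beta_amplitude = np.concatenate([np.full(50, 0.2), np.full(50, 1.0), np.full(50, 0.2)])
stim_on = adaptive_stimulation(beta_amplitude, threshold=0.5)
print(int(stim_on.sum()))
```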

      The paragraph headed " Implications for neurotechnology" is quite long and should be condensed and focused. It doesn't seem to support the last sentence, "....targeted interventions that can increase and decrease beta activity, as recently shown through phase specific modulation45 could be utilised to mimic phasic dopamine release as a neuroprosthetic approach to alter neural reinforcement38." - I don't quite follow the logic. The authors have clearly shown that beta-related circuits tend to be those linked to dopamine modulation, and may subserve tasks for which reinforcement learning is an important mechanism. However the logic of how modulation of beta activity can "substitute" for modulation of dopamine isn't clear. That would seem to require that the mechanism by which dopamine produces reinforcement, is via an effect on beta oscillation properties (phase, amplitude, frequency). Is there evidence for this? If so it should be better spelled out.

      We realize that this is very speculative at this point. Indeed, we believe that subthalamic DBS can mimic dopaminergic control, and in the future there may be new treatment avenues, e.g. using neurochemical interfaces, for which beta could be informative to mimic dopamine release. Ultimately, however, explaining this would be very complex, so we have removed the sentence. With regard to the remaining text in the section, we considered condensing it but felt that this paragraph is highly relevant for the ongoing development of neurotechnology, and therefore decided to remove only the first and last sentences.

      Changes in manuscript: We have removed the first and last sentences.

      "While the abovementioned prospects are promising we should cautiously consider the limitations of our study." - an unnecessary sentence to start a "limitations" section; it's clearly a paragraph about limitations. In general, the authors should go through the discussion and reduce verbosity; it is not nearly as well edited as the rest of the paper.

      Agreed. 

      Changes in manuscript: We removed the sentence. 

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Chikermane et al. leverage a large open dataset of intracranial recordings (sEEG or ECoG) to analyze resting-state (eyes closed) oscillatory activity from a variety of human brain areas. The authors identify a dominant proportion of channels in which beta band activity (12-30Hz) is most prominent and subsequently seek to relate this to anatomical connectivity data by using the sEEG/ECoG electrodes as seeds in a large set of MRI data from the human connectome project. This reveals separate regions and white matter tracts for alpha (primarily occipital) and beta (prefrontal cortex and basal ganglia) oscillations. Finally, using a third available dataset of PET imaging, the authors relate the parcellated signals to dopamine signaling as estimated by spatial uptake patterns of dopamine, and reveal a significant correlation between the functional connectivity maps and the dopamine reuptake maps, suggesting a functional relationship between the two.

      Strengths:

      Overall, I found the paper well justified, focused on an important topic, and interesting. The authors' use of 3 different open datasets was creative and informative, and it significantly adds to our understanding of different oscillatory networks in the human brain, and their more elusive relation with neuromodulator signaling networks by adding to our knowledge of the association between beta oscillations and dopamine signaling. Even my main comments about the lack of a theta network analysis and discussion points are relatively minor, and I believe this paper is valuable and informative.

      Weaknesses:

      The analyses were adequate, and the authors cleverly leveraged these different datasets to build an interesting story. The main aspect I found missing (in addition to some discussion items, see below) was an examination of the theta network. Theta oscillations have been implicated in a number of cognitive processes including spatial navigation and memory, and have been proposed to have different potential originating brain regions, and it would be informative to see what their anatomical networks (e.g. as in Figure 2) look like under the authors' analyses.

      The authors devote a significant portion of the discussion to relating their findings to a popular hypothesis for the function of beta oscillations, the maintenance of the "status quo", mostly in the context of motor control. As the authors acknowledge, given the static nature of the data and lack of behavior, this interpretation remains largely speculative and I found it a bit too far-reaching given the data shown in the paper. In contrast, I missed a more detailed discussion on the growing literature indicating a role for beta in mood (e.g. in Kirkby et al. 2018), especially given the apparent lack of hippocampal and amygdala involvement in the paper, which was surprising.

      We thank the reviewer for their insightful review of our manuscript. One of the aims of our paper was to provide the ground for a circuit-based conceptualization of beta activity, which does not primarily relate to behavior. In practice, our ambition is to provide a generalizable concept that can be applied to all behavioral domains, including mood. The reason we focus on the “status quo” hypothesis is that it is one of the very few, if not the only, generalizable concepts of the function of beta oscillations. Through our paper and the discussion, we aim to redirect this concept towards a less cognitive/behavioral and more anatomical, network-based domain, while acknowledging principles that may overlap. We realize that this is very ambitious, and that this endeavour is necessarily complex and not easy to communicate. In light of the reviewer's comments, we have made an effort to improve the discussion as best we could without straying too far from our initial aim. We are thankful for the suggested reference, which we have now added to the discussion in the section where we previously discussed beta as a biomarker for mood, also noting the absence of beta-dominant channels in the amygdala and hippocampus. It should be clarified, however, that a) only three channels were located in the amygdala, one of which exhibited beta activity, so we should be cautious not to overinterpret this result, and b) most channels exhibited beta, and the fact that beta was not dominant does not mean that beta is absent or unimportant in these brain areas. Given the way we approached the analysis, absence of evidence is not evidence of absence. Notably, the study used a complex network analysis, which we could not perform because we did not have parallel recordings from these areas in multiple patients. This is now noted in the limitations.

      Changes in manuscript: “For example, it was shown that beta is implicated in working memory28, utilisation of salient sensory cues29, language processing30, motivation31, sleep32, emotion recognition33, mood34 and may even serve as a biomarker for depressive symptom severity in the anterior cingulate cortex35” and “One impactful study reported that beta oscillatory sub-networks of the amygdala and hippocampus could reflect human variations in mood34. This is interesting, but highlights another relevant limitation of our study, namely that recordings in different areas stemmed from different patients and thus such sub-network analyses on the oscillatory level could not be conducted.”

      Major comment:

      • Although the proportion of electrodes with theta-dominant oscillations was lower (~15%) than alpha (~22%) or beta (~57%), it would be very valuable to also see the same analyses the authors carried out in these frequency bands extended to theta oscillations.

      We agree with the reviewer and appreciate the interest in other frequency bands: theta, alpha and gamma. Our primary interest was to provide a network concept of beta activity, but we anticipated that interest would go beyond that frequency band. However, we also had to limit ourselves to what is communicable and comprehensible. Our key aim was to provide a data-driven circuit description of beta activity that can lay the ground for a generalizable concept of where beta oscillations emerge. Reproducing all analyses for every frequency band would clutter both the results and the discussion. Moreover, the honest truth is that funding and the individual career plans of the researchers do not currently allow us to allocate time to a reanalysis of all data, which would be a significant effort. Therefore, we have decided to add only the topography of theta and gamma channels as a supplement. In case the reviewer is interested in a collaboration on extending this project to other frequency bands and circuits, we would like to invite them to get in touch; perhaps this could be a new collaborative project. Until then, we have extended our limitations section to note that this would be important work for the future.

      Changes in manuscript: 

      We have added and cited the new supplementary figure for the results from theta in the results section, which now reads: 

      “Further information on the topography of theta channels is shown in supplementary figure 1.”

      We would like to add that a sensible interpretation of results from gamma dominant channels is unlikely to be possible given the low count of channels with prominent resting activity in this frequency band. We have added the following text to the limitations section: “The aim of this study was to elucidate the circuit architecture of beta oscillations, which is why insights from this study for other frequency bands are limited. Future research investigating the specific circuits of theta, alpha and gamma oscillations and their relationship with neurotransmitter uptake could yield new important insights on the networks underlying human brain rhythms.“ 

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      • Results: "we performed non-parametric Spearman's correlations between the structural and functional connectivity maps of beta networks with neurotransmitter uptake". This is a significantly complex analysis that requires more detail for the reader to evaluate. There is more detail in the Figure 3 legend but still insufficient. The Methods offer more detail, but I found the description of the parcellation to be vague and I would appreciate a more detailed description.

      We thank the reviewer for drawing our attention to the insufficient explanation of the methods used to calculate the correlations in this analysis. We have now made an effort to provide a greater level of detail in the relevant paragraphs.

      Changes in manuscript: We have now made changes to both the Results and Methods sections and added the following explanations respectively:

      Results: “Next, we resliced the beta network map and the PET images to allow for a meaningful comparison, using a combined parcellation with 476 brain regions that include cortex19, basal ganglia20, and cerebellum21. Here, each parcel – which was a collection of voxels belonging to a particular brain region – from the connectivity map was correlated with the same parcel containing average neurotransmitter uptake from the respective PET scan (see Figure 3A). In this way nonparametric Spearman’s correlations between PET intensity and structural and functional connectivity maps of beta networks were obtained, which indicate to what degree the spatial distribution of connectivity is similar to the distribution of neurotransmitter uptake.“

      Methods: “A custom master parcellation in MNI space was created in Matlab using SPM functions by combining three existing parcellations to include cortical regions19, structures of the basal ganglia20 and cerebellar regions21. Regions that were (partially) overlapping between the atlases were only selected once. The final compound parcellation had 476 regions in total. This parcellation was applied to both PET and structural and functional connectivity maps using SPM and custom code. This allowed for the calculation of spatial correlations, providing a statistical measure of spatial similarity of the PET intensity and MRI connectivity distributions. For this, Spearman’s rank correlations were used to calculate correlations between the PET images, such as the dopamine aggregate map, and both functional and structural beta connectivity networks (Figure 3). The analysis was repeated for individual tracers, showing similar results (Supplementary figure 2). Finally, to validate these results, a control analysis was performed using a GABA PET scan from the same open dataset of neurotransmitter uptake following the same pipeline (Figure 2A, 2B).”
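      The parcel-wise spatial correlation described above can be sketched as follows, assuming that each image has already been resliced to a common space and flattened into a list of voxel values, and that a label image assigns each voxel to a parcel (0 marking voxels outside the atlas). This is a hypothetical pure-Python illustration, not the Matlab/SPM code used in the study; the helper names and toy data are invented for the example. Spearman's rho is computed as the Pearson correlation of the ranks, with tied values given their average rank.

```python
import math


def parcel_means(values, labels):
    """Average voxel values within each parcel; label 0 marks voxels outside the atlas."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        if label == 0:
            continue
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def average_ranks(xs):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; assumes neither input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spatial_spearman(connectivity, pet, labels):
    """Spearman's rho between parcel-averaged connectivity and parcel-averaged PET uptake."""
    conn_p = parcel_means(connectivity, labels)
    pet_p = parcel_means(pet, labels)
    parcels = sorted(set(conn_p) & set(pet_p))  # same parcel order for both maps
    return pearson(average_ranks([conn_p[p] for p in parcels]),
                   average_ranks([pet_p[p] for p in parcels]))


# Toy example: 7 "voxels" in 3 parcels (plus one background voxel labelled 0).
labels = [1, 1, 2, 2, 3, 3, 0]
connectivity = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 99.0]
pet = [0.25, 0.75, 0.5, 0.5, 1.0, 0.5, -5.0]
rho = spatial_spearman(connectivity, pet, labels)
```

      In the toy data, parcels 1 and 2 share the same mean uptake, so the tie handling matters: rho comes out at about 0.87 rather than 1. With 476 parcels, the same function would simply operate on longer lists.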

      • All of the recordings were taken in an eyes-closed condition. This is likely to affect the power of alpha oscillations; the authors should comment on this.

      We agree with the reviewer that this will likely have influenced the results. However, given that the key result of our paper is the abundance and circuit topography of beta oscillations, it is unlikely that increased alpha in some channels will have led to false-positive results for beta. If anything, it may have increased the contrast, leading to a more conservative estimate of which channels truly show strong beta dominance. On the other hand, we acknowledge that this limitation can affect the interpretation of the alpha result, which is another reason for us to focus primarily on beta in the discussion and the presentation of results.

      Changes in manuscript: We now comment on this in the results:

      “It should be noted that recordings were performed with eyes closed, which is known to increase alpha power and may limit the generalizability of the alpha maps to an eyes-open condition. However, given that our primary use of alpha was as a control, we believe that this should not affect the interpretability of the key findings of our study.”

      • Although the relative proportion of theta and gamma channels is lower, it would be interesting to see the distribution of channels in a SOM figure.

      As described above, we have now added supplementary figure 1 that accommodates the topography but not the network analyses.

      • Figure legend - typo - "Neither, alpha nor beta" - no comma needed.

      Now fixed; thank you for pointing out this lapse!

      • Results: "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with current neurophysiology approaches" is not entirely accurate; I suggest rephrasing it to "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches".

      Thank you for suggesting the alternative formulation. 

      Changes in manuscript: The text has been modified as per the suggestion and now reads “Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches”.

      • Results - typo - "cortical brain areas, that exhibit resting beta activity share a common brain network" - no comma needed.

      Thank you for the suggestion; the comma has been removed, as suggested, to improve the flow of the sentence.

    1. Author response:

      eLife Assessment

      This useful study presents the first detailed and comprehensive description of brain sulcus anatomy of a range of carnivoran species based on a robust manual labeling model allowing species comparisons. Although the database is recognized and the method for reconstructing cortical surfaces is convincing, the evidence supporting the conclusions is incomplete due to the lack of appropriate quantitative measurements and analyses. Considering additional specimens to assess intraspecies variations, as well as exploring the functional correlates of interspecies differences would increase the scope of the study. Setting an instructive foundation for comparative anatomy, this study will be of interest to neuroscientists and neuroimaging researchers interested in that field, as well as in brain morphology and sulcal patterns, their phylogeny, and ontogeny in relation to functional development and behaviour. 

      We are pleased that our primary objective of creating a comprehensive framework to navigate carnivoran brains is considered as successfully achieved and that our work is expected to be of broad interest to various disciplines, as it provides the foundation for future investigations into carnivoran brain organization.

      As we will set out below, a description of the major sulci is an appropriate measure for large-scale comparative anatomy: it is stable enough in the population of each species not to require a large N, provides suitable variability across species, and can be related to other aspects of between-species diversity. We will include a number of additional species to increase the scope of the study, as suggested. Although a quantitative assessment of functional correlates is, in principle, beyond the scope of this first foundational paper, we will provide a first step in this direction as well. We emphasize, however, that this was a secondary outcome, emerging after the first application of the framework.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Boch and colleagues, entitled Comparative Neuroimaging of the Carnivore Brain: Neocortical Sulcal Anatomy, compares and describes the cortical sulci of eighteen carnivore species, and sets a benchmark for future work on comparative brains. 

      Based on previous observations, electrophysiological, histological and neuroimaging studies and their own observations, the authors establish a correspondence between the cortical sulci and gyri of these species. The different folding patterns of all brain regions are detailed, put into perspective in relation to their phylogeny as well as their potential involvement in cortical area expansion and behavioral differences. 

      Strengths: 

      This is a pioneering article, very useful for comparative brain studies and conducted with great seriousness and based on many past studies. The article is well-written and very didactic. The different protocols for brain collection, perfusion, and scanning are very detailed. The images are self-explanatory and of high quality. The authors explain their choice of nomenclature and labels for sulci and gyri on all species, with many arguments. The opening on ecology and social behavior in the discussion is of great interest and helps to put into perspective the differences in folding found at the level of the different cortexes. In addition, the authors do not forget to put their results into the context of the laws of allometry. They explain, for example, that although the largest brains were the most folded and had the deepest folds in their dataset, they did not necessarily have unique sulci, unlike some of the smaller, smoother brains. 

      Weaknesses: 

      The article is aware of its limitations, not being able to take into account inter-individual variability within each species, inter-hemispheric asymmetries, or differences between males and females. However, this does not detract from their aim, which is to lay the foundations for a correspondence between the brains of carnivores so that navigation within the brains of these species can be simplified for future studies. This article does not include comparisons of morphometric data such as sulci depth, sulci wall surface, or thickness of the cortical ribbon around the sulci. 

      We thank the reviewer for their overwhelmingly positive evaluation of our work. As noted by the reviewer, our primary aim was to establish a framework for navigating carnivoran brains to lay the foundation for future research. We are pleased that this objective is deemed as successfully achieved.

      As the reviewer points out, we do not quantify within-species interindividual differences. This is a conscious choice; we aimed to emphasize breadth of species over individuals, as is standard in large-scale comparative anatomy (cf. Heuer et al., 2023, eLife; Suarez et al., 2022, eLife). Following the logic of phylogenetic relationships, the presence of a particular sulcus in related species is also a measure of reliability. We felt safe in this choice, as previous work in both primates and carnivorans has shown that differences in major sulci across individuals are a matter of degree rather than a case of presence or absence (Connolly, 1950, External morphology of the primate brain, C.C. Thomas; Hecht et al., 2019 J Neurosci; Kawamuro 1971 Acta Anat.; Kawamuro & Naito, 1977, Acta Anat.). In our revised manuscript, we aim to include some additional individuals of selected species as supplementary material, further illustrating this point.

      We feel that measures such as sulci depth, sulci wall surface, or thickness of the cortical ribbon are measures that vary more across individuals and we have therefore not included them in the study. In addition, these are measures that are not generally used as between-species comparative measures, whereas sulcal patterning is (cf. Amiez et al., 2019, Nat Comms; Connolly, 1950; Miller et al., 2021, Brain Behav Evol; Radinsky 1975, J Mammal; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J. Comp Neurol).

      Reviewer #2 (Public review): 

      Summary: 

      The authors have completed MRI-based descriptions of the sulcal anatomy of 18 carnivoran species that vary greatly in behaviour and ecology. In this descriptive study, different sulcal patterns are identified in relation to phylogeny and, to some extent, behaviour. The authors argue that the reported differences across families reflect behaviour and electrophysiology, but these correlations are not supported by any analyses. 

      Strengths: 

      A major strength of this paper is using very similar imaging methods across all specimens. Often papers like this rely on highly variable methods so that consistency reduces some of the variability that can arise due to methodology. 

      The descriptive anatomy was accurate and precise. I could readily follow exactly where on the cortical surface the authors were referring to. This is not always the case for descriptive anatomy papers, so I appreciated the efforts the authors took to make the results understandable for a broader audience.

      I also greatly appreciate the authors making the images open access through their website. 

      Weaknesses: 

      Although I enjoyed many aspects of this manuscript, it is lacking in any quantitative analyses that would provide more insights into what these variations in sulcal anatomy might mean. The authors do discuss inter-clade differences in relation to behaviour and older electrophysiology papers by Welker, Campos, Johnson, and others, but it would be more biologically relevant to try to calculate surface areas or volumes of cortical fields defined by some of these sulci. For example, something like the endocast surface area measurements used by Sakai and colleagues would allow the authors to test for differences among clades, in relation to brain/body size, or behaviour. Quantitative measurements would also aid significantly in supporting some of the potential correlations hinted at in the Discussion. 

      Although quantitative measurements would be helpful, there are also some significant concerns in relation to the specimens themselves. First, almost all of these are captive individuals. We know that environmental differences can alter neocortical development in humans and nonhuman animals, and that domestication affects neocortical volume and morphology. Whether captive breeding affects neocortical anatomy might not be known, but it can affect other brain regions and overall brain size and could affect sulcal patterns. Second, despite using similar imaging methods across specimens, fixation varied markedly across specimens. Fixation is unlikely to affect the ability to recognize deep sulci, but variations in shrinkage could nevertheless affect overall brain size and morphology, including the ability to recognize shallow sulci. Third, the sample size is 1 for every species examined. In humans and nonhuman animals, sulcal patterns can vary significantly among individuals. In domestic dogs, they can even vary greatly across breeds. It therefore remains unclear to what extent the pattern observed in one individual can be generalized to a species, let alone an entire genus or family. The lack of accounting for inter-individual variability makes it difficult to draw any firm conclusions regarding the functional relevance of sulcal patterns.

      We thank the reviewer for their assessment of our work. The primary aim of this study was to establish a framework for navigating carnivoran brains by providing a comprehensive overview of all major neocortical sulci across eighteen different species. Given the inconsistent nomenclature in the literature and the lack of standardized criteria (“recipes”) for identifying the major sulci, we specifically focused on homogenizing the terminology and creating recipes for their identification. Moreover, we also generated digital surfaces of all brains and will also add sulcal masks to further facilitate future research building on our framework. We are pleased to hear that we succeeded in our primary objective.

      We respectfully disagree with the reviewer on two accounts, where we believe the concerns raised go beyond the scope of the current work.

      The first is with respect to individual differences. To the best of our knowledge, differences between captive and wild animals, or indeed between individuals, do not affect the presence or absence of any major sulci. No differences in sulcal patterns were detected between captive and (semi-)wild macaques (cf. Sallet et al., 2011, Science; Testard et al., 2022, Sci Adv), different dog breeds (Hecht et al., 2019 J Neurosci) or foxes selectively bred to simulate domestication, compared to controls (Hecht et al., 2021 J. Neurosci). Indeed, we do not find major differences between wolf-like canid species, suggesting that a difference between individuals of the same species is even more unlikely. Nevertheless, we agree with the reviewer that building up a database like ours will benefit from providing as much information about the samples as possible to enable these issues to be tested. We, therefore, will update our table to include if the animals were from captive or wild populations. Moreover, we aim, where possible, to include both wild and captive animals of the same species if they are available in our revision.

      The second is the quantification of structure/function relationships. We believe the sulci atlases themselves are the main deliverables of this project. We felt it prudent to include some qualitative descriptions of the relationship between sulci as we observed them and behaviours as known from the literature, as an illustration of the possibilities that this foundational work opens up. This approach also allowed us to confirm previous findings based on observations from a less diverse range of carnivoran species and families (Radinsky 1968 J Comp Neurol; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J Comp Neurol; Welker & Seidenstein, 1959 J Comp Neurol). However, a full statistical framework for analysis is beyond the scope of this paper. Our group has previously worked on methods to quantitatively compare brain organization across species; indeed, we have developed a full framework for doing so (Mars et al., 2021, Annu Rev Neurosci), based on the idea that brains that differ in size and morphology should be compared based on anatomical features in a common feature space. Previously, we have used white matter anatomy (Mars et al., 2018, eLife) and spatial transcriptomics (Beauchamp et al., 2021, eLife). The present work lays the foundation for this approach to be expanded to sulcal anatomy, but the full development of this approach will be the topic of future communications.

      Nevertheless, we aim to include a first-step quantitative analysis of the relationship between the presence and absence of particular sulci and the two behaviours of interest in our manuscript.

      We also would like to emphasize that we strongly believe that looking at measures of brain organization at a more detailed level than brain size or relative brain size is informative. Indeed, studies looking at correlations between brain size and particular behavioural variables, although very prominent in the literature, have found it very difficult to distinguish between competing behavioural hypotheses (Healy, 2021, Adaptation and the brain, OUP). In contrast, connectivity has a much more direct relationship to behavioural differences across species (Bryant et al., 2024, bioRxiv), as does sulcal anatomy (Amiez et al., 2019, Nat Comms; Miller et al., 2021, Brain Behav Evol). Moreover, such measures are less sensitive to the effects of fixation since that will affect brain size but not the presence or absence of a sulcus.

      Following the reviewer’s recommendations, we will endeavour to include an even broader range of species in the revised version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality, and the evidence is strengthened through a combination of different genetic mouse models, RNA sequencing, and immunofluorescence analysis.

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally, accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics, and cell position. Whether any of these changes is instructive for differentiation itself, and whether consecutive changes in differentiation are required, remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental time points, the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14), when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells), and E16, when the epidermis comprises three (living) compartments in which the spinous layer separates basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using bulk RNA sequencing, the authors identified useful new markers for suprabasal stages of differentiation, such as MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model.

      The data indicate that early in development, at E14, the suprabasal intermediate cells resemble, in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation. 

      Previous studies by several groups found increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT), thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10-Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of their suprabasal position in either the spinous or granular layer, indicating that increased contractility is key to inducing late differentiation of granular cells. A potential weakness of the K10-Spastin model is the disruption of MT as the primary effect, which secondarily causes hypercontractility. However, their previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al., Cell Stem Cell 2021). Moreover, the data are confirmed by the second model, which directly activates myosin through RhoA. These previous publications already indicated a role for contractility in differentiation but were focused on early differentiation. The data in this manuscript focus on the regulation of late differentiation in barrier-forming cells. These important data help to unravel the interdependencies of cell position, mechanical state, and differentiation in the epidermis, suggesting that an increase in cellular contractility in the most apical positions within the epidermis can induce terminal differentiation. 
      Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer. 

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing results should be presented more transparently, providing the full lists of regulated genes instead of just the GO analysis and selected target genes, so that this analysis can serve as a useful repository. The authors themselves have profited from and used published datasets of gene expression in granular cells. Moreover, some of the previous data should be discussed in more depth. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes, such as filaggrin and loricrin, are actually not regulated (Muroyama and Lechler eLife 2017). This should be discussed further and necessitates including the full list of regulated genes to show what exactly is regulated. 

      We thank all the reviewers for their suggestions and comments.

      Thank you especially for the reminder to include gene lists. We had an Excel document with all these data but neglected to upload it with the initial manuscript decision. This includes all the gene signatures for the different cell compartments across development. We will also include a page that lists all EDC genes and whether they were up-regulated in intermediate cells and in cells in which contractility was induced. Further, we note that all the RNA-Seq datasets are available for use on GEO. 

      In our previous publication, we indeed included images showing a lack of change in loricrin and filaggrin in embryos where spastin was expressed in the differentiated epidermis. Consistent with this, there is no change in Lor mRNA levels by RNA-Seq (it is one of the rare EDC genes that is unchanged). In contrast, Flg mRNA was up in the RNA-Seq, though we didn’t see a dramatic change in protein levels. We have not further pursued whether this reflects translational regulation. That said, our data clearly show that other genes associated with granular fate were increased in the contractile skin.  

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk RNA sequencing, they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB, is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and that overexpression of cytoskeletal regulators accelerates the differentiation of spinous layer cells into granular cells. 

      Overall, the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories, and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells, and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics. 

      Strengths: 

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing, and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow. 

      Weaknesses: 

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal, as spastin has multiple roles in regulating microtubule dynamics and membrane transport, which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent, but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II, as shown for the intermediate layer. It would also be logical to address whether further increasing contractility in the intermediate layer would enhance the differentiation of these cells. 

      We agree with the reviewer that the development of additional tools to precisely control myosin activity will be of great use to the field. That said, our series of publications has clearly demonstrated that ablating microtubules results in increased contractility and that this phenocopies the effects of Arhgef11-induced contractility (Ning et al., Cell Stem Cell 2021). Further, we showed that these phenotypes were rescued by myosin inhibition with blebbistatin. Our prior publications also showed a clear increase in junctional acto-myosin through expression of either spastin or Arhgef11, as well as increased staining for the tension-sensitive epitope of alpha-catenin (alpha-18) (also in Ning et al., 2021). We are not aware of any existing tools that allow direct manipulation of myosin activity in mouse models.  

      The gene expression analyses are relatively superficial and rely heavily on GO term analyses, which are of course informative but do not give the reader a good sense of what kinds of genes and transcriptional programs are regulated. It would be useful to show volcano plots or heatmaps of actual gene expression changes, as well as to perform additional analyses, for example gene set enrichment and/or transcription factor enrichment analyses, to better describe the transcriptional programs. 

      We will include an excel document that lists all the gene signatures. Additionally, all of our data are deposited in GEO for others to perform their own analyses.  

      Claims of changes in cell division/proliferation are made exclusively by quantifying EdU incorporation. It would be useful to look at mitosis more directly. At a minimum, Y-axis labels should be changed from "% Dividing cells" to "% EdU+ cells" to more accurately represent the findings. 

      We will change the axis label to precisely match our analysis.  

      Despite these minor weaknesses the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis, and will likely be of interest to the skin research community. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and, unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA-seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function, that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows that ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and that overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor Yap. The paper also identifies new markers that distinguish IC and spinous layers and shows that the spinous signature gene MafB is sufficient to repress proliferation when prematurely expressed in ICs. 

      Strengths: 

      Overall, this is a well-executed study; the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much-understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding and adds to a growing list of examples where cell mechanics influence gene expression and differentiation. 

      Weaknesses: 

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility to differentiation. The inclusion of loss-of-function approaches would enable one to determine, for example, whether contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. The inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation. 

      We agree that loss-of-function studies would be useful. For MafB, these have been performed in cultured human keratinocytes, where loss of MafB and its paralog cMaf results in a phenotype consistent with loss of spinous differentiation (Lopez-Pajares et al., Dev Cell 2015). Due to the complex genetics involved, generating these double mutant mice is beyond the scope of this study. Loss-of-function studies of myosin are also complicated by genetic redundancy of the non-muscle type II myosin genes, as well as by the role of these myosins in actin cross-linking in addition to contractility. In addition, we have found that these myosins are quite stable in the embryonic intestine, with loss of protein delayed by several days from the induction of recombination. Therefore, elimination of myosins by embryonic day e14.5 with our current drivers is not likely possible. Thus, generation of inducible inhibitors of contractility is a valuable future goal. 

      A number of recent papers have used AFM of skin sections to probe tissue rigidity. We have not attempted these studies and are unsure about the spatial resolution, and whether, in the very thin epidermis at these stages, we could spatially resolve differences. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated a significant tissue-wide increase (Ning et al., Cell Stem Cell 2021).  

      Finally, whether the expression of granular-associated genes in ICs provides them with some sort of barrier function in the embryo is not addressed, so the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.  

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells begin to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells. We have attempted experiments to ablate intermediate cells with DTA expression; this resulted in inefficient and delayed cell death and thus did not yield strong conclusions. Our finding that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper aims to address the establishment and maintenance of neural circuitry in the case of a massive loss of neurons. The authors used genetic manipulations to ablate the principal projection neurons, the mitral/tufted cells, in the mouse olfactory bulb. Using diphtheria toxin (Tbx21-Cre:: loxP-DTA line) the authors ablated progressively large numbers of M/T cells postnatally. By injecting diphtheria toxin (DT) into the Tbx21-Cre:: loxP-iDTR line, the authors were able to control the timing of the ablation in the adult stage. Both methods led to the successful elimination of a majority of M/TCs by 4 months of age. The authors made a few interesting observations. First, they found that the initial pruning of the remaining M/T cell primary dendrite was unaffected. However, in adulthood, a significant portion of these cells extended primary dendrites to innervate multiple glomeruli. Moreover, the incoming olfactory sensory neuron (OSN) axons, as examined for those expressing the M72 receptor, showed a divergent innervation pattern as well. The authors conclude that M/T cell density is required to maintain the dendritic structures and the olfactory map. To address the functional consequences of eliminating a large portion of principal neurons, the authors conducted a series of behavioral assays. They found that learned odor discrimination was largely intact. On the other hand, mating and aggression were reduced. The authors concluded that learned behaviors are more resilient than innate ones.

      The study is technically sound, and the results are clear-cut. The most striking result is the contrast between the normal dendritic pruning during early development and the expanded dendritic innervation in adulthood. It is a novel discovery that can lead to further investigation of how the single-glomerulus dendritic innervation is maintained. The authors conducted a few experiments to address potential mechanisms, but they are inconclusive, as detailed below. It is also interesting to see that the massive neuronal loss did not severely impact learned odor discrimination. This result, together with previous studies showing nearly normal odor discrimination in the absence of large portions of the olfactory bulb or with scrambled innervation patterns, attests to the redundancy and robustness of the sensory system. The discussion should place the results in the historical context of these other studies.

      Main comments:

      (1) In previous studies, it has been concluded that dendritic pruning unfolds independently, regardless of the innervation pattern or activity of the OSNs. The new observation bolsters this conclusion by showing that a loss of neighboring M/T cells does not affect the developmental process. A more nuanced discussion comparing the results of these studies would strengthen the paper.

      We thank the reviewer for the suggestion. We now include an extended discussion citing relevant previous works in the manuscript (Lines 351-374).

      (2) The authors propose that a certain density of M/T is required to prevent the divergent innervation of primary dendrites, but the evidence is not sufficient to support this proposal. The experiment with low-dose DT injection to ablate a smaller portion of M/T cells did not change the percentage of cells innervating two or more glomeruli. The authors suggest that a threshold must be met, but this threshold is not determined.  

      In our experiments using high-dose DT, we hypothesized that there may be many empty glomeruli (glomeruli not innervated by M/T cells) and, as a result, that some of the remaining M/T cells could branch their apical dendrite tuft into multiple empty glomeruli. To test this hypothesis, we carried out another experiment using a lower dose of DT. In this experiment, the fraction of remaining M/T cells was 25% (~10,000 M/T cells), which was higher than with the high DT dose (5%, or around 2,000 M/T cells), but still significantly lower than in wild-type mice (~40,000 M/T cells). With around 2,000 glomeruli and 10,000 M/T cells per bulb, each glomerulus would be expected to be innervated by ~5 M/T cells on average. However, we found that the percentage of M/T cells projecting to multiple glomeruli (around 40%) was similar whether 10,000 or 2,000 M/T cells remained in the bulb. In addition, it is important to emphasize that even in wild-type animals with a full set of M/T cells, a small percentage of M/T cells still innervate more than one glomerulus (Lin et al., 2000). Together, these observations suggest that the innervation of multiple glomeruli by M/T cells is not simply due to the presence of empty glomeruli, and that our hypothesis was not correct.

      We have added a comment explaining this issue in the Results section (Lines 200-203).

      (3) The authors suggest that neural activity is not required for this plasticity. The evidence was derived primarily from naris occlusion and neuronal silencing using Kir2.1. While the results are consistent with the notion, it is a rather narrow interpretation of how neural activity affects circuit configuration. Perturbation of neural activity also entails an increase in firing. Inducing the activity of the neurons may alter this plasticity. Silencing per se may induce a homeostatic response that expands the neurite innervation pattern to increase synaptic input to compensate for the loss of activity. Thus, further silencing the cells may not reduce multiglomerular innervation, but an increased activity may.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other types of mechanisms (including increased synaptic excitation, as suggested by the reviewer) that do not require the firing of action potentials in M/T cells. 

      We now have included a comment regarding this point (Lines 243-247).  

      (4) There is a discrepancy between this study and the one by Fujimoto et al. (Developmental Cell; 2023), which shows that not only glutamatergic inputs to the primary dendrite can facilitate pruning of remaining dendrites but also Kir2.1 overexpression can significantly perturb dendritic pruning. This discrepancy is not discussed by the authors.

      We agree that it would be useful to contrast these two works.

      In our experiments, performed in adult animals, we blocked sensory input by performing naris occlusion before we induced ablation of M/T cells. In a separate experiment, also in adult animals, we expressed the Kir2.1 channel to reduce the ability of neurons to fire action potentials. With both types of manipulation, we observed that the ablation of a large fraction of M/T cells still caused the remaining M/T cells to maintain a single apical dendrite that sprouts several new tufts towards multiple glomeruli. A recent paper (Fujimoto et al., 2023), in which Kir2.1 was expressed in a large percentage of M/T cells starting during embryonic development, showed that these “silent” M/T cells failed to prune their arbors to a single dendrite. In aggregate, these observations indicate that action potentials are necessary for the normal pruning that occurs during perinatal development (Fujimoto et al., 2023), but are not required for the expansion of dendritic trees caused by ablating a large fraction of M/T cells in adult animals (our current manuscript).

      We have now explained the differences between both studies in the manuscript (Lines 427-439).

      (5) An alternative interpretation of the discrepancy between the apparent normal pruning by p10 and expanded dendritic innervation in adulthood is that there are more cells before P10, when ~25% of M/T cells are present, but at a later date only 1-3% are present. 

      The relationship between the number of M/T cells and single glomerulus innervation has not been explored during postnatal development. It would be important to test this hypothesis.

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells and induces DTA expression starting around P0. The elimination of M/T cells starts at this time and continues through P10, by which point more than 75% of M/T cells have been eliminated. By P21, more than 90% of M/T cells have been eliminated, and their number remains stable thereafter.

      Pruning of the dendrites of M/T cells starts at P0 and it is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still over a threshold that does not interfere with the process of normal dendrite pruning. We agree that it would be very informative to perform additional experiments in the future where a large set of M/T cells could be ablated before pruning occurs (ideally before P0). 

      (6) The authors attribute the change in the olfactory map to the loss of M/T cells. Another obvious possibility is that the diffused projection is a response to the change in the olfactory bulb size. With less space to occupy, the axons may be forced to innervate neighboring glomeruli. It is not known how the total number of glomeruli is affected. This question could be addressed by tracking developmental changes in bulb volume and glomerular numbers.

      Certainly, this is a possibility, and we have now included a comment on this regard in the manuscript (Lines 473-480). 

      We believe that there are three likely scenarios that could account for these observations:

      (a) After ablating M/T cells, the tufts of the remaining M/T cells sprout into multiple glomeruli, and this causes the axons of OSNs to project into multiple glomeruli.

      (b) Ablating M/T cells may cause changes in other OB cells that make synapses in the glomeruli (ETCs, PGCs, sAC, etc…), and the misrouting of OSN axons that we observed in our experiments may be a secondary effect caused by the elimination of M/T cells.

      (c) After ablating the majority of M/T cells, the olfactory bulb gets reduced in size, and the axons of OSNs find it difficult to precisely converge on a target that now has become smaller. As a result, the axons of OSNs fail to converge on single glomeruli.

      (7) The retained ability to discriminate odors upon reinforced training is not surprising in light of a number of earlier studies. For example, Slotnick and colleagues have shown that rats losing ~90% of the OB can retain odor discrimination. Weiss et al have shown that humans without an olfactory bulb can perform normal olfactory tasks. Gronowitz et al have used theoretical prediction and experimental results to demonstrate that perturbing the olfactory map does not have a major impact on olfactory discrimination. Fleischmann et al have shown that mice with a monoclonal nose can discriminate odors. The authors should discuss their results in these contexts.

      We apologize for this important oversight. As suggested, we now include in the manuscript a more elaborate discussion, with the relevant references (Lines 483-496).

      (8) It should be noted that odor discrimination resulting from reinforcement training does not mean normal olfactory function. It is a highly artificial situation as the animals are overtrained. It should not be used as a measure of the robustness of the olfactory sense. Natural odor discrimination (without training), detection threshold, and innate appetitive/aversive response to certain odors may be affected. These experiments were not conducted.

      We agree that the standard tests commonly used to measure olfactory function require substantial training, and thus, are quite artificial. However, these tests are used because they allow a more precise quantification of olfactory function than those relying on natural behaviors.  

      We have now included a few sentences to address this point in the Results (Lines 321-322) and Discussion sections (Lines 541-543).

      (9) The social behaviors were conducted using relatively coarse measures (vaginal plug and display of aggression). Moreover, these behaviors are most likely affected by the disruption of the AOB mitral cells and have little to do with the dendritic pruning process described in the paper. It is misleading to lump social behaviors with innate responses to odors.

      This point follows the same logic as the previous one. The olfactory tests that rely on natural behaviors are quite coarse and difficult to quantify. In contrast, olfactory tests using apparatuses such as olfactometers can be quantified with precision, but they are artificial. We agree that some of the naturalistic behaviors that we studied, such as mating or aggression, may depend to a large extent on the AOB (although the MOB may also be involved in these tasks to a degree). In our initial version of the manuscript, we commented on the anticipated relative involvement of the MOB and AOB in the studied tasks, and we have now added some additional sentences to make this point clearer. In addition, we now add a comment indicating that the abnormal behaviors could simply be due to a reduction in the number of AOB M/T cells (~98.5% and ~85% elimination of M/T cells in the AOB in Tbx::DTA and Tbx::iDTR mice, respectively), regardless of the abnormal dendritic pruning of main OB M/T cells (Lines 530-534).

      See Figure 5E - M/T cells in AOB (Lines 1238-1239). 

      Reviewer #2 (Public Review):

      The authors make the interesting observation that the developmental refinement of apical M/T cell dendrites into individual glomeruli proceeds normally even when the majority of neighboring M/T cells are ablated. At later stages, the remaining neurons develop additional dendrites that invade multiple glomeruli ectopically, and similarly, OSN inputs to glomeruli lose projection specificity as well. The authors conclude that the normal density of M/T neurons is not required for developmental refinement, but rather for maintaining specific connectivity in adults.

      The observations are indeed quite striking; however, the authors' conclusions are not entirely supported by the data.

      (1) It is unclear whether the expression of diphtheria toxin that eventually leads to the ablation of the large majority of M/T neurons compromises the cell biology of the remaining ones.

      DT is an extremely potent toxin that kills cells by inhibiting protein translation, and it has been demonstrated that the presence of a single DT molecule in a cell is sufficient to kill it because of its highly efficient catalytic activity. Accordingly, previous experiments have shown that DT kills cells within a few hours after its appearance in the cytoplasm (Yamaizumi et al., 1978). In other words, all the published evidence suggests that if a cell is exposed to the action of DT, that cell will die shortly. There is no evidence that cells exposed to DT can survive and experience long-term effects. Finally, previous works have not observed any long-term changes in neurons directly caused by the actions of DT (Johnson et al., 2017).

      (2) The authors interpret the growth of ectopic dendrites later in life as a lack of maintenance of dendrite structure; however, maybe the observed changes reflect actually adaptations that optimize wiring for extremely low numbers of M/T neurons. The finding that olfactory behavior was less affected than predicted supports this interpretation.

      We do not know the cellular or molecular mechanisms that explain why reducing the density of M/T cells is followed by the growth of ectopic dendrites from the remaining M/T cells. We agree that the functional outcome of growing ectopic dendrites may result in an optimization of wiring in the bulb and could explain why olfactory function is relatively preserved. We now include a comment regarding this possibility (Lines 513-525).   

      (3) The number of remaining M/T neurons is much higher at P10 than later. Can the relatively large number of remaining neurons (or their better health status) be the reason that dendrites refine normally at the early developmental stages rather than a (currently unknown) developmental capacity that preserves refinement?

      We thank the reviewer for the suggestion, which was also raised by reviewer 1. 

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells and induces DTA expression starting around P0. The elimination of M/T cells starts at this time and progresses rapidly: by P10 more than 75% of M/T cells have been eliminated, and by P21 more than 90%, after which their number remains stable.

      Pruning of the dendrites of M/T cells starts at P0 and is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still above the level at which normal dendrite pruning would be disrupted. We agree that it would be very informative to perform future experiments in which a large fraction of M/T cells is ablated before pruning occurs (ideally before P0). 

      (4) While the effect of reduced M/T neuron density on both M/T dendrites and OSN axons is described well, the relationship between both needs to be characterized better: Is one effect preceding the other or do they occur simultaneously? Can one be the consequence of the other?

      Previous work has demonstrated that disrupting the topographic projection of OSN axons has no effect on the structure of the apical dendrite of M/T cells (Ma et al., 2014; Nishizumi et al., 2019). Our experiments ablating a large fraction of M/T cells suggest that these cells are necessary for the correct targeting of OSN axons into the bulb. However, our experiments do not allow us to distinguish between these two scenarios: 

      (a) the ablation of a large fraction of M/T cells directly causes the sprouting of the apical dendrite of M/T cells, and this sprouting in turn causes the abnormal projection of OSN axons onto the bulb. 

      (b) the ablation of a large fraction of M/T cells first causes the axons of OSN to project abnormally onto multiple glomeruli in the bulb, and this in turn causes the dendrite of remaining M/T cells to sprout onto multiple glomeruli. 

      We now include a comment in the manuscript explaining this point (Lines 473-492).

      (5) Page 7: the observation that not all neurons develop additional dendrites is not a sign of differences between cell types, it may be purely stochastic.

      This is correct, and we mention these two scenarios in the Discussion (Lines 407-408). 

      (6) Page 8: the fact that activity blockade did not affect the formation of ectopic dendrites does not suggest that the process is not activity-dependent: both manipulations have the same effect and may just mask each other.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other mechanisms (including increased synaptic excitation, as suggested by the reviewer) that do not require the firing of action potentials in M/T cells. 

      We now have included a comment regarding this point (Lines 243-247).  

      (7) It remains unclear how the observed structural changes can explain the behavioral effects.

      We agree that the relationship between structural changes and behavior was not appropriately explained in our manuscript. Our manipulations cause one primary change in the olfactory system and several secondary changes. The primary change is a large reduction in the number of M/T cells in both the MOB and AOB. This reduction in M/T cell number triggers significant secondary changes in the connectivity of the bulb, including an abnormal projection of OSN axons onto the OB and the growth of ectopic dendrites from the remaining M/T cells into multiple glomeruli.

      The behavioral abnormalities displayed by these mice are ultimately caused by the reduction in the number of M/T cells, but the secondary structural changes likely contribute to some of the behavioral phenomena that we observed. For example, it is possible in principle that ectopic dendrites innervating several glomeruli help the bulb perceive smells with a much reduced number of M/T cells. On the other hand, this promiscuous growth of dendrites into multiple glomeruli could make it more difficult for the animals to discriminate between smells. The same argument applies to the fact that OSN axons project onto multiple glomeruli: we simply do not know whether this change helps the animal detect smells or makes it more difficult.  

      We now include a comment regarding this issue (Lines 513-525).   

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments and a more thorough discussion of the results, as suggested in the public review, would significantly strengthen the paper. Below are some specific parts that need to be addressed.

      There is a lack of information on how M/T cell numbers are quantified. Without the information, it is difficult to evaluate the claim. Using the tdTomato signal may miss cells that are not labeled due to the transgenic effect. 

      Although we cannot conclude that we are identifying the complete set of M/T cells (because the transgenic lines may fail to label some M/T cells), the number of M/T cells that we observed is similar to that previously reported (Richard et al., 2010). This concern has been included in the Results section (Lines 121-124).

      A more detailed description of M/T cell quantification has been added to the Methods section (Lines 627-632).

      There is a lack of information on the timeline of treatment and how measurement of the olfactory bulb volume is conducted.

      We now include a more detailed description of how the volume of the OB was measured in the methods (Lines 621-623).

      The volume measurement is inconsistent with the pictures shown. In Figure 1, supplemental data 2 panels B and C, it appears that the bulbs in DTA and DTR mice are about half in length in each dimension. This would translate into ~1/8 of the volume of the control mice.

      We measured the volume of the bulbs based on the Neurolucida reconstructions, and we observed that in both DTA and iDTR mice the volumes of their bulbs are roughly 50% compared to a wild type mouse. In Figure 1 - figure supplement 2 the sections that were shown for wild type, DTA and iDTR mice were not taken at the same position in the bulb, and this gave the impression that the bulbs from DTA and iDTR were much smaller than they really are. We now show sections for these three animals at equivalent positions in the bulb. 
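      The reviewer's estimate and the measured volumes can be compared with simple cubic-scaling arithmetic. As a rough sketch, assuming the bulb shrinks uniformly in all three dimensions (an idealization; the Neurolucida reconstructions need not scale isotropically), halving every linear dimension gives one eighth of the volume, whereas a volume of about 50% corresponds to a much smaller linear change:

```latex
V \propto L^{3}
\quad\Longrightarrow\quad
\left(\tfrac{1}{2}\right)^{3} = \tfrac{1}{8},
\qquad
\frac{V_{\mathrm{DTA/iDTR}}}{V_{\mathrm{WT}}} \approx 0.5
\quad\Longrightarrow\quad
\frac{L_{\mathrm{DTA/iDTR}}}{L_{\mathrm{WT}}} \approx 0.5^{1/3} \approx 0.79
```

      Under this assumption, a bulb at about half the wild-type volume would be only about 20% shorter along each axis, so sections that appear half as long in every dimension would be hard to reconcile with the measured volumes unless they come from non-equivalent positions in the bulb.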

      Figure 1 E and F have no legend.

      We apologize for this mistake - we have now added the legend for Figures 1E and F (Lines 1009-1013).

      Figure 3, supplemental data 2, it is not clear what the readers should be looking at. The data is confusing even for experts in the field. The authors should describe the figures more clearly, pointing out what they are supposed to show.

      We apologize for this, and we have now added a more detailed description of Figure 3 – figure supplement 2 (Lines 1153-1167).

      In several figures, it is not clearly stated which comparisons the indications of statistical significance above the bars refer to.

      We have now included a more detailed description of the statistical comparisons in the figure legends.

      AAV serotype should be specified.

      The AAV serotype used to label M/T cells was the AAV-PHP.eB. We have added this information in the methods section of the manuscript. 

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      Page 5, para 2: "The decrease in neuronal plasticity with age": it is unclear what "the decrease" refers to.

      We have changed this sentence in the text to make it clear:

      “The decrease in structural plasticity of M/T cells after apical dendrite refinement (Mizrahi and Katz, 2003),….”

      Lines 146-148

      Is there a quantification of the effect of Kir2.1 overexpression alone (example shown in Figure 3D)?

      We performed an experiment in iDTR animals in which a fraction of M/T cells expressed Kir2.1, and we split these animals into two groups: (a) animals that received an injection of DT, and (b) animals that did not receive any DT. We quantified the effect of Kir2.1 on M/T cells from animals that received a DT injection (with ablation of around 90% of M/T cells) and did not observe any statistically significant differences between cells expressing Kir2.1 and neurons not expressing Kir2.1 from other iDTR animals that also received DT injections. We did not quantify possible effects of Kir2.1 in the group of animals that did not receive DT because, on first inspection, we did not observe any clear differences between Kir2.1-expressing cells and neighboring wild-type cells. 

      References

      Fujimoto S, Leiwe MN, Aihara S, Sakaguchi R, Muroyama Y, Kobayakawa R, Kobayakawa K, Saito T, Imai T. 2023. Activity-dependent local protection and lateral inhibition control synaptic competition in developing mitral cells in mice. Dev Cell S1534-5807(23)00237-X. doi:10.1016/j.devcel.2023.05.004

      Johnson RE, Tien N-W, Shen N, Pearson JT, Soto F, Kerschensteiner D. 2017. Homeostatic plasticity shapes the visual system’s first synapse. Nat Commun 8:1220. doi:10.1038/s41467-017-01332-7

      Lin DM, Wang F, Lowe G, Gold GH, Axel R, Ngai J, Brunet L. 2000. Formation of precise connections in the olfactory bulb occurs in the absence of odorant-evoked neuronal activity. Neuron 26:69–80. doi:10.1016/s0896-6273(00)81139-3

      Ma L, Wu Y, Qiu Q, Scheerer H, Moran A, Yu CR. 2014. A developmental switch of axon targeting in the continuously regenerating mouse olfactory system. Science 344:194–197. doi:10.1126/science.1248805

      Nishizumi H, Miyashita A, Inoue N, Inokuchi K, Aoki M, Sakano H. 2019. Primary dendrites of mitral cells synapse unto neighboring glomeruli independent of their odorant receptor identity. Commun Biol 2:1–12. doi:10.1038/s42003-018-0252-y

      Richard MB, Taylor SR, Greer CA. 2010. Age-induced disruption of selective olfactory bulb synaptic circuits. Proc Natl Acad Sci U S A 107:15613–15618. doi:10.1073/pnas.1007931107

      Yamaizumi M, Mekada E, Uchida T, Okada Y. 1978. One molecule of diphtheria toxin fragment A introduced into a cell can kill the cell. Cell 15:245–250. doi:10.1016/0092-8674(78)90099-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, "Nicotine enhances the stemness and tumorigenicity in intestinal stem cells via Hippo-YAP/TAZ and Notch signal pathway", authors Isotani et al claimed that this study identifies a NIC-triggered pathway regulating the stemness and tumorigenicity of ISCs and suggest the use of DBZ as a potential therapeutic strategy for treating intestinal tumors. However, the presented data do not support the primary claims.

      Weaknesses:

      My main reservation is that the quality of the results presented in the manuscript may not fully substantiate their conclusions. For instance, in Figure 2 A and B, it is challenging to discern a healthy organoid. This is significant, as the entirety of Figure 2 and several panels in Figures 3 - 5 are based on these organoid assays. Additionally, there seems to be a discrepancy in the quality of results from the western blot, as the lanes of actin do not align with other proteins (Figure 6B).

      We count organoids directly under the microscope, as described previously (Igarashi M et al., Cell 2016; Igarashi M et al., Aging Cell 2019). During counting, we can reliably distinguish live organoids from dead ones. In the revised version, we will therefore describe the method in more detail and mark live and dead organoids with arrows (Figure 2A and B).

      Moreover, as Reviewer 1 pointed out, the number of organoids originating from intestinal or colonic crypts can be affected by dead organoids, as in Figure 2A and 2B. However, almost all colonies from isolated intestinal stem cells (ISCs) (Figure 2C and D) are alive, so colony counts in those experiments are much less affected by dead colonies. Since all organoid data in Figures 3-5 were obtained with the same method as Figure 2C and D, the data in Figures 3-5 are likewise not affected by dead colonies.

      Finally, to improve the data quality of Figure 6B, we repeated this experiment and replaced the panel with new data.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Isotani et al characterizes the hyperproliferation of intestinal stem cells (ISCs) induced by nicotine treatment in vivo. Employing a range of small-molecule inhibitors, the authors systematically investigated potential receptors and downstream pathways associated with nicotine-induced phenotypes through in vitro organoid experiments. Notably, the study specifically highlights a signaling cascade involving α7-nAChR/PKC/YAP/TAZ/Notch as a key driver of nicotine-induced stem cell hyperproliferation. Utilizing an Lgr5CreER Apcfl/fl mouse model, the authors extend their findings to propose a potential role of nicotine in stem cell tumorigenesis. The study posits that Notch signaling is essential during this process.

      Strengths and Weaknesses:

      One noteworthy research highlight in this study is the indication, as shown in Figure 2 and S2, that the trophic effect of nicotine on ISC expansion is independent of Paneth cells. In the Discussion section, the authors propose that this independence may be attributed to distinct expression patterns of nAChRs in different cell types. To further substantiate these findings, it is suggested that the authors perform tissue staining of various nAChRs in the small intestine and colon. This additional analysis would provide more conclusive evidence regarding how stem cells uniquely respond to nicotine. It is also recommended to present the staining of α7-nAChR from different intestinal regions. This will provide insights into the primary target sites of nicotine in the gut tract. Additionally, it is recommended that the authors consider rephrasing the conclusion in this section (lines 123-124). The current statement implies that nicotine does not affect Paneth cells, which may be inaccurate based on the suggestion in line 275 that nicotine might influence Paneth cells through α2β4-nAChR. Providing a more nuanced conclusion would better reflect the complexity of nicotine's potential impact on Paneth cells.

      It was difficult to obtain nAChR antibodies suitable for immunostaining. Instead, we performed qPCR of nAChRs in ISCs and Paneth cells isolated from the whole small intestine (new Figure 3C), although this method cannot reveal differences in nAChR expression between intestinal regions. Although α7-nAChR and α8-nAChR showed comparatively high expression in both ISCs and Paneth cells, no significant difference between ISCs and Paneth cells was observed (Figure 3C). 

      Interestingly, nicotine up-regulated only the expression of α7-nAChR in ISCs, suggesting a specific response of α7-nAChR to nicotine (Figures 3C and D). We rephrased the conclusion of this paragraph according to the reviewer's suggestion.

      As shown in the same result section, the effect of nicotine on ISC organoid formation appears to be independent of CHIR99021, a Wnt activator. Despite this, the authors suggest a potential involvement of Wnt/β-catenin activation downstream of nicotine in Figure 4F. In the Lgr5CreER Apcfl/fl mouse model, it is known that APC loss results in a constitutive stabilization of β-catenin, thus the hyperproliferation of ISCs by nicotine treatment in this mouse model is likely beyond Wnt activation. Therefore, it is recommended that the authors reconsider the inclusion of Wnt/β-catenin as a crucial signaling pathway downstream of nicotine, given the experimental evidence provided in this study.

      We appreciate this important suggestion. Wnt/β-catenin was indeed activated in nicotine-treated ISCs. However, as the reviewer points out, the hyperproliferation of ISCs by nicotine treatment likely goes beyond Wnt activation. Following the reviewer's suggestion, we removed Wnt/β-catenin as a crucial signaling pathway downstream of nicotine (Figure 5G).

      In Figure 4, the authors investigate ISC organoid formation with a panPKC inhibitor, revealing that PKC inhibition blocks nicotine-induced ISC expansion. It's noteworthy that PKC inhibitors have historically been used successfully to isolate and maintain stem cells by promoting self-renewal. Therefore, it is surprising to observe no effect or reversal effect on ISCs in this context. A previous study demonstrated that the loss of PKCζ leads to increased ISC activity both in vivo and in vitro (DOI: 10.1016/j.celrep.2015.01.007). Additionally, to strengthen this aspect of the study, it would be beneficial for the authors to present more evidence, possibly using different PKC inhibitors, to reproduce the observed results with Gö 6983. This could help address potential concerns or discrepancies and contribute to a more comprehensive understanding of the role of PKC in nicotine-induced ISC expansion.

      Gö 6983 is a pan-PKC inhibitor of PKCα, PKCβ, PKCγ, PKCδ and PKCζ, with IC50 values of 7 nM, 7 nM, 6 nM, 10 nM and 60 nM, respectively. Because we used Gö 6983 at 10 nM, well below its IC50 for PKCζ, we consider PKCζ unlikely to be the relevant target in the nicotine response. Additionally, we tested 5 nM sotrastaurin, another pan-PKC inhibitor that is not expected to affect PKCζ. Sotrastaurin reproduced the result observed with Gö 6983 (Supplemental Figure 3E).
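      The concentration argument can be made concrete with a simple single-site occupancy model. This is a sketch under stated assumptions, not an analysis from the study: it assumes a Hill coefficient of 1 and that the quoted IC50 values (from kinase assays) carry over to organoid culture. The helper `fractional_inhibition` and the dictionary name are hypothetical.

```python
# IC50 values (nM) for Go 6983, as quoted in the response above.
GO6983_IC50_NM = {
    "PKCalpha": 7.0,
    "PKCbeta": 7.0,
    "PKCgamma": 6.0,
    "PKCdelta": 10.0,
    "PKCzeta": 60.0,
}


def fractional_inhibition(conc_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Expected fraction of kinase activity inhibited at a given inhibitor concentration.

    Hypothetical single-site model: I^h / (I^h + IC50^h). With h = 1 and
    I = IC50, exactly half of the activity is inhibited.
    """
    if conc_nm < 0 or ic50_nm <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    c = conc_nm ** hill
    return c / (c + ic50_nm ** hill)


if __name__ == "__main__":
    dose_nm = 10.0  # Go 6983 concentration used in the organoid experiments
    for isoform, ic50 in GO6983_IC50_NM.items():
        print(f"{isoform}: {fractional_inhibition(dose_nm, ic50):.0%} inhibited")
```

      Under these assumptions, 10 nM Gö 6983 would inhibit roughly half or more of PKCα, β, γ and δ activity but only about one seventh (10/70) of PKCζ activity, which is the reasoning behind discounting PKCζ. A steeper Hill coefficient would widen this gap further.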

      An additional avenue that could enhance the clinical relevance of the study is the exploration of human datasets. Specifically, leveraging scRNA-seq datasets of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1) could provide valuable insights. Analyzing the expression patterns of nAChRs across diverse regions and cell types in the human intestine may offer a potential clinical implication.

      We analyzed the distribution of nAChRs using the scRNA-seq dataset of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1). Consistent with the mouse data (Figure 3C), the expression of human α7-nAChR is higher than that of other nAChRs. As in mouse, there was no clear difference in expression between ISCs and Paneth cells (Supplemental Figure 4A and B). From the mouse and human data, we speculate that the induction of a specific nAChR by nicotine, rather than the baseline distribution of nAChRs, is the essence of the ISC response to nicotine.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could benefit from addressing a few minor points to enhance its quality before publication:

      (1) Ensure all images are presented in higher resolution to improve visual clarity.

      We replaced all images with higher-resolution versions.

      (2) Quantify Western blot results accurately for rigor and precision in data representation.

      We quantified all blots.

      (3) Include error bars in control groups where missing, particularly in Figures 3C and 4D, to enhance data interpretation.

      We included error bars in control groups in new Figure 3C and 4D.

      (4) The layout of Figure S3B, S4A and S4B should be corrected.

      We corrected the layout of those Figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Petty and Bruno investigate how response characteristics in the higher-order thalamic nuclei POm (typically somatosensory) and LP (typically visual) change when a stimulus (whisker air puff or visual drifting grating) of one or the other modality is conditioned to a reward. Using a two-step training procedure, they developed an elegant paradigm, where the distractor stimulus is completely uninformative about the reward, which is reflected in the licking behavior of trained mice. While the animals seem to take to the tactile stimulus more readily, they can also associate the reward with the visual stimulus, ignoring tactile stimuli. In trained mice, the authors recorded single-unit responses in both POm and LP while presenting the same stimuli. The authors first focused on POm recordings, finding that in animals with tactile conditioning POm units specifically responded to the air puff stimulus but not the visual grating. Unexpectedly, in visually conditioned animals, POm units also responded to the visual grating, suggesting that the responses are not modality-specific but more related to behavioral relevance. These effects seem not to be homogeneously distributed across POm: lateral units maintain tactile specificity, whereas medial units respond more flexibly. The authors further ask if the unexpected cross-modal responses might result from behavioral activity signatures. By regressing behavior-coupled activity out of the responses, they show that late activity indeed can be related to whisking, licking, and pupil size measures. However, cross-modal short-latency responses are not clearly related to animal behavior. Finally, LP neurons also seem to change their modality specificity depending on conditioning, whereas tactile responses are attenuated in LP if the animal is conditioned to visual stimuli.

      The authors make a compelling case that POm neurons are less modality-specific than typically assumed. The training paradigm, employed methods, and analyses are mostly to the point, well supporting the conclusions. The findings importantly widen our understanding of higher-order thalamus processing features with the flexibility to encode multiple modalities and behavioral relevance. The results raise many important questions on the brain-wide representation of conditioned stimuli. E.g. how specific are the responses to the conditioned stimuli? Are thalamic cross-modal neurons recruited for the specific conditioned stimulus or do their responses reflect a more global shift of attention from one modality to another? 

      To elaborate on higher-order thalamic activity in relation to conditioned behavior, a trial-by-trial analysis would be very useful. Is neuronal activity predictive of licking, and with what relative timing? 

      To elaborate on the relationship between neuronal activity and licking, we have created a new supplementary figure (Figure S1), where we present the lick latency of each mouse on the day of recording. We also perform more in-depth analysis of neural activity that occurs before lick onset, which is presented in a new main figure (new Figure 4). 

      Furthermore, I wonder why the (in my mind) major and from the data obvious take-away, "POm neurons respond more strongly to visual stimuli if visually conditioned", is not directly tested in the summary statistics in Figure 3h.

      We have added a summary statistic to Figure 3h and to the Results section (lines 156-157) comparing the drifting grating responses in visually and tactilely conditioned mice.  

      The remaining early visual responses in POm in visually conditioned mice after removing behavior-linked activity are very convincing (Figure 5d). It would help, however, to see a representation of this on a single-neuron basis side-by-side. Are individual neurons just coupled to behavior while others are independent, or is behaviorally coupled activity a homogeneous effect on all neurons on top of sensory activity?

      In lieu of a new figure, we have performed a new analysis of individual neurons to classify them as “stimulus tuned” and/or “movement tuned.” We find that nearly all POm cells encode movement and arousal regardless of whether they also respond to stimuli. This is presented in the Results under the heading “POm correlates with arousal and movement regardless of conditioning” (Lines 219-231).

      The conclusions on flexible response characteristics in LP in general are less strongly supported than those in POm. First, the differentiation between POm and LP relies heavily on the histological alignment of labeled probe depth and recording channel, possibly allowing for wrong assignment. 

      We appreciate the importance of differentiating between POm, LP, and surrounding regions to accurately assign a putative cell to a brain region. The method we employed (aligning an electrode track to a common reference atlas) is widely used in rodent neuroscience, especially in regions like POm and LP which are difficult to differentiate molecularly (for example, see Sibille, Nature Communications, 2022; and Schröder, Neuron, 2020). 

      Furthermore, it seems surprising, but is not discussed, that putative LP neurons have such strong responses to the air puff stimuli, in both conditioning cases. In tactile conditioning, LP air puff responses seem to be even faster and stronger than POm. In visual conditioning, drifting grating responses paradoxically seem to be later than in tactile conditioning (Fig S2e). These differences in response changes between POm and LP should be discussed in more detail and statements of "similar phenomena" in POm and LP (abstract) should be qualified.  

      We have further developed our analysis and discussion of LP activity. Our analysis of LP stimulus response latencies are now presented in greater detail in Figure S3, and we have expanded the results section accordingly (lines 266-275). We have also expanded the discussion section to both address these new analyses and speculate on what might drive these surprising “tactile responses” in LP.

      Reviewer #2 (Public Review): 

      Summary  

      This manuscript by Petty and Bruno delves into the still poorly understood role of higher-order thalamic nuclei in the encoding of sensory information by examining the activity of POm and LP cells in mice performing an associative learning task. They developed an elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality (visual or tactile) and ignore a second stimulus of the other modality. They recorded simultaneously from POm and LP, using 64-channel electrode arrays, to reveal the context dependency of the firing activity of cells in higher-order thalamic nuclei. They concluded that behavioral training reshapes activity in these secondary thalamic nuclei. I have no major concerns with the manuscript's conclusions, but some important methodological details are lacking and I feel the manuscript could be improved with the following revisions.

      Strengths 

      The authors developed an original and elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality, either visual or tactile, and ignore a second stimulus of the other modality. As a tactile stimulus, they applied gentle air puffs to the distal part of the vibrissae, ensuring that the stimulus was innocuous and therefore non-aversive, which is crucial in their study. 

      It is commonly viewed that the first-order thalamus performs filtering and re-encoding of the sensory flow; in contrast, the computations taking place in high-order nuclei are poorly understood. They may contribute to cognitive functions. By integrating top-down control, high-order nuclei may participate in generating updated models of the environment based on sensory activity; how this can take place is a key question that Petty and Bruno addressed in the present study.

      Weaknesses  

      (1) Overall, the methods, results, and discussion involving sensory responses, especially for POm, are confusing. I have the feeling that throughout the manuscript, the authors are dealing with the sensory and non-sensory aspects of the modulation of firing activity in POm and LP without a clear definition of what they examined. Making subsections in the Results, or naming more precisely what is analyzed (e.g., baseline, stim-on, reward), could convey the authors' message more clearly.  

      We thank Reviewer 2 for this suggestion. We have adjusted the language throughout the paper to more clearly state which portions of a given trial we analyzed. We now consistently refer to “baseline,” “stimulus onset,” and “stimulus offset” periods. 

      In line #502 in Methods, the authors defined "Sensory Responses. We examined each cell's putative sensory response by comparing its firing rate during a "stimulus period" to its baseline firing rate. We first excluded overlapping stimuli, defined as any stimulus occurring within 6 seconds of a stimulus of a different type. We then counted the number of spikes that occurred within 1 second prior to the onset of each stimulus (baseline period) and within one second of the stimulus onset (stimulus period). The period within +/-50ms of the stimulus was considered ambiguous and excluded from analysis." 

      Considering that responses to whisker deflection in POm, while weak and delayed, have been shown to occur before 50 ms when present (Diamond et al., 1992), it is not clear what the authors mean by, and count as, "Sensory Responses". 

      We have addressed this important concern in three ways. First, we have reanalyzed our data to include the 50ms pre- and post-stimulus time windows that were previously excluded. This did not qualitatively change our results, but updated statistical measurements are reflected in the Results and the legends of Figures 3 and 7. Second, we have created a new figure (new Figure 4) which provides a more detailed analysis of early POm stimulus responses at a finer time scale. Third, we have amended the language throughout the paper to refer to "stimulus responses" rather than "sensory responses" to reflect that our experimental setup cannot disambiguate bottom-up sensory input from top-down input into POm and LP. We refer only to "putative sensory responses" when discussing low-latency (<100ms) stimulus responses.
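      The window-based comparison quoted from the Methods above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: stimuli are represented as (onset time in seconds, stimulus type) pairs, "within 6 seconds" is read as an absolute onset difference below 6 s in either direction, and all function names are hypothetical. The quoted text does not specify the significance test, so the sketch stops at per-trial spike counts.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical representation: (onset time in s, stimulus type), e.g. (30.0, "puff").
Stimulus = Tuple[float, str]


def non_overlapping_onsets(stimuli: Sequence[Stimulus], min_gap_s: float = 6.0) -> List[Stimulus]:
    """Drop any stimulus whose onset lies within min_gap_s of a stimulus of a different type."""
    kept = []
    for t, kind in stimuli:
        clash = any(k != kind and abs(t - u) < min_gap_s for u, k in stimuli)
        if not clash:
            kept.append((t, kind))
    return kept


def count_in(spikes_sorted: Sequence[float], start: float, stop: float) -> int:
    """Number of spike times in the half-open interval [start, stop)."""
    return bisect_left(spikes_sorted, stop) - bisect_left(spikes_sorted, start)


def window_counts(
    spike_times: Sequence[float],
    onsets: Sequence[float],
    window_s: float = 1.0,
    exclude_s: float = 0.0,
) -> List[Tuple[int, int]]:
    """Per-trial (baseline, stimulus) spike counts around each stimulus onset.

    baseline: [onset - window_s, onset - exclude_s)
    stimulus: [onset + exclude_s, onset + window_s)
    exclude_s=0.05 mimics the original +/-50 ms exclusion; 0.0 mimics the reanalysis.
    """
    if not 0.0 <= exclude_s < window_s:
        raise ValueError("exclude_s must satisfy 0 <= exclude_s < window_s")
    spikes = sorted(spike_times)
    return [
        (count_in(spikes, t - window_s, t - exclude_s),
         count_in(spikes, t + exclude_s, t + window_s))
        for t in onsets
    ]
```

      Passing `exclude_s=0.05` reproduces the originally excluded ±50 ms band, while the default of 0 keeps those spikes, as in the reanalysis described above; the resulting (baseline, stimulus) pairs would then go to whichever paired test the authors applied.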

      Precise wording may help to clarify the message. For instance, line #134 reads: "Of cells from tactilely conditioned mice, 175 (50.4%) significantly responded to the air puff, as defined by having a firing rate significantly different from baseline within one second from air puff onset (Figure 3d, bottom)". Here, "significantly responded to the air puff" could be rewritten as "significantly increased (or modified, if some decreased) their firing rate within one second after the air puff onset (baseline: ...)". This will avoid any confusion with sensory responses per se.

      We have made this specific change suggested by the reviewer (lines 145-146) and made similar adjustments to the language throughout the manuscript to better communicate our analysis methods. 

      (2) To extend the previous concern, the latency of the modulation of the firing rate of POm cells for each modality and each conditioning may be an issue. This latency, given in Figure S2, is rather long, particularly for the whisker system, which strongly favors non-sensory "responses" and supports the authors' hypothesis that sensory-, arousal-, and movement-evoked activity in POm are shaped by associative learning. Latency is a key point in this study. 

      Therefore, 

      - latencies should be given in the main text, and Figure S2 could be considered for a main figure, at least panels c, d, and e, could be part of Figure 3. 

      - Figure S2b points out rather short-latency responses to the air puff, at least in some cells, in addition to late ones. The manuscript would highly benefit from an analysis of both early and late latency components of the "responses" to air puffs and drifting gratings in both conditions. This analysis may definitely help to clarify the authors' message. Since the authors performed unit recordings, these data are accessible.

      - it would be highly instructive to examine the latency of the modulation of POm firing rate in parallel with the onset of each behavior, i.e., changes in pupil radius, whisking amplitude, and lick rate (Figures 1e, g and 3a, b). Figure 1 does not provide the latency of the licks in conditioned mice.

      - the authors mention low-latency responses in the discussion, e.g., line #299: "In both tactilely and visually conditioned mice, movement could not explain the increased firing rate at air puff onset. These low-latency responses across conditioning groups is likely due in part to "true" sensory responses driven by S1 and SpVi."; line #306: "Like POm, LP displayed varied stimulus-evoked activity that was heavily dependent on conditioning. LP responded to the air puff robustly and with low latency, despite lacking direct somatosensory inputs." But which low-latency responses do the authors refer to? Again, this points out that a robust analysis of these latencies, which is currently missing from the manuscript, would help support these conclusions.

      We have moved our analysis of stimulus response latency in POm to new Figure 4 in the main text and have expanded both the Results and Discussion sections accordingly. We have also analyzed the lick latency on the day of recording, included in a new supplemental Figure S1. 

      (3) Anatomical locations of recordings in the dorsal part of the thalamus. Line #122 "Our recordings covered most of the volume of POm but were clustered primarily in the anterior and medial portions of LP (Figure 2d-f). Cells that were within 50 µm of a region border were excluded from analysis." 

      How did the authors distinguish the anterior boundary of LP from the LD nucleus, located just anterior to LP, another higher-order nucleus in which whisker-responsive cells have been isolated (Bezdudnaya and Keller, 2008)? 

      Cells within 50 µm of any region boundary were excluded, including those at the border of LP and LD. We also reviewed our histology images by eye and believe that our recordings were all made posterior to LD. 

      (4) The Methods do not mention approval by an ethics committee. All surgical procedures (line #381), i.e., the implant, the craniotomy, as well as the perfusion, are performed under isoflurane. But isoflurane induces narcosis only and not proper anesthesia. The use of analgesia is not mentioned. 

      We thank Reviewer 2 for drawing our attention to this oversight. All experiments were conducted under the approval of the Columbia University IACUC. Mice were treated with the global analgesics buprenorphine and carprofen, the local analgesic bupivacaine, and anesthetized with isoflurane during all surgical procedures. We have amended the Methods section to include this information (Lines 458-470).

      Reviewer #3 (Public Review): 

      Petty and Bruno ask whether activity in secondary thalamic nuclei depends on the behavioral relevance of stimulus modality. They recorded from POm and LP, but the weight of the paper is skewed toward POm. They use two cohorts of mice (N=11 and 12), recorded in both nuclei using multi-electrode arrays, while being trained to lick to either a tactile stimulus (air puff against whiskers, first cohort) or a visual stimulus (drifting grating, second cohort), and ignore the respective other. They find that both nuclei, while primarily responsive to their 'home' modality, are more responsive to the relevant modality (i.e. the modality predicting reward). 

      Strengths: 

      The paper asks an important question, it is timely and is very well executed. The behavioral method using a delayed lick index (excluding impulsive responses) is well worked out. Electrophysiology methods are state-of-the-art with information about spike quality in Figure S1. The main result is novel and important, convincingly conveying the point that encoding of secondary thalamic nuclei is flexible and clearly includes aspects of the behavioral relevance of a stimulus. The paper explores the mapping of responses within POm, pointing to a complex functional structure, something that has been reported/suggested in earlier studies. 

      Weaknesses: 

      Coding: It is not clear to which aspect of the task POm/LP is responding. There is a motor-related response (whisking, licking, pupil); regressing it out leaves a residual response that the authors speculate could be sensory.

      Learning: The paper talks a lot about 'learning', although it is only indirectly addressed. The authors use two differently (over-)trained mice cohorts rather than studying e.g. a rule switch in one and the same mouse, which would allow us to directly assess whether it is the same neurons that undergo rule-dependent encoding. 

      We disagree that our animals are “overtrained,” as every mouse was fully trained within 13 days. We agree that it would be interesting to study a rule-switch type experiment, but such an experiment is not necessary to reveal the profound effect that conditioning has on stimulus responses in POm and LP. 

      Mapping: The authors treat and interpret the two nuclei very much in the same vein, although there are clear differences. These differences are mentioned only in passing and could be discussed in more depth. Mapping using responses on electrode tracks is done in POm but not LP.

      The mapping of LP responses by anatomical location is presented in the supplemental Figure S4 (previously S3). We have expanded our discussion of LP and how it might differ from POm.

      Reviewer #1 (Recommendations For The Authors):  

      Minor writing issues: 

      Line 122: ...67 >LP< cells?

      Line 301: plural “are”

      We have fixed these typos.

      Figure issues

      *  3a,b time ticks are misaligned and the grey bar (bottom) seems not to align with the visual/tactile stimulus shadings.

      *  legend to Figure 3b refers to Figure 1c which is a scheme, but if 1g is meant, this mouse does not seem to have a session 12? 

      *  3c,e time ticks slightly misaligned. 

      *  5e is missing shading for the relevant box plots, assuming it should look like Figure 3h.  

      We thank Reviewer 1 for pointing out these errors. We have adjusted Figures 1, 3, and 5 accordingly.

      Analyses 

      A summary statistic for LP similar to that in Figure 3h is missing. 

      We have added a summary box chart of LP stimulus responses (Figure 7g), similar to that of POm in Figure 3. We have also performed similar statistical analyses, the results of which are presented in the legend for Figure 7. 

      Reviewer #2 (Recommendations For The Authors): 

      More precision is required on the following points: 

      (1) The use of analgesia is not mentioned, and this is not a minor concern. Even if the recordings are performed 24 hours after the surgery for the craniotomy and screw insertion and several days after the main surgery for the implant, taking into account the animals' pain during surgery is crucial, first for ethical reasons, and second because it may affect the data, especially in POm cells: pain during surgery may induce allodynia and/or hyperalgesia, and POm responses to sensory stimuli were shown to be more robust in behavioral hyperalgesia (Masri et al., 2009).  

      We neglected to include details on the analgesics used during surgery and post-operation recovery in our original manuscript. Mice were administered buprenorphine, carprofen, and bupivacaine immediately prior to the head plate surgery and were treated with additional carprofen during recovery. Mice were similarly treated with analgesics for the craniotomy procedure. Mice were carefully observed after craniotomy, and we saw no evidence of pain or discomfort. Furthermore, mice performed the behavior at the same level pre- and post-craniotomy (now presented in Figure 1j), which also indicates that they were not in any pain. 

      (2) The head-fixed preparation is only poorly described.

      Line #414: "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes." 

      And line #425 "Mice were trained for one session per day, with each session consisting of an equal number of visual stimuli and air puffs. Sessions ranged from 20-60 minutes and about 40-120 of each stimulus. " 

      More details should be given about the head-fixation training protocol. Is the session duration 15-25 minutes, 60 minutes, or something else? How long does it take for mice to become well habituated to head fixation, and by which criteria?  

      Line #389: "Mice were then allowed to recover for 24 hours, after which the sealant was removed and recordings were performed. At the end of experiments,"

      The timeline is not clear: is there one day or several days of recordings? 

      We have expanded on our description of the head fixation protocol in the Methods. We describe in more detail how mice were habituated to head fixation, the timing of water restriction, and the start of conditioning/training (Habituation and Conditioning, lines 492-500).

      (4) Line #411: "Mice were deprived of water 3 days prior to the start of conditioning" followed by line #414 "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes".

      If I understood correctly, the mice were then not fully water-deprived for 3 days since they received water while head-fixed. This point may be clarified. 

      We addressed these concerns in the changes to the Methods section mentioned in the preceding point (3).

      (5) Line #157: "Modality selectivity varies with anatomical location in Pom" while the end of the previous paragraph is "This suggests that POm encoding of reward and/or licking is insensitive to task type, an observation we examine further below."

      The authors then turn to anatomical questions before returning to what POm may encode in the following section. This makes the story confusing and hard to follow, even though it is quite interesting.  

      We have reordered our Figures and Results to improve the flow of the paper and remove this point of confusion. We now present results on the encoding of movement before analyzing the relationship between POm stimulus responses and anatomical location. What was old Figure 5 now precedes what was old Figure 4.

      (6) Licks Analysis. Line #99 "However, this mouse also learned that the air puff predicted a lack of reward in the shaping task, as evidenced by withholding licking upon the onset of the air puff. The mouse thus displayed a positive visual lick index and a negative tactile lick index, suggesting that it attended to both the tactile and visual stimuli (Figure 1f, middle arrow)."

      Line #105 "All visually conditioned mice exhibited a similar learning trajectory (Figure 1i left, 1j left)". 

      Interestingly, the authors revealed that mice withheld licking at the onset of the air puff during visual conditioning, which they did not do at the onset of the drifting grating during tactile conditioning. This withholding was extinguished after the 8th session, which the authors interpret as the mice finally ignoring the air puff. Is this effect significant, i.e., is there significant withholding of licking at air puff onset across the 12 tested mice? 

      The withholding of licking was significant (assessed with a sign-rank test) in visually conditioned mice prior to switching to the full version of the task. Indeed, it was the abolishment of this effect after conditioning with the full version of the task that was our criterion for when a mouse was fully trained. We have elaborated on this in the Habituation and Conditioning section in the Methods.

      (1) Throughout the manuscript, "touch" is used instead of "passive whisker deflection", which may be confused with "active touch" by readers from the whisker community. I recommend using "passive whisker deflection" rather than "touch".

      We appreciate that “touch” can be an ambiguous term in some contexts. However, we have limited our use of the word to refer to the percept of whisker deflection; we do not describe the air puff stimulus as a “touch.” We respectfully would like to retain the use of the word, as it is useful for comparing somatosensory stimuli to visual stimuli.

      (2) Line #395: "Air puffs (0.5-1 PSI) were delivered through a nozzle (cut p1000 pipet tip, approximately 3.5mm diameter aperture)".

      Were the air puffs really <1 PSI, rather than <1 bar?  

      We thank Reviewer 3 for pointing out this inaccuracy. The air puffs were indeed between 0.5 and 1 bar, not PSI. We have addressed this in the Methods.

      (3) Line #441: "In the full task, the stimuli and reward were identical, but stimuli were presented at uncorrelated and less predictable intervals."  Do the authors mean that all stimuli are rewarded?  

      The stimuli and reward were identical between the shaping and full versions of the task. In the full version of the task, the unrewarded stimulus was truly uncorrelated with reward, rather than anticorrelated. 

      (4) Line #445 "for a mean ISI of 20 msec." ISI is not defined; I assume it means inter-stimulus interval. Even if this is fairly obvious, to avoid any confusion for future readers I would recommend using another acronym, especially in a manuscript about electrophysiology, since ISI is a standard acronym for inter-spike interval. 

      We have defined the acronym ISI as “inter-stimulus interval” when first introduced in the results (Line 82) and in the Methods (Line 511).

      (5) Line #416 "In the first phase of conditioning ("shaping"), mice were separated into two cohorts: a "tactile" cohort and a "visual" cohort. Mice were presented with tactile stimuli (a two-second air puff delivered to the distal whisker field) and visual stimuli (vertical drifting grating on a monitor). Throughout conditioning, mice were monitored via webcam to ensure that the air puff only contacted the whiskers and did not disturb the facial fur nor cause the mouse to blink, flinch, or otherwise react - ensuring the stimulus was innocuous. The stimulus types were randomly ordered. In the visual conditioning cohort, the visual stimulus was paired with a water reward (8-16µL) delivered at the time of stimulus offset. In the tactile conditioning cohort, the reward was instead paired with the offset of the air puff. Regardless of the type of conditioning, stimulus type was a balanced 50:50 with an inter-stimulus interval of 8-12 seconds (uniform distribution)." 

      A mention of the "full version of the task" would be welcome in this paragraph to clarify what the task is for the mouse in the Methods.

      We have more clearly defined the full version of the task in a later paragraph (line 506). We believe this addresses the potential confusion caused by the original description of the conditioning paradigm. 

      (6) Line #467: "Units were assigned to the array channel on which its mean waveform was largest". 

      Should it read mean waveform "amplitude"? 

      This is correct, we have adjusted the statement accordingly. 

      (7) Line #482 "The eye camera was positioned on the right side of the face and recorded at 60 fps." Then line #487 "The trace of pupil radius over time was smoothed over 5 frames (8.3 msec).” At 60 fps, 5 frames represent about 83 ms, not 8.3 ms.

      We have corrected this error.  
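      For readers checking the corrected value, a minimal sketch of the conversion from a smoothing window in frames to milliseconds. The helper name frames_to_ms is hypothetical and not taken from the authors' analysis code; it assumes each frame spans 1/fps seconds, as in the reviewer's calculation.

```python
def frames_to_ms(n_frames: int, fps: float) -> float:
    """Duration in milliseconds covered by n_frames at a frame rate of fps (hypothetical helper)."""
    # Each frame lasts 1/fps seconds; multiply by 1000 to convert to milliseconds.
    return n_frames / fps * 1000.0


# 5 frames at 60 fps, as in the pupil-radius smoothing window.
window_ms = frames_to_ms(5, 60)
print(f"{window_ms:.1f} ms")  # 83.3 ms
```

      At 60 fps one frame lasts about 16.7 ms, so a 5-frame window spans roughly 83 ms.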

      (8) Line #121: "257 POm cells and 67 cells from 12 visually conditioned mice" 

      67 LP cells; "LP" is missing.

      We have corrected this error. 

      (9) Line #354: "A consistent result of attention studies in humans and nonhuman primates is the enhancement of cortical and thalamic sensory responses to an attended visual stimuli. Here, we show not just enhancement of sensory responses to stimuli within a single modality, but also across modalities. It is worth investigating further how secondary thalamus and high-order sensory cortex encode attention to stimuli outside of their respective modalities. Our surprising conclusion that the nuclei are equivalently activated by behaviorally relevant stimuli is nevertheless compatible with these previous studies." Since higher-order thalamic nuclei are integrative centers of many cortical and subcortical inputs, they cannot be viewed simply as relay nuclei, and the conclusion is therefore not "surprising". Not surprising, but still an elegant demonstration of the context-dependent activity/responses of POm/LP cells. 

      We disagree. Visual stimuli activating strong POm responses and tactile stimuli activating strong LP responses - however they do it - is a surprising result. We agree that higher-order thalamic nuclei are integrative centers, but exactly what they integrate and what the integrated output means is still poorly understood.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The models described are not fundamentally novel, essentially a random intercept model (with a warping function), and some flexible covariate effects using splines (i.e., additive models).

      We respectfully but strongly disagree with the reviewer’s assessment of the novelty of our work. The models the reviewer describes as “random intercept models … and some flexible covariate effects” appear to refer to the estimation of cross-sectional normative models developed in, and adopted from, previous work, not to the work presented here. To be clear, the contributions of this work are: (i) a principled methodology for making statistical predictions for individual subjects in longitudinal studies based on a novel z-diff score, (ii) an approach for transferring information from large-scale normative models estimated on large-scale cross-sectional data to longitudinal studies, (iii) an extensive theoretical analysis of the properties of this approach, and (iv) an empirical evaluation on an unpublished psychosis dataset. Put simply, we provide the ability to estimate within-subject change in normative models, which until now could only show a subject's position in the normative range at a given timepoint. With the exception of reference [13] cited in the main text, we are not aware of any methods that can achieve this. Based on this feedback, combined with that of Reviewer 2, we have now improved our introduction and clearly state our contribution at the outset of the manuscript, while also shortening the introduction to make it more concise. In this work, we aim to be transparent in showing the reader that our method builds on a previously peer-reviewed model.

      The assumption of constant quantiles is very strong, and limits the utility of the model to very short term data.

      We now provide an extensive theoretical analysis of our approach (section 2.1.3), where we show that this assumption is actually not strictly necessary and that our approach yields valid inferences even under much milder assumptions. More specifically, we first provide a mathematical grounding for the assumption we made in the initial submission, then generalise our method to a wider class of residual processes and show that our original assumption of constant quantiles is not too restrictive. We also provide a simulation study to show how the practitioner can evaluate the validity and implications of this assumption on a case-by-case basis. This generalisation is described in depth in section 2.1.3.

      The schizophrenia example leads to a counter-intuitive normalization of trajectories, which leads to suspicions that this is driven by some artifact of the data modeling/imaging pipelines.

      We understand that the observed normalisation effects might appear surprising. As we outlined in our provisional response, we would like to emphasise that there is increasing evidence that the old neurodegenerative view of psychosis is an oversimplification and that trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode. More specifically, we have shown in an independent sample and with different methodology that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v2, now accepted in Schizophrenia Bulletin). These results are well aligned with the results we show in this manuscript. We have now added remarks on this topic to the discussion. We would also like to re-emphasise that the data were processed with the utmost rigour using state-of-the-art processing pipelines including quality control, which we have reported as transparently as possible. Our confidence that the results are not ‘driven by some artifact of the data modeling/imaging pipelines’ is also supported by the fact that analysis of a group of healthy controls did not show any significant z-diffs (see Discussion section), neither frontally nor elsewhere. If the reviewer believes there are additional quality control checks that would further increase confidence in our findings, we would welcome specific details.

      The method also assumes that the cross-sectional data is from a "healthy population" without describing what this population is (there is certainly every chance of ascertainment bias in large scale studies as well as small scale studies). This issue is completely elided over in the manuscript.

      Indeed, we do not describe the cross-sectional population used for training the models, as these models were already trained and published with an in-depth description of the datasets used for the training (https://elifesciences.org/articles/72904). We now make this more explicit in section 2.1.1 of the manuscript (page 7), and also more explicitly acknowledge the possibility of ascertainment bias in the simulation section 2.1.4. However, we would like to emphasise that such ascertainment bias is not in any way specific to the analyses we report. In fact, it is present in all studies that use large-scale cohorts such as UK Biobank. Indeed, we are currently working on another manuscript to address this question in detail, but given the complexity of this problem and the fact that many publicly available legacy studies simply do not record sufficient demographic information (e.g., to assess racial bias properly), we believe that this is beyond the scope of the current work.

      Reviewer #2 (Public Review):

      The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.

      As noted above in our response to Reviewer 1, we significantly pruned the introduction, stating our objective in the first paragraph and elaborating on the topic later in the text. We hope that it is now less repetitive and easier to follow.

      There are no simulation studies to evaluate whether the adjustment of the cross-sectional normative model to longitudinal data can make accurate estimations and inferences regarding the longitudinal changes. Also, some assumptions are involved in the modeling procedure, for example, that the deviation of a healthy control from the population over time is purely caused by noise, and that the variability of error/noise is constant across x_n; these seem to be quite strong assumptions. The presentation of this work's method development would be strengthened if the authors conducted a formal simulation study to evaluate the method's performance when such assumptions are violated and, ideally, proposed some methods to check these assumptions before performing the analyses.

      This comment encouraged us to zoom out from our original assumption and generalise our method to a wider class of residual processes (stationary Gaussian processes) in section 2.1.3. We now present a theoretical analysis of our model to show that our original assumption (of stable quantiles plus noise) is actually not necessary for valid inference in our method, which broadens the applicability of our method. Of course, we also discuss in what way the original assumption is restrictive and how it aligns with the more general dynamics. We also include a simulation study to evaluate the method's performance and elucidate the role of the more general dynamics in section 2.1.4.

      The proposed "z-diff score" still falls in the common form of z-score to describe the individual deviation from the population/reference level, but now is just specifically used to quantify the deviation of individual temporal change from the population level. The authors need to further highlight the difference between the "z-score" and "z-diff score", ideally at its first mention, in case readers get confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a measure of "standardized difference" which kind of collides with what "z-diff" implies by its name.

      We added a mention of the difference between the z-score and the z-diff score to the last paragraph of the introduction.

      Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.

      We have now added an interpretation of the z-score in the original model below equation 7.

      It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.

      This was a very useful observation, we unified the notation and now only use variance.

      The functions psi were never explicitly described. This would be helpful to have in the supplement with a reference to that in the paper.

      Indeed, in describing the original model we had to choose how to condense the necessary information from it so that we could build upon it. As the phi function is only used for data transformation in the original model, we did not elaborate on it further; however, we now refer to the specific section of the original paper by Fraza et al. 2021 where it is described in more detail (https://www.sciencedirect.com/science/article/pii/S1053811921009873).

      What is the goal of equations (13) and (14)? The authors should explain the purpose of these equations before presenting the math. It seems to be to obtain an estimate of $\sigma_{\xi}^2$, which the reader only learns at the end.

      We corrected the formatting.

      What is the definition of "adaption" as used to describe equation (15)? In this equation, I think the norm on the subsample was not defined.

      We added a more detailed description of the adaptation after equation 15.

      "(the sandwich part with A)" - maybe call this an inner product so that it is not confused with a sandwich variance estimator. This is a bit unclear. Equation (8) does have the inner product involving A and \beta^{-1} does include variability of \eta. It seems like you mean that equation (8) incorrectly includes variability of \eta and does not have the right term vector component of the inner product involving A, but this needs clarifying.

      We have now changed the formulation to be less confusing and also explicitly clarified the caveat regarding the difference of z-scores.

      One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.

      We agree with the outlined limitation in interpreting overall trends when subjects' positions at the first visit differ. However, this is a much broader challenge and is not specific to our approach. This effect is generally independent of the lifespan, but may further interact with the typical lifespan trajectory of the disease. When the z-scores are taken in the context of the cross-sectional normative models, it becomes possible to identify the overall trend of an illness across the lifespan, and individual patients' z-diffs that deviate from what this typical group trajectory predicts may, e.g., correspond to early or late onset of their individual atrophy. We now make these considerations explicit in the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      Other minor suggestions to help improve the text:...

      We thank Reviewer #2 for the list of minor suggestions to improve the text, all of which we implemented in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions, however, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies including non-linear analysis tools, as well as a discussion of the importance of the data normalization are needed.

      Thank you for reviewing our manuscript and giving us the opportunity to respond and improve our paper. In our revision, we have strived to address the points raised in the comments, and implement suggested changes where feasible. We have also improved our package and created an analysis guide (available on our Github - https://github.com/gloewing/fastFMM and https://github.com/gloewing/photometry_fGLMM), showing users how to apply our methods and interpret their results. Below, we provide a detailed point-by-point response to the reviewers.

      Reviewer #1:

      Summary:

      Fiber photometry has become a very popular tool for recording neuronal activity in freely behaving animals. Despite the number of papers published with the method, as the authors rightly note, there are currently no standardized ways to analyze the data produced. Moreover, most data analyses are confined to simple measurements of averaged activity and, by doing so, erase valuable information encoded in the data. The authors offer an approach based on functional linear mixed modeling, in which, beyond changes in overall activity, various functions of the data can also be analyzed. More in-depth analysis, more variables taken into account, and better statistical power all lead to higher quality science.

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Thank you for your favorable and detailed description of our work!

      Weaknesses:

      However, this also raises several questions. The normalization method applied to raw fiber photometry data differs from lab to lab, which poses a significant challenge for applying a single analysis tool.

      Thank you for these important suggestions. We agree that many data pre-processing steps will influence the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we would argue that the sensitivity of analysis results to pre-processing choices should motivate the development of statistical techniques that reduce the need for pre-processing, and properly account for structure in the data arising from experimental designs. For example, even without many standard pre-processing steps, FLMM provides smooth estimation results across trial timepoints (i.e., the “functional domain”), has the ability to adjust for between-trial and between-animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. We appreciate the reviewer’s suggestion to emphasize and further elaborate on our method from this perspective. We have now included the following in the Discussion section:

      “FLMM can help model signal components unrelated to the scientific question of interest, and provides a systematic framework to quantify the additional uncertainty from those modeling choices. For example, analysts sometimes normalize data with trial-specific baselines because longitudinal experiments can induce correlation patterns across trials that standard techniques (e.g., repeated measures ANOVA) may not adequately account for. Even without many standard data pre-processing steps, FLMM provides smooth estimation results across trial time-points (the “functional domain”), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference approach that quantifies the resulting uncertainty. For instance, session-to-session variability in signal magnitudes or dynamics (e.g., a decreasing baseline within-session from bleaching or satiation) could be accounted for, at least in part, through the inclusion of trial-level fixed or random effects. Similarly, signal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects. Inclusion of these effects would then influence the width of the confidence intervals. By expressing one’s “beliefs” in an FLMM model specification, one can compare models (e.g., with AIC). Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences.”

      Does the method that the authors propose work as efficiently when the data are normalized with a running-average dF/F, as described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) may in itself lead to pattern dilution.

      By modeling trial signals as “functions”, the method accounts for and exploits correlation across trial timepoints; as such, any pre-smoothing of the signals should not negatively affect the validity of the 95% CI coverage. It will, however, change the inferential results and the interpretation of the data, but this is not unique to FLMM: the same holds for many other statistical procedures.
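      The pattern-dilution concern can be made concrete with a small base-R sketch on simulated data. The sampling rate, window width, and transient shape below are arbitrary assumptions, and the helper `running_avg()` is hypothetical; the point is only that a centered running average attenuates a brief transient before any statistical model sees it.

```r
# Simulated trace: flat baseline with one brief transient (all values hypothetical)
fs <- 20                                          # assumed sampling rate (Hz)
time_s <- seq(0, 5, by = 1 / fs)                  # 5 s of signal
signal <- exp(-((time_s - 2.5)^2) / (2 * 0.05^2)) # Gaussian transient, sd = 50 ms

# Hypothetical helper: centered running average over k samples (NA at the edges)
running_avg <- function(x, k) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

smoothed <- running_avg(signal, k = 11)           # ~0.5 s window at 20 Hz

peak_raw <- max(signal)
peak_smooth <- max(smoothed, na.rm = TRUE)
cat(sprintf("raw peak %.2f, smoothed peak %.2f\n", peak_raw, peak_smooth))
```

Because the window is much wider than the transient, the smoothed peak is only a fraction of the raw one; any downstream analysis, FLMM included, then works with that attenuated signal.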

      The same question applies if the z-score is calculated based on various responses or even baselines. How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?

      Trial-to-trial variability in signal magnitudes or dynamics could be accounted for, at least in part, through the inclusion of trial-level random effects. This heterogeneity would then influence the width of the confidence intervals, directly conveying the effect of the variability on the conclusions being drawn from the data. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences. Indeed, non-stationarity (e.g., a decreasing baseline within-session) due to, for example, measurement artifacts (e.g., bleaching) or behavioral causes (e.g., satiation, learning) should, if possible, be accounted for in the model. As mentioned above, one can often achieve the same goals that motivate pre-processing steps by instead applying specific FLMM models (e.g., that include trial-specific intercepts to reflect changes in baseline) to the unprocessed data. One can then compare model criteria in an objective fashion (e.g., with AIC) and quantify the uncertainty associated with those modeling choices. Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. In sum, our method provides both a tool to account for challenges in the data, and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.
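      The idea of modeling a drifting baseline rather than removing it in pre-processing, and then comparing models by AIC, can be sketched at a single trial time-point. This is a scalar analogue only: it uses base R `lm()` with a trial-level fixed effect in place of an FLMM, and the covariate names, effect sizes, and drift are hypothetical simulated values.

```r
set.seed(1)
n_trials <- 120
trial <- seq_len(n_trials)
cue <- rep(c(0, 1), length.out = n_trials)  # hypothetical binary trial covariate

# Hypothetical signal at one fixed trial time-point: a cue effect plus a
# baseline that declines across trials (e.g., from bleaching or satiation)
y <- 0.8 * cue - 0.02 * trial + rnorm(n_trials, sd = 0.5)

fit_no_drift <- lm(y ~ cue)          # ignores the non-stationary baseline
fit_drift <- lm(y ~ cue + trial)     # trial-level fixed effect absorbs the drift

aics <- c(no_drift = AIC(fit_no_drift), drift = AIC(fit_drift))
print(aics)
```

An FLMM would fit such a specification across all trial time-points and smooth the coefficients; the AIC comparison here only illustrates choosing between model specifications objectively.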

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper’s logic, non-linear analysis can capture more information that is diluted by linear methods.

      This is a good question that we imagine many readers will be curious about as well. We have added notes to the Discussion and to Methods Section 4.3 to address this (copied below). We thank the reviewer for raising this point; your feedback also motivated us to discuss it in Part 5 of our Analysis Guide.

      Methods

      “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions when the data have a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates) and are assumed to vary smoothly along the functional domain (e.g., one assumes that a photometry signal takes similar values at close time-points in a trial). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”
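      The benefit of exploiting smoothness across the functional domain can be illustrated with a cartoon in base R. This is not the FLMM estimator: it simply compares a trial average estimated separately at each time-point with the same average smoothed by `smooth.spline()`. The underlying curve, noise level, and number of trials are assumptions chosen for illustration.

```r
set.seed(2)
s <- seq(0, 2, length.out = 100)           # trial time (s), hypothetical grid
truth <- sin(2 * pi * s / 2) * exp(-s)     # assumed smooth underlying signal
n_trials <- 8

# Each row is one noisy trial; column j corresponds to time-point s[j]
Y <- matrix(rep(truth, each = n_trials), nrow = n_trials) +
  matrix(rnorm(n_trials * length(s), sd = 0.5), nrow = n_trials)

pointwise <- colMeans(Y)                                # each time-point on its own
smoothed <- predict(smooth.spline(s, pointwise), s)$y   # borrows strength from neighbors

mse <- c(pointwise = mean((pointwise - truth)^2),
         smoothed = mean((smoothed - truth)^2))
print(mse)
```

Because neighboring time-points carry information about each other, the smoothed estimate typically sits much closer to the true curve than the time-point-by-time-point estimate.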

      Discussion

      “In this paper, we specified FLMM models with linear covariate–signal relationships at a fixed trial time-point across trials/sessions, to compare the FLMM analogue of the analyses conducted in (Jeong et al., 2022). However, our package allows modeling of covariate–signal relationships with non-linear functions of covariates, using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models, especially since FLMM is designed for statistical inference.”
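      The flexibility–interpretability tradeoff for covariate–signal relationships can be sketched at a single trial time-point with base R `lm()` and a natural spline basis from the `splines` package that ships with R. This does not show fastFMM syntax; the covariate `x`, its saturating effect, and the noise level are hypothetical.

```r
library(splines)  # ships with base R

set.seed(3)
n <- 200
x <- runif(n, 0, 30)                         # hypothetical trial-level covariate
# Hypothetical signal at one trial time-point with a saturating dependence on x
y <- 2 * (1 - exp(-x / 5)) + rnorm(n, sd = 0.3)

fit_linear <- lm(y ~ x)                      # one slope: easy to interpret
fit_spline <- lm(y ~ ns(x, df = 4))          # natural cubic spline: flexible

aics <- c(linear = AIC(fit_linear), spline = AIC(fit_spline))
print(aics)
```

The spline fits the curved relationship better, but its four basis coefficients have no direct interpretation, unlike the single slope of the linear model. This is the tradeoff the passage above describes.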

      Reviewer #2:

      Summary:

      This work describes a statistical framework that combines functional linear mixed modeling with joint 95% confidence intervals, which improves statistical power and provides less conservative statistical inferences than in previous studies. As recently reviewed by Simpson et al. (2023), linear regression analysis has been used extensively to analyze time series signals from a wide range of neuroscience recording techniques, with recent studies applying them to photometry data. The novelty of this study lies in 1) the introduction of joint 95% confidence intervals for statistical testing of functional mixed models with nested random-effects, and 2) providing an open-source R package implementing this framework. This study also highlights how summary statistics as opposed to trial-by-trial analysis can obscure or even change the direction of statistical results by reanalyzing two other studies.

      Strengths:

      The open-source R package, which uses syntax similar to the lme4 package to implement this framework for photometry data, enhances its accessibility and usability for other researchers. Moreover, the shorter model-fitting time compared with a similar package on simulated data should make it easier to adopt.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point in the trial can reveal information obscured by averaging across trials. Furthermore, this work exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      We appreciate the in-depth description of our work and, in particular, the R package. This is an area where we put a lot of effort, since our group is very concerned with the practical experience of users.

      Weaknesses:

      Although this work reanalyzes previous studies that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial. As described by the authors, fitting pointwise linear mixed models and performing t-tests with Benjamini–Hochberg correction, as done in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in that study.

      Thank you for making this important point. We agree that this offers an opportunity to showcase the advantages of FLMM over non-functional data analysis methods, such as the approach applied in Lee et al. (2019). As mentioned in the text, fitting entirely separate models at each trial timepoint (without smoothing regression coefficient point and variance estimates across timepoints), and applying multiple comparisons corrections as a function of the number of time points has substantial conceptual drawbacks. To see why, consider that applying this strategy with two different sub-sampling rates requires adjustment for different numbers of comparisons, and could thus lead to very different proportions of timepoints achieving statistical significance. In light of your comments, we decided that it would be useful to provide a demonstration of this. To that effect, we have added Appendix Section 2 comparing FLMM with the method in Lee et al. (2019) on a real dataset, and show that FLMM yields far less conservative and more stable inference across different sub-sampling rates. We conducted this comparison on the delay-length experiment (shown in Figure 6) data, sub-sampled at evenly spaced intervals at a range of sampling rates. We fit either a collection of separate linear mixed models (LMM) followed by a Benjamini–Hochberg (BH) correction, or FLMM with statistical significance determined with both Pointwise and Joint 95% CIs. As shown in Appendix Tables 1-2, the proportion of timepoints at which effects are statistically significant with FLMM Joint CIs is fairly stable across sampling rates. In contrast, the percentage is highly inconsistent with the BH approach and is often highly conservative. This illustrates a core advantage of functional data analysis methods: borrowing strength across trial timepoints (i.e., the functional domain), can improve estimation efficiency and lower sensitivity to how the data is sub-sampled. 
      A multiple comparisons correction may, however, yield stable results if one first smooths both the regression coefficient point estimates and their variance estimates across timepoints. Such an approach would essentially constitute a functional mixed model estimation strategy that uses a multiple comparisons correction instead of a joint CI. We have now added a description of this experiment in Section 2.4 (copied below).

      “We further analyze this dataset in Appendix Section 2, to compare FLMM with the approach applied in Lee et al. (2019) of fitting pointwise LMMs (without any smoothing) and applying a Benjamini–Hochberg (BH) correction. Our hypothesis was that the Lee et al. (2019) approach would yield substantially different analysis results, depending on the sampling rate of the signal data (since the number of tests being corrected for is determined by the sampling rate). The proportion of timepoints at which effects are deemed statistically significant by FLMM joint 95% CIs is fairly stable across sampling rates. In contrast, that proportion is both inconsistent and often low (i.e., highly conservative) across sampling rates with the Lee et al. (2019) approach. These results illustrate the advantages of modeling a trial signal as a function, and conducting estimation and inference in a manner that uses information across the entire trial.”
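      The structure of the pointwise approach compared above can be sketched in base R on simulated data. It mirrors the procedure, not the dataset or the results in Appendix Section 2: `lm()` stands in for the per-time-point LMMs (the toy data have no random effects), and the covariate, effect shape, and grid are hypothetical. The key feature is that the number of tests passed to `p.adjust(method = "BH")` is set by the sampling rate.

```r
set.seed(4)
n_trials <- 40
s_full <- seq(0, 2, length.out = 201)             # hypothetical 100 Hz trial grid
x <- rnorm(n_trials)                              # hypothetical trial-level covariate
beta_true <- 0.4 * exp(-((s_full - 1)^2) / 0.1)   # assumed smooth effect peaked at 1 s
Y <- outer(x, beta_true) +
  matrix(rnorm(n_trials * length(s_full)), nrow = n_trials)

# Pointwise approach: a separate regression at each kept time-point,
# then a Benjamini-Hochberg correction across those time-points
pointwise_bh <- function(Y, x, keep) {
  p <- sapply(keep, function(j) {
    summary(lm(Y[, j] ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  p_adj <- p.adjust(p, method = "BH")
  c(n_tests = length(keep), prop_significant = mean(p_adj < 0.05))
}

res <- rbind(
  full = pointwise_bh(Y, x, keep = seq_along(s_full)),
  every_10th = pointwise_bh(Y, x, keep = seq(1, length(s_full), by = 10))
)
print(res)
```

Sub-sampling the same signal changes the number of tests from 201 to 21, so the correction, and therefore the proportion of significant time-points, depends on an arbitrary choice of sampling rate.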

      In this work, FLMM was applied with only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023); Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      Thank you for bringing this up, as we endeavored to create code that is able to scale to complex models and large datasets. We agree that highlighting this capability in the paper will strengthen the work. We now state in the Discussion section that “[T]he package is fast and maintains a low memory footprint even for complex models (see Section 4.6 for an example) and relatively large datasets.” Methods Section 4.6 now includes the following:

      Our fastFMM package scales to the dataset sizes and model specifications common in photometry. The majority of the analyses presented in the Results Section (Section 2) included fairly simple functional fixed and random effect model specifications because we were implementing the FLMM versions of the summary measure analyses presented in Jeong et al. (2022). However, we fit the following FLMM to demonstrate the scalability of our method with more complex model specifications:

      We use the same notation as the Reward Number model in Section 4.5.2, with the additional variable $TL_{i,j,l}$ denoting the Total Licks on trial j of session l for animal i. In a dataset with over 3,200 total trials (pooled across animals), this model took ∼1.2 min to fit on a MacBook Pro with an Apple M1 Max chip with 64GB of RAM. Model fitting had a low memory footprint. This can be fit with the code:

      library(fastFMM)
      model_fit <- fui(photometry ~ session + trial + iri + lick_time + licks + (session + trial + iri + lick_time + licks | id), parallel = TRUE, data = photometry_data)

      This provides a simple illustration of the scalability of our method. The code (including timing) for this demonstration is now included on our Github repository.

      Reviewer #3:

      Summary:

      Loewinger et al. extend a previously described framework (Cui et al., 2021) to provide new methods for statistical analysis of fiber photometry data. The methodology combines functional regression with linear mixed models, allowing inference on complex study designs that are common in photometry studies. To demonstrate its utility, they reanalyze datasets from two recent fiber photometry studies of mesolimbic dopamine. Then, through simulation, they demonstrate the superiority of their approach over other common methods.

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (functional regression) removes the need to choose appropriate summary statistics, removing an opportunity to introduce bias, for example when searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.

      Meanwhile, using linear mixed models allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      We would like to thank the reviewer for the deep reading and understanding of our paper and method, and the thoughtful feedback provided. We agree with this summary, and will respond in detail to all the concerns raised.

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      Thank you for this point. While we went to great effort to explain things clearly, our efforts to be concise likely resulted in some lack of clarity. To address this, we have created a series of analysis guides for a more general neuroscience audience, reflecting our experience working with researchers at the NIH and the broader community. These guides walk users through the code, its deployment in typical scenarios, and the interpretation of results.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson’s Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors’ metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors’ approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects.

      Our goal was to demonstrate that FLMM provides insight into why the opposing within- and between-session effects occur: the between-session and within-session changes appear to occur at different trial timepoints. Thus, while the AUC metrics applied in Jeong et al. (2022) are enough to show the presence of Simpson’s paradox, it is difficult to hypothesize why the opposing within-/between-session effects occur. An AUC analysis cannot determine at what trial timepoints (relative to licking) those opposing trends occur.
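      The opposing within- and between-session trends can be reproduced in a toy base-R simulation. The session and trial counts and effect sizes are hypothetical, and the outcome is a single number per trial (like an AUC) fit with `lm()` rather than an FLMM. The sketch therefore shows only the paradox itself, not where in the trial each trend occurs, which is the information FLMM adds.

```r
set.seed(5)
n_sessions <- 5
n_trials <- 20
d <- expand.grid(trial = seq_len(n_trials), session = seq_len(n_sessions))
d$cum_trial <- (d$session - 1) * n_trials + d$trial   # trials concatenated across sessions

# Hypothetical summary signal: rises across sessions, declines within each session
d$y <- 2 * d$session - 0.05 * d$trial + rnorm(nrow(d), sd = 0.2)

pooled <- coef(lm(y ~ cum_trial, data = d))[["cum_trial"]]          # ignores sessions
within <- coef(lm(y ~ factor(session) + trial, data = d))[["trial"]] # within-session slope
slopes <- c(pooled = pooled, within = within)
print(slopes)
```

Concatenating sessions yields a positive slope across trials, while the within-session slope is negative: the same qualitative pattern that a session-unaware summary analysis would miss.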

      The discussion of this result is muddled. Having identified the paradox, the authors offer some appropriate speculation as to what is causing these opposing effects, particularly the decrease within sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve), and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al. were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g., over sessions) and variation in predictability (e.g., with each mouse’s level of attention to sounds) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point re: potential reward predictability that we had not considered. They have convinced us that acknowledging this alternative perspective will strengthen the paper, and we have added it into the Discussion. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals may sense the reward delivery. After discussing extensively with the authors of Jeong et al. (2022), it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely changes in pressure from the solenoid (rather than a sound) that may have served as a cue. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this potential learned predictability could, at least partially, account for the increase in signal magnitude across sessions. As this paper is focused on analysis methods, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting this explanation in detail, for consideration in future experiments. We have substantially edited this discussion and, as per the reviewer’s suggestion, have qualified our interpretations to reflect the uncertainty in explaining the observed trends.

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al 2022 provided in the discussion is not germane. Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      Thank you for this point. We agree with you that, given the scope of the paper, we should avoid any extensive comparison between the models. To address your comment, we have now removed portions of the Discussion that compared RPE and ANCCR. Overall, we agree with the reviewer, and think that future experiments will be needed for conclusively testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our description of several conversations with the Jeong et al., 2022 authors could have gone deeper, we hope the reviewer can appreciate that inclusion of these conversations was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting our discussion. We do commend the authors of Jeong et al., 2022 for their willingness to discuss all these details. They could easily have avoided acknowledging any potential incompleteness of their theory by claiming that our results do not invalidate their predictions for a random reward, because the reward could potentially have been predicted (due to an inadvertent CS+ generated from the solenoid pressure). Instead, they emphasized that they thought their experiment did test a random reward, to the extent they could determine, and that our results suggest components of their theory that should be updated. We think that engagement with re-analyses of one’s data, even when findings are at odds with an initial theoretical framing, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening at least in part because of our method: by analyzing the signal at every trial timepoint, it provides a formal way to test for the presence of a neural signal indicative of reward delivery perception. Ultimately, this was what we set out to do: help researchers ask questions of their data that may have been harder to ask before. We believe that having a demonstration that we can indeed do this for a “live” scientific issue is the most appropriate way of demonstrating the usefulness of the method.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (∆F/F) with smoothing and baseline correction and this does not seem to have been considered in the argument. Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we have made considerable efforts in the Results and Discussion sections to caution that alternative hypotheses (e.g., photobleaching) cannot be definitively ruled out. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in this data, and it is not clear to us why photobleaching would be visible in one time-window of a trial, but not at another (less than a second away), despite high ∆F/F magnitudes in both time-windows. We do wish to point out that the Jeong et al. (2022) authors were also concerned about photobleaching as a possible explanation. At their request, we analyzed data from additional experiments, collected from the same animals. In most cases, we did not observe signal patterns that seemed to indicate photobleaching. Given the additional scrutiny, we do not think that photobleaching is more likely to invalidate results in this particular set of experiments than it would be in any other photometry experiment. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included primarily as a way of acknowledging that it is possible that non-linearities in photobleaching could occur. Regardless, your point is well taken and we have qualified our description of these analyses to express that photobleaching cannot be ruled out.
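      The reviewer’s point that the readout is a convolution of concentration with sensor kinetics can be sketched with a deliberately simplified model. The sketch assumes a linear sensor with a single exponential off-kinetics kernel (real sensors bind nonlinearly and have separate on and off rates), and the pulse timing and time constant are hypothetical. It shows only that the readout peak is delayed, attenuated, and smeared relative to the underlying concentration.

```r
dt <- 0.01                                      # 10 ms time step (assumed)
time_s <- seq(0, 3, by = dt)
conc <- as.numeric(time_s >= 1 & time_s < 1.1)  # hypothetical 100 ms concentration pulse

# Assumed linear sensor with exponential off-kinetics (tau_off = 0.3 s)
tau_off <- 0.3
kernel <- exp(-seq(0, 5 * tau_off, by = dt) / tau_off)
kernel <- kernel / sum(kernel)   # unit gain: a sustained concentration reads out at its level

# Causal convolution: readout at sample i sums current and past concentration
readout <- sapply(seq_along(conc), function(i) {
  lags <- 0:min(i - 1, length(kernel) - 1)
  sum(kernel[lags + 1] * conc[i - lags])
})

peaks <- c(conc_peak_s = time_s[which.max(conc)],
           readout_peak_s = time_s[which.max(readout)])
print(peaks)
```

Under this model, a given time-window of the readout mixes concentration from earlier times, which is one reason timing-based arguments about photobleaching are harder to make than for a directly read-out dye.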

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors’ description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We removed the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      Our point was initially included to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of re-analyzing shared datasets is acknowledging both areas where new analyses support the original results, as well as those where they conflict with them. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we have made those changes. We have qualified the conclusions of our analysis to emphasize they are a demonstration of how FLMM can be used to answer a certain style of question with hypothesis testing (how signal dynamics change across sessions), as opposed to providing evidence for/against the backpropagation hypothesis.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we made changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. Given the length of the manuscript as it stands, we could only include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al., data to an appendix. Realistically, it would be hard for us to justify including analyses from a third dataset, only to have to relegate them to an appendix. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with many groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method, and compares the results with those yielded by standard analysis of AUCs, is already published (Beas et al., 2024). Finally, in our analysis guide we describe additional analyses, not included in the manuscript, that replicate positive results. Hence there are numerous demonstrations of FLMM’s performance in less controversial settings. We take your point that our description of the data supporting one theory or the other should be qualified, and we have corrected that. Specifically for your suggestion of Amo et al. 2022, we have not had the opportunity to personally reanalyze their data, but we are already in contact with other groups who have conducted preliminary analyses of their data with FLMM. We are delighted to see this, in light of your comments and our decision to restrict the scope of our paper. We will help them and other groups working on this question to the extent we can.

      Recommendations for the Authors:

      Reviewer #2:

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you for the positive feedback!

      I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      Thank you for this suggestion. As we described above in response to Reviewer #2’s Public Reviews, we have added in a demonstration of the scalability of the method. Since our initial manuscript submission, we have further increased the package’s speed (e.g., through further parallelization). We are releasing the updated version of our package on CRAN.

      From my understanding, this package could also be useful for other recordings, such as two-photon imaging, not just photometry data. If so, I would suggest the authors add this potential use to the discussion.

      This is a great point. Our updated manuscript Discussion includes the following:

      “The FLMM framework may also be applicable to techniques like electrophysiology and calcium imaging. For example, our package can fit functional generalized LMMs with a count distribution (e.g., Poisson). Additionally, our method can be extended to model time-varying covariates. This would enable one to estimate how the level of association between signals, simultaneously recorded from different brain regions, fluctuates across trial time-points. This would also enable modeling of trials that differ in length due to, for example, variable behavioral response times (e.g., latency-to-press).”

      Reviewer #3:

      The authors should define ‘function’ in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7.

      We include a description of the alternate tests in Appendix Section 5.2. We have updated the Methods Section (Section 4) to introduce the reader to how ‘functions’ are conceptualized and modeled in the functional data analysis literature. Specifically, we added the following text:

      “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions when there is a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates) and the data are assumed to vary smoothly along the functional domain (e.g., one assumes that a photometry signal takes similar values at close time-points in a trial). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”

      Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      We appreciate your pointing this out, as the distinction is nuanced. Our manuscript includes a description of how joint CIs enable one to interpret effects as statistically significant for time-intervals as opposed to individual timepoints. Unlike joint CIs, assessing significance with pointwise CIs suffers from multiple-comparisons problems. As a result of your suggestion, we have included a short discussion of this in our analysis guide (Part 1), entitled “Pointwise or Joint 95% Confidence Intervals.” The Methods section of our manuscript also includes the following:

      “The construction of joint CIs in the context of functional data analysis is an important research question; see Cui et al. (2021) and references therein. Each point at which the pointwise 95% CI does not contain 0 indicates that the coefficient is statistically significantly different from 0 at that point. Compared with pointwise CIs, joint CIs take into account the autocorrelation of signal values across trial time-points (the functional domain). Therefore, instead of interpreting results at a specific timepoint, joint CIs enable joint interpretations at multiple locations along the functional domain. This aligns with interpreting covariate effects on the photometry signals across time-intervals (e.g., a cue period) as opposed to at a single trial time-point. Previous methodological work has provided functional mixed model implementations for either joint 95% CIs for simple random-effects models (Cui et al., 2021), or pointwise 95% CIs for nested models (Scheipl et al., 2016), but, to our knowledge, these do not provide explicit formulas or software for computing joint 95% CIs in the presence of general random-effects specifications.”
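
      To illustrate the pointwise/joint distinction described in the Methods text above, the following Python sketch builds both kinds of band around a coefficient curve. It is a minimal illustration under stated assumptions, not the implementation in the authors' package: the coefficient estimates and their covariance across time-points are taken as given (here simulated with a hypothetical AR(1)-style autocorrelation of 0.9), the function name `pointwise_and_joint_bands` is hypothetical, and the joint band uses one common construction, the 95% quantile of the maximum absolute standardized deviation across all time-points.

```python
import numpy as np
from statistics import NormalDist


def pointwise_and_joint_bands(beta_hat, cov, level=0.95, n_sim=20000, seed=0):
    """Pointwise and joint (simultaneous) bands for a coefficient curve.

    beta_hat: estimated coefficient at each trial time-point, shape (T,)
    cov: assumed-known covariance of those estimates, shape (T, T)
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    sd = np.sqrt(np.diag(cov))

    # Pointwise: each time-point uses its own normal critical value (~1.96 at 95%).
    z_point = NormalDist().inv_cdf(0.5 + level / 2)

    # Joint: draw from N(0, cov), take the largest |standardized deviation| over
    # all time-points in each draw, and use the `level` quantile of that maximum.
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(beta_hat.size), cov, size=n_sim)
    max_abs_z = np.max(np.abs(draws) / sd, axis=1)
    z_joint = float(np.quantile(max_abs_z, level))

    pointwise = (beta_hat - z_point * sd, beta_hat + z_point * sd)
    joint = (beta_hat - z_joint * sd, beta_hat + z_joint * sd)
    return pointwise, joint, z_point, z_joint


# Hypothetical example: 50 time-points with AR(1)-style autocorrelation 0.9.
t = np.arange(50)
cov = 0.04 * 0.9 ** np.abs(t[:, None] - t[None, :])
beta_hat = 0.3 * np.exp(-((t - 20) / 6.0) ** 2)
pointwise, joint, z_point, z_joint = pointwise_and_joint_bands(beta_hat, cov)
print(f"pointwise z = {z_point:.2f}, joint z = {z_joint:.2f}")
```

      Because the joint critical value must hold for every time-point at once, it exceeds the pointwise value of about 1.96, so the joint band is wider; the stronger the autocorrelation across time-points, the closer the two critical values become.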

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a fantastic point and we have added the following into the Discussion:

      “...[S]ignal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects.”

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      Good point. We have made this change.

      Minor corrections:

      Panels are mislabeled in Figure 5.

      Thank you. We have corrected this.

      The Crowder (2009) reference is incorrect, being a review of the book with the book presumably being the correct citation.

      Good catch, thank you! Corrected.

      In Section 5 (first appendix), the authors could include the alternate spelling ‘fibre photometry’ to capture any citations that use British English spelling.

      This is a great suggestion, but we did not have time to recreate these figures before re-submission.

      Section 7.4 is almost all quotation, though unevenly using the block quotation formatting. It is unclear why such a large quotation is included.

      Thank you for pointing this out. We have removed this Appendix section (formerly Section 7.4) as the relevant text was already included in the Methods section.

      References

      Sofia Beas, Isbah Khan, Claire Gao, Gabriel Loewinger, Emma Macdonald, Alison Bashford, Shakira Rodriguez-Gonzalez, Francisco Pereira, and Mario A Penzo. Dissociable encoding of motivated behavior by parallel thalamo-striatal projections. Current Biology, 34(7):1549–1560, 2024.

      Erjia Cui, Andrew Leroux, Ekaterina Smirnova, and Ciprian Crainiceanu. Fast univariate inference for longitudinal functional models. Journal of Computational and Graphical Statistics, 31:1–27, 07 2021. doi: 10.1080/10618600.2021.1950006.

      Huijeong Jeong, Annie Taylor, Joseph R Floeder, Martin Lohmann, Stefan Mihalas, Brenda Wu, Mingkang Zhou, Dennis A Burke, and Vijay Mohan K Namboodiri. Mesolimbic dopamine release conveys causal associations. Science, 378(6626):eabq6740, 2022. doi: 10.1126/science.abq6740. URL https://www.science.org/doi/abs/10.1126/science.abq6740.

      Rachel S Lee, Marcelo G Mattar, Nathan F Parker, Ilana B Witten, and Nathaniel D Daw. Reward prediction error does not explain movement selectivity in dms-projecting dopamine neurons. eLife, 8:e42992, apr 2019. ISSN 2050-084X. doi: 10.7554/eLife.42992. URL https://doi.org/10.7554/eLife.42992.

      Fabian Scheipl, Jan Gertheiss, and Sonja Greven. Generalized functional additive mixed models. Electronic Journal of Statistics, 10(1):1455 – 1492, 2016. doi: 10.1214/16-EJS1145. URL https://doi.org/10.1214/16-EJS1145.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated whether the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it has never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants by placing a linear polarizer, rotated 45 degrees relative to the moon's natural polarization pattern, in the homing path of freely navigating ants. They recorded the homing direction of an ant before entering the polarizer, under the polarizer, and again after leaving the area covered by the polarizer. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees in comparison to the homing direction under the natural polarization pattern, and change it back after leaving the area covered by the polarizer. These results were replicated throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction depends on the length of their home vector, just as it does for the solar polarization pattern.

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually are.

      We rewrote these sections and added a figure panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants, while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions, that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night - though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt use to lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      We have added a paragraph discussing this distinction in mechanisms and the limits of the current data set in untangling them. This is certainly an interesting topic for a follow-up study.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for our response to this and the next reviewer comment.

      N.B. The Watson-Williams tests for difference in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2 - 1)(n1 - R1)] / [(n1 - 1)(n2 - R2)] or its inverse, whichever is > 1.

      We have looked at the amount of variance from the mean heading direction in terms of both the shifts and the reorientations and found no significant difference in variance in any of the relevant comparisons. It is possible (perhaps likely) that with a higher n we might find these differences, but with the current data set we cannot make statistical statements regarding degradations in navigational precision.

      As an additional analysis to address the Watson-Williams test’s sensitivity to changes in variance, we have added var test comparisons for each comparison; this is a well-established test for comparing variances. None of these were significantly different, suggesting that the observed differences in the Watson-Williams tests are due to changes in the mean vector rather than in the dispersion of the distribution. We have added this test to the text.
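
      For readers unfamiliar with the unequal-variance check the reviewer describes, a minimal Python sketch of that F statistic follows. Here R is the resultant length of a sample (the length of the vector sum of its unit heading vectors, between 0 and n). The function names and example headings are hypothetical, the test is usually presented as an approximation for fairly concentrated samples, and the sketch returns only the statistic and its degrees of freedom; a p-value would come from the F distribution with those degrees of freedom.

```python
import math


def resultant_length(angles_deg):
    """Length R of the vector sum of unit vectors at the given headings (0 <= R <= n)."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)


def ww_variance_f(angles1_deg, angles2_deg):
    """F = (n2 - 1)(n1 - R1) / ((n1 - 1)(n2 - R2)), inverted if it is below 1.

    Returns (F, (numerator_df, denominator_df)).
    """
    n1, n2 = len(angles1_deg), len(angles2_deg)
    r1, r2 = resultant_length(angles1_deg), resultant_length(angles2_deg)
    spread1 = (n1 - r1) / (n1 - 1)  # dispersion of sample 1
    spread2 = (n2 - r2) / (n2 - 1)  # dispersion of sample 2
    if spread1 == 0 or spread2 == 0:
        raise ValueError("a sample with identical headings has zero dispersion")
    # Report whichever ratio is >= 1; the degrees of freedom swap with it.
    if spread1 >= spread2:
        return spread1 / spread2, (n1 - 1, n2 - 1)
    return spread2 / spread1, (n2 - 1, n1 - 1)


# Hypothetical heading changes (degrees) from two conditions.
tight = [0, 10, -10, 5, -5]
loose = [0, 30, -30, 15, -15]
f_stat, dfs = ww_variance_f(tight, loose)
print(f"F = {f_stat:.2f} with df = {dfs}")
```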

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut at several instances. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these, except for the bookending mentions in the Abstract and Discussion.

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text for clarity. We no longer say the ambient pattern was changed.

      (3) Line 377: I do not see the link between the sentence and Figure 7 

      Changed where in the discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought it might be easier for some readers to see the shifts as a dial representation, with the shift magnitude converted to 0-100%, rather than as the shifts in Figure 5. This makes it somewhat like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story, but a reader with little background in directional statistics might find Figure 7 more intuitive. This was the intent, at least.

      If it becomes a sticking point, then we can remove the upper portion.  

      Reviewer #2 (Recommendations For The Authors): 

      MINOR CORRECTIONS AND QUERIES 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added the references that unpack the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the Methods to clearly state the experimental rationale and to be clearer about the methodology.

      We also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have altered for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2°± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      Replaced Phi with b° and c° for both figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation value was erroneously copied from the shift data (the value is identical for both the +45° shift and the reorientation in Figure 4A). This has now been corrected.

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this is incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      Changed this to make clear the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to make clear that these are ants tested with different vector lengths (6m vs. 2m) at the same location. Hopefully this is much clearer, but if these portions remain confusing, a full renaming of the conditions may be in order. Something like "long vector" and "short vector" would help, but such names do not describe the true purpose of the test, which is to control for location; hence the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make more changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See the current change to the Results as an example: ‘Foragers with either a long ~6m remaining vector (Halfway Release) or a short ~2m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection threshold for absolute light intensity, as well as its polarization sensitivity and spectral sensitivity. Moonlight is less UV-rich than direct sunlight, and its spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      We have rewritten the polarised light section to be clearer, drawing on optics and the physics of light for background.

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce e-vector and define it. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun or moon's position with the maximum degree of polarisation located 90° from the source. Hence when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple with uniform direction of polarisation approximately parallel to the north-south axes (Dacke et al., 1999, 2003; Reid et al. 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Testing diurnal bull ants is also a very good suggestion.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers as least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum, since a single researcher was walking to and from the field site given the lack of public transport between 1 and 4 am. But yes, for future work a camera-based confirmation system would be easier.

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the following full description of the mirroring applied to counterclockwise shifts.

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we calculated the mirror of shift in each −45° condition, allowing shift magnitude comparisons within each condition. Mirroring the −45° conditions was calculated by mirroring each shift across the 0° to 180° plane and was then compared to the corresponding unaltered +45 condition.’
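
      The mirroring step described in this passage can be sketched in a few lines of Python. Reflecting a heading change across the 0° to 180° axis amounts to negating it (θ → −θ) and wrapping the result back into [0°, 360°). The function names and example values below are hypothetical, and the circular-mean helper is included only to show how mirrored −45° shifts could then be summarised on the same side as the +45° condition.

```python
import math


def mirror_shift(shift_deg):
    """Reflect a heading change across the 0°-180° axis (θ -> -θ), wrapped to [0, 360)."""
    return (-shift_deg) % 360.0


def circular_mean_deg(angles_deg):
    """Mean direction of a set of headings, in degrees within [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Hypothetical counterclockwise (-45° condition) shifts, recorded on a 0-360° scale.
ccw_shifts = [330.0, 340.0, 325.0]
mirrored = [mirror_shift(a) for a in ccw_shifts]  # now on the same side as +45° shifts
print(mirrored, round(circular_mean_deg(mirrored), 1))
```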

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol (U+00BA) used here, a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and to return to their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when the lunar phase is low and is potentially harder to detect.

      I have reworked this line in the text, as I do not think there is much evidence for misalignment; it might be more precise to say that overnight periods when the moon is not visible may adversely impact the path integrator estimate, though the full impact of this celestial cue gap, and whether other cues might also play a role, is currently unknown.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study is a companion to a paper introducing a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs). While the evidence that recurrent SNVs or CDNs are common in true cancer driver genes is solid, the evidence that many more undiscovered cancer driver mutations will have CDNs, and that this approach could identify these undiscovered driver genes with about 100,000 samples, is limited.

      Same criticism as in the eLife assessment of eLife-RP-RA-2024-99340 (https://elifesciences.org/reviewed-preprints/99340). Hence, please refer to the responses to the companion paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study investigates Cancer Driving Nucleotides (CDNs) using the TCGA database, finding that these recurring point mutations could greatly enhance our understanding of cancer genomics and improve personalized treatment strategies. Despite identifying 50-150 CDNs per cancer type, the research reveals that a significant number remain undiscovered, limiting current therapeutic applications, and underscoring the need for further larger-scale research.

      Strengths:

      The study provides a detailed examination of cancer-driving mutations at the nucleotide level, offering a more precise understanding than traditional gene-level analyses. The authors found a significant number of CDNs remain undiscovered, with only 0-2 identified per patient out of an expected 5-8, indicating that many important mutations are still missing. The study indicated that identifying more CDNs could significantly impact the development of personalized cancer therapies, improving patient outcomes.

      Weaknesses:

      The study is constrained by relatively small sample sizes for each cancer type, which reduces the statistical power and robustness of the findings. ICGC and other large-scale WGS datasets are publicly available but were not included in this study.

      Thanks. We have indeed used all public data, including GENIE (figure 7 of the companion paper), ICGC and other integrated resources such as COSMIC. The main study is based on TCGA because it is unbiased for estimating the probability of CDN occurrences. In many datasets, the numerators are given but the denominators are not (the number of patients with the mutation / the total number of patients surveyed). In GENIE, we observed that E(u) estimated from the given sequencing panels is much smaller than in TCGA; this might be due to the selective reporting of nonsynonymous mutations, since synonymous mutations are generally considered irrelevant to tumorigenesis.

      To be able to identify rare driver mutations, more samples are needed to improve the statistical power, which is well-known in cancer research. The challenges in direct functional testing of CDNs due to the complexity of tumor evolution and unknown mutation combinations limit the practical applicability of the findings.

      We fully agree. We now add a few sentences making clear that the theory allows us to see how much more can be gained by each stepwise increase in sample size. For example, when the sample size reaches 10^6, further increases will yield almost no gain in confidence in the CDNs identified (see figures of eLife-RP-RA-2024-99340). As pointed out in our provisional responses, an important strength of this pair of studies is that the results are testable. The complexity lies in the combination of mutations required for tumorigenesis, and identifying such combinations is the main goal and strength of this pair of studies. We add a few sentences to this effect.

      While the importance of large sample sizes in identifying cancer drivers is well-recognized, the analytical framework presented in the companion paper (https://elifesciences.org/reviewed-preprints/99340) goes a step further by quantitatively elucidating the relationship between sample size and the resolution of CDN detection.

      The question is very general, as it is about multigene interactions, or epistasis. The challenges are present in all areas of evolutionary biology, for example, the genetics of reproductive isolation (Wu and Ting 2004). The issue of epistasis is difficult because most, if not all, of the underlying mutations have to be identified in order to carry out functional tests. While full identification is rarely feasible, it is precisely the objective of the CDN project. When the sample size increases to 100,000 for a cancer type, all point mutations for that cancer type should be identifiable.
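The sample-size argument above can be illustrated with a simple binomial calculation. The sketch below is not the model of the companion paper; it assumes a single site that is mutated independently in each patient with a fixed, hypothetical per-patient frequency f, and asks how likely that site is to reach the three-hit threshold as the number of sequenced patients n grows.

```python
from math import comb


def prob_seen_at_least(k: int, n: int, f: float) -> float:
    """P(X >= k) for X ~ Binomial(n, f).

    Interpreted here as the chance that a site mutated in a fraction f of
    patients is observed in at least k of n independently sampled patients.
    """
    # Sum the probabilities of seeing the site 0, 1, ..., k-1 times,
    # then take the complement.
    below = sum(comb(n, j) * f**j * (1 - f) ** (n - j) for j in range(k))
    return 1.0 - below


if __name__ == "__main__":
    f = 1e-3  # hypothetical per-patient frequency of one CDN (not from the paper)
    for n in (1_000, 10_000, 100_000, 1_000_000):
        p = prob_seen_at_least(3, n, f)
        print(f"n = {n:>9,}: P(observed >= 3 times) = {p:.4f}")
```

With f = 0.001, the probability rises from about 0.08 at n = 1,000 to essentially 1 by n = 100,000; rarer sites (smaller f) need proportionally larger cohorts before the gain from adding patients levels off.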

      The QC of the TCGA data was not very strict, e.g., "patients with more than 3000 coding region point mutations were filtered out as potential hypermutator phenotypes"; it would be better to remove patients beyond ±3 S.D. from the mean number of mutations for each cancer type. Some point mutations with >3 hits in the TCGA dataset were simply false-positive mutation calls, particularly in large repeat regions of the human genome.

      Thanks. The GDC data portal offers data calls from multiple pipelines, enabling us to select mutations detected by at least two pipelines. While including patients with hypermutator phenotypes could introduce noise, as shown in Eq. 10 of the main text, our method for defining the upper limit of i* is relatively robust to fluctuations in the E(u) of the corresponding cancer population. Since readers may often ask about this, we expand the Methods section somewhat to emphasize this point.
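For readers comparing the two QC rules discussed above, a minimal sketch contrasts the fixed cap of 3000 coding point mutations with the reviewer's per-cancer-type ±3 S.D. rule. The patient counts are hypothetical, and the function names are illustrative only; neither function is taken from the authors' scripts.

```python
from statistics import mean, stdev

FIXED_CAP = 3000  # per-patient coding point-mutation cap quoted in the review


def filter_fixed_cap(counts: dict[str, int], cap: int = FIXED_CAP) -> dict[str, int]:
    """Drop patients with more than `cap` coding point mutations (fixed rule)."""
    return {pid: c for pid, c in counts.items() if c <= cap}


def filter_sd(counts: dict[str, int], z: float = 3.0) -> dict[str, int]:
    """Keep patients within mean +/- z * SD of mutation counts.

    `counts` should hold the patients of ONE cancer type, so the threshold
    adapts to that type's typical mutation burden.
    """
    if len(counts) < 2:
        return dict(counts)  # SD undefined; nothing to compare against
    values = list(counts.values())
    mu, sd = mean(values), stdev(values)
    return {pid: c for pid, c in counts.items() if abs(c - mu) <= z * sd}


if __name__ == "__main__":
    # Hypothetical cancer type: 19 patients with ~90 mutations, one with 1,500.
    counts = {f"p{i}": 80 + i for i in range(19)}
    counts["p_hi"] = 1500
    print("fixed cap keeps p_hi:", "p_hi" in filter_fixed_cap(counts))
    print("3-SD rule keeps p_hi:", "p_hi" in filter_sd(counts))
```

In this toy cohort the fixed cap keeps the 1,500-mutation patient while the per-type rule removes it. One caveat of the S.D. rule: an extreme patient inflates the S.D. it is compared against, and with ten or fewer patients of a type no patient can lie more than 3 sample S.D. from the mean, so the rule needs reasonably large per-type cohorts.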

      The code for the statistical calculations (e.g., the calculation of Ai_e) is not publicly available, which makes the findings hard to replicate.

      We have now updated the section of “Data Availability” in both papers. The key scripts for generating the major results are available at: https://gitlab.com/ultramicroevo/cdn_v1.

      Reviewer #2 (Public Review):

      Summary:

      The study proposes that many cancer driver mutations are not yet identified but could be identified if they harbor recurrent SNVs. The paper leverages the analysis from Paper #1, which used quantitative analysis to demonstrate that SNVs or CDNs seen 3 or more times are more likely to occur due to selection (i.e., a driver mutation) than by chance or random mutation.

      Strengths:

      Empirically, mutation frequency is an excellent marker of a driver gene because canonical driver mutations typically have recurrent SNVs. Using the TCGA database, the paper illustrates that CDNs can identify canonical driver mutations (Figure 3) and that most CDNs are likely to disrupt protein function (Figure 2). In addition, CDNs can be shared between cancer types (Figure 4).

      Weaknesses:

      Driver alteration validation is difficult, with disagreements on what defines a driver mutation, and how many driver mutations are present in a cancer. The value proposed by the authors is that the identification of all driver genes can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes. There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene). Other alterations (epigenetic, indels, translocations, CNVs) would be missed by this type of analysis.

      The above paragraph has three distinct points. We shall respond one by one.

      First, “… can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes …”

      In the Discussion, we state the following, which shows that only a few of the best-known driving mutations have been targeted. It is accurate to say that < 5% of the CDNs we have identified are on the current targeting list. Furthermore, the list we have compiled is < 10% of what we expect to find.

      A direct functional test of CDNs would be to introduce putative cancer-driving mutations and observe the evolution of tumors. Such a task, introducing the multiple mutations that are collectively needed to drive tumorigenesis, has been done only recently, and only for the best-known cancer-driving mutations (Ortmann et al. 2015; Takeda et al. 2015; Hodis et al. 2022). In most tumors, the correct combination of mutations needed is not known. Clearly, CDNs, with their strong tumorigenic strength, are suitable candidates.

      Second, “There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene).”

      We sincerely thank the reviewer for this insightful comment. Below are two new paragraphs in the Discussion pertaining to the point:

      In this context, we should comment on the feasibility of targeting CDNs that may occur in either oncogenes (ONCs) or tumor suppressor genes (TSGs). It is generally accepted that ONCs drive tumorigenesis through gain-of-function (GOF) mutations, whereas TSGs derive their tumorigenic power from loss-of-function (LOF) mutations. It is worth pointing out that, since LOF mutations are likely to be more widespread along a gene, CDNs are biased toward GOF mutations. The often even distribution of non-sense mutations along the length of TSGs provides such evidence. As gene targeting aims to diminish gene function, GOF mutations are perceived to be targetable whereas LOF mutations are not. By extension, ONCs should be targetable but TSGs should not. This last assertion is not true, because mutations on TSGs may often be of the GOF kind as well.

      The data often suggest that mis-sense mutations on TSGs are of the GOF kind. If mis-sense mutations are far more prevalent than non-sense mutations in tumors, the mis-sense mutations cannot possibly be LOF mutations. (After all, no mutation can cause a greater loss of function than a non-sense mutation.) For example, AAA to AAC (K to N) is a mis-sense mutation, while AAA to TAA (K to stop) is a non-sense mutation. In a separate study (referred to as the escape-route analysis), we found many cases where mis-sense mutations on TSGs are far more prevalent (> 10X) than non-sense mutations. Another well-known example is the distribution of non-sense mutations on TSGs. For example, on APC, a prominent TSG, non-sense mutations are far more common in the middle 20% of the gene than in the rest (Zhang and Shay 2017; Erazo-Oliveras et al. 2023). The pattern suggests that even these non-sense mutations could have GOF properties.
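The mis-sense / non-sense distinction in the codon example above can be checked mechanically. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and using illustrative function names that are not part of the authors' pipeline, classifies a single-codon substitution and tallies all nine single-nucleotide neighbours of a codon:

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered by
# first, second, third base in T, C, A, G order; '*' marks a stop codon.
AA_TABLE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Return the one-letter amino acid (or '*') encoded by a DNA codon."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AA_TABLE[16 * i + 4 * j + k]


def classify(ref: str, alt: str) -> str:
    """Classify a codon change as synonymous, missense, nonsense or stop-loss."""
    ref_aa, alt_aa = translate(ref), translate(alt)
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


def neighbour_classes(codon: str) -> dict[str, int]:
    """Count the classes of all 9 single-nucleotide substitutions of a codon."""
    codon = codon.upper()
    counts: dict[str, int] = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            alt = codon[:pos] + base + codon[pos + 1:]
            cls = classify(codon, alt)
            counts[cls] = counts.get(cls, 0) + 1
    return counts


if __name__ == "__main__":
    print(classify("AAA", "AAC"))    # missense (K to N)
    print(classify("AAA", "TAA"))    # nonsense (K to stop)
    print(neighbour_classes("AAA"))
```

For AAA (Lys), seven of the nine possible point substitutions are mis-sense, one is synonymous (AAG) and one is non-sense (TAA).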

      The following response concerns the clinical implications of our CDN analysis. Canonical targeted therapy often relies on Tyrosine Kinase Inhibitors (TKIs) (Dang et al. 2017; Danesi et al. 2021; Waarts et al. 2022). Theoretically, any intervention that suppresses the expression of gain-of-function (GOF) CDNs could have therapeutic value in cancer treatment. This leads us to a discussion of oncogenes versus TSGs in the context of GOF / LOF (loss-of-function) mutations. First, not all mutations on oncogenes have an oncogenic effect; moreover, truncating mutations in oncogenes are often subject to negative selection (Bányai et al. 2021). The identification of CDNs within oncogenes is therefore crucial for developing effective cancer treatment guidelines. Second, while TSGs are generally believed to promote cancer development via loss-of-function mutations, research suggests that certain mutations within TSGs can have GOF-like effects, such as the dominant-negative effect of truncated TP53 mutations (Marutani et al. 1999; de Vries et al. 2002; Gerasimavicius et al. 2022). Characterizing driver mutations as GOF or LOF mutations could potentially expand the scope of targeted cancer therapy. We will address this issue in a third study in preparation.

      The method could be more valuable when applied to the noncoding genome, where driver mutations in promoters or enhancers are relatively rare, or have yet to be discovered. Increasingly more cancers have had whole genome sequencing. Compared to WES, criteria for driver mutations in noncoding regions are less clear, and this method could potentially provide new noncoding driver CDNs. Observing the same mutation in more than one cancer specimen is empirically unusual, and the authors provide a solid quantitative analysis that indicates many recurrent mutations are likely to be cancer-driver mutations.

      Again, we are grateful for the comments which prompt us to expand a paragraph in Discussion, reproduced below.

      The CDN approach has two additional applications. First, it can be used to find CDNs in non-coding regions. Although the number of whole genome sequences at present is still insufficient for systematic CDN detection, the preliminary analysis suggests that the density of CDNs in non-coding regions is orders of magnitude lower than in coding regions. Second, CDNs can also be used in cancer screening, with the advantage of efficiency as the targeted mutations are fewer. For the same reason, the false negative rate should be much lower too. Indeed, the false positive rate should be far lower than that of the gene-based screen, which often shows a false positive rate of >50% (supplement File S1).

      Again, we are grateful that Reviewer #2 has addressed the potential value of our study in finding cancer drivers in non-coding regions. A major challenge in this area lies in defining the appropriate L value as presented in Eq. 10. In the main text, we used a gamma distribution to account for the variability of mutation rates across sites in coding regions. For non-coding regions, we will categorize these regions based on biological annotations. The goal is to set different i* cutoffs for different genomic regions (such as heterochromatin / euchromatin, GC-rich regions or centromeric regions), and to avoid false-positive CDN calls in repeat regions (Elliott and Larsson 2021; Peña et al. 2023).

      References

      Bányai L, Trexler M, Kerekes K, Csuka O, Patthy L. 2021. Use of signals of positive and negative selection to distinguish cancer genes and passenger genes. Elife 10:e59629.

      Danesi R, Fogli S, Indraccolo S, Del Re M, Dei Tos AP, Leoncini L, Antonuzzo L, Bonanno L, Guarneri V, Pierini A, et al. 2021. Druggable targets meet oncogenic drivers: opportunities and limitations of target-based classification of tumors and the role of Molecular Tumor Boards. ESMO Open 6:100040.

      Dang CV, Reddy EP, Shokat KM, Soucek L. 2017. Drugging the “undruggable” cancer targets. Nat Rev Cancer 17:502–508.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Erazo-Oliveras A, Muñoz-Vega M, Mlih M, Thiriveedi V, Salinas ML, Rivera-Rodríguez JM, Kim E, Wright RC, Wang X, Landrock KK, et al. 2023. Mutant APC reshapes Wnt signaling plasma membrane nanodomains by altering cholesterol levels via oncogenic β-catenin. Nat Commun 14:4342.

      Gerasimavicius L, Livesey BJ, Marsh JA. 2022. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 13:3895.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Marutani M, Tonoki H, Tada M, Takahashi M, Kashiwazaki H, Hida Y, Hamada J, Asaka M, Moriuchi T. 1999. Dominant-negative mutations of the tumor suppressor p53 relating to early onset of glioblastoma multiforme. Cancer Res 59:4765–4769.

      Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, Baxter EJ, Massie CE, Papaemmanuil E, Menon S, et al. 2015. Effect of Mutation Order on Myeloproliferative Neoplasms. N Engl J Med 372:601–612.

      Peña MV de la, Summanen PAM, Liukkonen M, Kronholm I. 2023. Chromatin structure influences rate and spectrum of spontaneous mutations in Neurospora crassa. Genome Res 33:599–611.

      Takeda H, Wei Z, Koso H, Rust AG, Yew CCK, Mann MB, Ward JM, Adams DJ, Copeland NG, Jenkins NA. 2015. Transposon mutagenesis identifies genes and evolutionary forces driving gastrointestinal tract tumor progression. Nat Genet 47:142–150.

      de Vries A, Flores ER, Miranda B, Hsieh H-M, van Oostrom CThM, Sage J, Jacks T. 2002. Targeted point mutations of p53 lead to dominant-negative inhibition of wild-type p53 function. Proc Natl Acad Sci USA 99:2948–2953.

      Waarts MR, Stonestrom AJ, Park YC, Levine RL. 2022. Targeting mutations in cancer. J Clin Invest 132:e154943.

      Wu C-I, Ting C-T. 2004. Genes and speciation. Nat Rev Genet 5:114–122.

      Zhang L, Shay JW. 2017. Multiple Roles of APC and its Therapeutic Implications in Colorectal Cancer. J Natl Cancer Inst 109:djw332.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment: This valuable paper reports a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs), primarily based on single nucleotide variant (SNV) frequencies. A variety of solid approaches indicate that a mutation recurring three or more times is more likely to reflect selection rather than being the consequence of a mutation hotspot. The method is rigorously quantitative, though the requirement for larger datasets to fully identify all CDNs remains a noted limitation. The work will be of broad interest to cancer geneticists and evolutionary biologists.

      The key criticism, “the requirement for larger datasets to fully identify all CDNs remains a noted limitation”, is also found in both reviews. We have clarified the issue in the main text, and the relevant parts are copied below. The response below also addresses many comments in the reviews. In addition, the Discussion of eLife-RP-RA-2024-99341 has been substantially expanded to answer the questions of Reviewer 2.

      We shall answer the boldface comment in three ways. First, it can be answered using GENIE data. Fig. 7 of the main text (eLife-RP-RA-2024-99340) shows that, when n increases from ~1,000 to ~9,000, the number of discovered CDNs increases 3–5 fold, most of which come from the two-hit class. Hence, the power of discovering more CDNs with larger datasets is evident. By extrapolation, a sample size of 100,000 should be able to yield 90% of all CDNs, as calculated here. (Fig. 7 also addresses the question of whether we have used datasets other than TCGA. We have indeed used all public data, including GENIE and COSMIC.)

      Second, the power of discovering more cancer driver genes by our theory is evident even without using larger datasets. Table 3 of the companion study (eLife-RP-RA-2024-99341) shows that, averaged across cancer types, the conventional method would identify 45 CDGs while the CDN method tallies 258 CDGs. The power of the CDN method is demonstrated. This is because the conventional approach has to identify CDGs (cancer driver genes) in order to identify the CDNs they carry. However, many CDNs occur in non-CDGs and are thus missed by the conventional approach. In Supplementary File S2, we have included a full list of CDNs discovered in our study, along with population allele frequency annotations from gnomAD. The distribution patterns of these CDNs across different cancer types show their pan-cancer properties as further explored in the companion paper.

      Third, while many, or even most, CDNs occur in non-CDGs and are thus missed, the conventional approach also includes non-CDN mutations in CDGs. This is illustrated in Fig. 5 of the companion study (eLife-RP-RA-2024-99341), which shows the adverse effect of misidentifying CDNs by the conventional approach. In that analysis, gene-targeting therapy is effective if the patient has the CDN mutations on EGFR, but the effect is reversed if the EGFR mutations are non-CDN mutations.

      Reviewer #1 (Public Review):

      The authors developed a rigorous methodology for identifying all Cancer Driving Nucleotides (CDNs) by leveraging the concept of massively repeated evolution in cancer. By focusing on mutations that recur frequently in pan-cancer, they aimed to differentiate between true driver mutations and neutral mutations, ultimately enhancing the understanding of the mutational landscape that drives tumorigenesis. Their goal was to call a comprehensive catalogue of CDNs to inform more effective targeted therapies and address issues such as drug resistance.

      Strengths

      (1) The authors introduced a concept of using massively repeated evolution to identify CDNs. This approach recognizes that advantageous mutations recur frequently (at least 3 times) across cancer patients, providing a lens to identify true cancer drivers.

      (2) The theory showed the feasibility of identifying almost all CDNs if the number of sequenced patients increases to 100,000 for each cancer type.

      Weaknesses

      (1) The methodology remains theoretical and no novel true driver mutations were identified in this study.

      We now address the weakness criticism, which is gratefully received.

      The second part of the criticism (no novel true driver mutations were identified in this study) has been answered in the long responses to the eLife assessment above. The first part, “The methodology remains theoretical”, is somewhat unclear. It may be a lead-in to the second part. However, just in case, we interpret the word “theoretical” to mean “the lack of experimental proof” and answer below.

      As Reviewer #1 noted, a common limitation of theoretical and statistical analyses of cancer drivers is the need to validate their selective advantage through in vitro or in vivo functional testing. This concern is echoed by both reviewers in the companion paper (eLife-RP-RA-2024-99341), prompting us to consider the methodology for functional testing of potential cancer drivers. An intuitive approach would involve introducing putative driver mutations into normal cells and observing phenotypic transformation in vitro and in vivo. In a recent stepwise-edited human melanoma model, Hodis et al. demonstrated that disease-relevant phenotypes depend on the “correct” combinations of multiple driver mutations (Hodis et al. 2022). Other high-throughput strategies can be broadly categorized into two approaches: (1) introducing candidate driver mutations into pre-malignant model systems that already harbor a canonical mutant driver (Drost and Clevers 2018; Grzeskowiak et al. 2018; Michels et al. 2020) and (2) introducing candidate driver mutations into growth factor-dependent cell models and assessing their impact on resulting fitness (Bailey et al. 2018; Ng et al. 2018). The underlying assumption of these strategies is that the fitness outcomes of candidate driver mutations are influenced by pre-existing driver mutations and the specific pathways or cancer hallmarks being investigated. This confines the functional test of potential cancer driver mutations to conventional cancer pathways. A comprehensive identification of CDNs is therefore crucial to overcome these limitations. In conjunction with other driver signal detection methods, our study aims to provide a more comprehensive profile of driver mutations, thereby enabling the functional testing of drivers involved in non-conventional cancer evolution pathways.

      (2) Different cancer types have unique mutational landscapes. The methodology, while robust, might face challenges in uniformly identifying CDNs across various cancers with distinct genetic and epigenetic contexts.

      We appreciate the comment. Indeed, different cancer types should have different genetic and epigenetic landscapes. In that case, one might have expected CDNs to be poorly shared among cancer types. However, as reported in Fig. 4 of the companion study, the sharing of CDNs across cancer types is far more common than the sharing of CDGs (Cancer Driving Genes). We suggest that CDNs have a much higher resolution than CDGs, whose signals are diluted by non-driver mutations. In other words, although the mutational landscape may be cancer-type specific, the pan-cancer selective pressure may be sufficiently high to permit the detection of CDN sharing among cancer types.

      Below, we respond in greater detail. Epigenetic factors, such as chromatin states, methylation/acetylation levels, and replication timing, can provide valuable insights when analyzing mutational landscapes at a regional scale (Stamatoyannopoulos et al. 2009; Lawrence et al. 2013; Makova and Hardison 2015; Baylin and Jones 2016; Alexandrov et al. 2020; Abascal et al. 2021; Sherman et al. 2022). However, at the site-specific level, the effectiveness of these covariates in predicting mutational landscapes depends on how they are integrated into a detailed model. Overemphasizing these covariates could lead to false negatives for known driver mutations (Hess et al. 2019; Elliott and Larsson 2021). In Figure 3B of the main text, we illustrate the discrepancy between the mutation rate predictions from Dig and empirical observations. Ideally, no covariates would be needed with extensive sample sizes, where each mutable genomic site would carry enough mutations to reach statistical significance; consequently, synonymous mutations alone would suffice to characterize the mutational landscape. In this sense, the integration of mutational covariates is a compromise imposed by current sample sizes. In our study, the effect of unique mutational landscapes is captured by E(u), the mean mutation rate for each cancer type. We further accounted for the variability of site-level mutability using a gamma distribution. The primary goal of our study is to determine the upper limit of mutation recurrences under mutational mechanisms alone. While selection acts blindly with respect to genomic features, mutational hotspots should exhibit common characteristics determined by their underlying mechanisms. In the main text, we attempted to identify such shared features among CDNs. Until these mutational mechanisms are fully understood, CDNs should be considered potential driver mutations.
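The role of E(u) and the gamma distribution in setting a neutral recurrence cutoff can be sketched as a gamma-Poisson (negative binomial) calculation. This is a simplified stand-in, not Eq. 10 of the main text: it assumes each neutral site's per-patient mutation rate u follows a gamma distribution with mean E(u) and a hypothetical shape parameter, that the number of carriers among n patients is Poisson(n·u), and that the cutoff i* is the smallest i at which fewer than one of L sites is expected to reach i hits by mutation alone. The values of E(u), the shape and n in the example are hypothetical; L is set to the 22,540,623 nonsynonymous sites quoted from Table 1, which may not match the L used in the paper.

```python
from math import exp, lgamma, log


def nb_pmf(x: int, r: float, p: float) -> float:
    """Negative binomial pmf: Gamma(r)-mixed Poisson with success prob p."""
    return exp(lgamma(x + r) - lgamma(r) - lgamma(x + 1)
               + r * log(p) + x * log(1 - p))


def expected_sites_with_at_least(i: int, n: int, mean_u: float,
                                 shape: float, L: int) -> float:
    """Expected number of neutral sites (out of L) hit in >= i of n patients.

    Assumes u ~ Gamma(shape, mean=mean_u) per site and hits ~ Poisson(n * u).
    """
    if mean_u <= 0 or shape <= 0:
        raise ValueError("mean_u and shape must be positive")
    lam = n * mean_u                    # expected hits at an average site
    p = shape / (shape + lam)           # negative binomial success probability
    tail = 1.0 - sum(nb_pmf(x, shape, p) for x in range(i))
    return L * tail


def recurrence_cutoff(n: int, mean_u: float, shape: float, L: int,
                      max_i: int = 100) -> int:
    """Smallest i with < 1 neutral site expected at >= i hits (hypothetical rule)."""
    for i in range(1, max_i + 1):
        if expected_sites_with_at_least(i, n, mean_u, shape, L) < 1.0:
            return i
    raise ValueError("no cutoff found below max_i")


if __name__ == "__main__":
    L = 22_540_623  # nonsynonymous sites, as listed in Table 1
    for mean_u in (3e-6, 6e-6):  # hypothetical E(u) values
        print(mean_u, recurrence_cutoff(1000, mean_u, shape=1.0, L=L))
```

Under these illustrative settings the cutoff is 3 hits, and doubling E(u) raises it only by one; smaller shape values (more rate heterogeneity across sites) fatten the tail and push the cutoff up.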

      (3) L223, the statement "In other words, the sequences surrounding the high-recurrence sites appear rather random.". Since it was a pan-cancer analysis, the unique patterns of each cancer type could be strongly diluted in the pan-cancer data.

      We now state that the analyses of mutation characteristics have been applied to individual cancer types and did not find any pattern that deviates from randomness. Nevertheless, it may be argued that, with the exception of those with sufficiently large sample sizes such as lung and breast cancers, most datasets do not have the power to reject the null hypothesis. To alleviate this concern, we applied the ResNet and LSTM/GRU methods to discover potential mutation motifs within each cancer type. All methods are more powerful than the one used, but the results are the same: no cancer type yields a mutation pattern that can reject the null hypothesis of randomness (see below).

      As a positive control, we used these methods to discover the splicing sites of human exons. When aligned with the splicing site at the center (position 51 in the following plot), the sequence motif looks like:

      Author response image 1.

      5-prime

      Author response image 2.

      3-prime

      However, to account for the potential influence of distance from the mutant site in motif analysis, we randomly shuffled the splicing sites within a specified window around the alignment center, and the resulting sequence logos look like:

      Author response image 3.

      5-prime shuffled

      Author response image 4.

      3-prime shuffled

      Author response image 5.

      random sequences from coding regions

      The classification results for the shuffled 5-prime (donor), 3-prime (acceptor) and random sequences from coding regions (Random CDS) are presented in Author response table 1 (the accuracy for the aligned sequences, approximately 99%, is not shown here).

      Author response table 1.

      With these positive controls (splicing-site motifs) validating our methodology, we applied the same model structure to train and test models for potential mutational motifs at CDN sites. All models achieved approximately 50% accuracy in CDN motif analysis, suggesting that the sequence contexts surrounding CDN sites are not significantly different from other coding regions of the genome. This further implies that the recurrence of mutations at CDN sites is more likely driven by selection than by mutational mechanisms.

      Note that this preliminary analysis may be limited by insufficient training data for CDN sites. Future studies will require larger sample sizes and more sophisticated models to address these limitations.
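Whether roughly 50% accuracy is distinguishable from chance depends on the size of the held-out set, which is not given here. A minimal sketch, assuming a balanced binary test set (so chance accuracy is 0.5) and hypothetical counts, computes a one-sided exact binomial p-value for the observed number of correct calls; this is a generic check, not a description of how the authors evaluated their models.

```python
from math import comb


def accuracy_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value P(X >= correct), X ~ Binomial(total, chance).

    Assumes a balanced binary test set, so a classifier with no signal
    is expected to be right with probability `chance` = 0.5.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical held-out sets of 100 sequences each (not the authors' data).
    for correct in (50, 55, 70):
        print(correct, "/ 100 correct -> p =", round(accuracy_p_value(correct, 100), 5))
```

Accuracy near 50% gives a p-value near 0.5, consistent with no detectable motif, whereas accuracy well above chance on the same number of sequences, as for the aligned splice-site controls, gives a vanishingly small p-value.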

      (4) To solidify the findings, the results need to be replicated in an independent dataset.

      Figure 7 validates our CDN findings using the GENIE dataset, which primarily consists of targeted sequencing data from various panels. By focusing on the same genomic regions sequenced by GENIE, we observed a 3-5 fold increase in the number of discovered CDNs as sample size increased from approximately 1000 to 9000. Moreover, the majority of CDNs identified in TCGA were confirmed as CDNs in GENIE.

      (5) The key scripts and the list of key results (i.e., CDN sites with i ≥ 3) need to be shared to enable replication, validation, and further research. So far, only CDN sites with i ≥ 20 have been shared.

      We have now updated the “Data Availability” section in the main text, the corresponding scripts for key results are available on Gitlab at: https://gitlab.com/ultramicroevo/cdn_v1.

      (6) The versions of data used in this study are not clearly detailed, such as the specific version of gnomAD and the version and date of TCGA data downloaded from the GDC Data Portal.

      The versions of data sources have now been updated in the revised manuscript.

      Recommendations For The Authors:

      (1) L119 states "22.7 million nonsynonymous sites," but Table 1 lists the number as 22,540,623 (22.5 million). This discrepancy needs to be addressed for consistency.

      (2) Figure 2B: there is an unexplained drop in the line at i = 6 and 7 (from 83 to 45). Clarification is needed on why this drop occurs.

      (3) Figure 3A: for the CNS type, data for recurrences of 8 and 9 are missing. An explanation should be provided for this absence.

      (4) L201: the title refers to "100-mers," but L218 mentions "101-mers." This inconsistency needs to be corrected to ensure clarity and accuracy.

      (5) Figures 6 and 7 currently lack titles. Titles should be added to these figures to improve readability.

      Thanks. All corrections have been incorporated into the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose that cancer-driver mutations can be identified by Cancer Driving Nucleotides (CDNs). CDNs are defined as SNVs that occur frequently in genes. There are many ways to define cancer driver mutations, and the strengths and weaknesses of this approach lie in its reliance on statistics to define them.

      Strengths:

      There are many well-known approaches and studies that have already identified many canonical driver mutations. A potential strength is that mutation frequencies may be able to identify as yet unrecognized driver mutations. They use a previously developed method to estimate mutation hotspots across the genome (Dig, Sherman et al. 2022). This publication has already used cancer sequence data to infer driver mutations based on higher-than-expected mutation frequencies. The advance here is to further illustrate that recurrent mutations (estimated at 3 or more mutations (CDNs) at the same base) are more likely to be the result of selection for a driver mutation (Figure 3). Further analysis indicates that mutation sequence context (Figure 4) or mutation mechanisms (Figure 5) are unlikely to be major causes of recurrent point mutations. Finally, they calculate (Figure 6) that most driver mutations identifiable by the CDN approach could be identified with about 100,000 to one million tumor coding genomes.

      Weaknesses:

      The manuscript does provide specific examples where recurrent mutations identify known driver mutations, but it does not identify "new" candidate driver mutations. Driver mutation validation is difficult and, at least clinically, frequency (i.e., observation in multiple other cancer samples) is indeed commonly used to judge whether an SNV has driver potential. The method would miss alternative ways to trigger driver alterations (translocations, indels, epigenetic changes, CNVs). Nevertheless, the value of the manuscript is its quantitative analysis of why mutation frequencies can identify cancer driver mutations.

      Recommendations For The Authors:

      Whereas the analysis of driver mutations in WES has been extensive, the application of the method to WGS data (i.e., the noncoding regions) would provide new information.

      We appreciate that Reviewer #2 has suggested the potential application of our method to noncoding regions. Currently, the background mutation model is based on site-level mutations in coding regions, which hinders its direct application to other mutation types such as CNVs, translocations and indels. We acknowledge that the proportion of patients with driver events involving CNVs (73%) is comparable to that with coding point mutations (76%), as reported in the PCAWG analysis (Fig. 2A from Campbell et al., 2020). In future studies, we will attempt to establish a CNV-based background mutation rate model to identify positive selection signals driving tumorigenesis.

      References

      Abascal F, Harvey LMR, Mitchell E, Lawson ARJ, Lensing SV, Ellis P, Russell AJC, Alcantara RE, Baez-Ortega A, Wang Y, et al. 2021. Somatic mutation landscapes at single-molecule resolution. Nature:1–6.

      Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, et al. 2020. The repertoire of mutational signatures in human cancer. Nature 578:94–101.

      Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, et al. 2018. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173:371-385.e18.

      Baylin SB, Jones PA. 2016. Epigenetic Determinants of Cancer. Cold Spring Harb Perspect Biol 8:a019505.

      Campbell PJ, Getz G, Korbel JO, Stuart JM, Jennings JL, Stein LD, Perry MD, Nahal-Bose HK, Ouellette BFF, Li CH, et al. 2020. Pan-cancer analysis of whole genomes. Nature 578:82–93.

      Drost J, Clevers H. 2018. Organoids in cancer research. Nat Rev Cancer 18:407–418.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Grzeskowiak CL, Kundu ST, Mo X, Ivanov AA, Zagorodna O, Lu H, Chapple RH, Tsang YH, Moreno D, Mosqueda M, et al. 2018. In vivo screening identifies GATAD2B as a metastasis driver in KRAS-driven lung cancer. Nat Commun 9:2732.

      Hess JM, Bernards A, Kim J, Miller M, Taylor-Weiner A, Haradhvala NJ, Lawrence MS, Getz G. 2019. Passenger Hotspot Mutations in Cancer. Cancer Cell 36:288-301.e14.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, et al. 2013. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499:214–218.

      Makova KD, Hardison RC. 2015. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 16:213–223.

      Michels BE, Mosa MH, Streibl BI, Zhan T, Menche C, Abou-El-Ardat K, Darvishi T, Członka E, Wagner S, Winter J, et al. 2020. Pooled In Vitro and In Vivo CRISPR-Cas9 Screening Identifies Tumor Suppressors in Human Colon Organoids. Cell Stem Cell 26:782-792.e7.

      Ng PK-S, Li J, Jeong KJ, Shao S, Chen H, Tsang YH, Sengupta S, Wang Z, Bhavana VH, Tran R, et al. 2018. Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10.

      Sherman MA, Yaari AU, Priebe O, Dietlein F, Loh P-R, Berger B. 2022. Genome-wide mapping of somatic mutation rates uncovers drivers of cancer. Nat Biotechnol 40:1634–1643.

      Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. 2009. Human mutation rate associated with DNA replication timing. Nat Genet 41:393–395.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The authors proposed a framework to estimate the posterior distribution of parameters in biophysical models. The framework has two modules: the first MLP module is used to reduce data dimensionality and the second NPE module is used to approximate the desired posterior distribution. The results show that the MLP module can capture additional information compared to manually defined summary statistics. By using the NPE module, the repetitive evaluation of the forward model is avoided, thus making the framework computationally efficient. The results show the framework has promise in identifying degeneracy. This is an interesting work.

      We thank the reviewer for the positive comments made on our manuscript. 

      Reviewer #1 (Recommendations For The Authors): 

      I have some minor comments. 

      (1) The µGUIDE framework has two modules, MLP and NPE. Why are the two modules trained jointly? The MLP module is used to reduce data dimensionality. Given that the number of features for different models is always fixed to 6, why does one need different MLPs? This module should, in principle, be general-purpose and independent of the model used.

      The MLP must be trained together with the NPE module to maximise inference performance in terms of accuracy and precision. Although the number of features predicted by the MLP was fixed to six, the characteristics of these six features can be very different, depending on the chosen forward model and the available data, as we showed in Appendix 1 Figure 1. Training the MLP independently of the NPE would result in suboptimal performance of µGUIDE, with potentially higher bias and variance of the predicted posterior distributions. We have now added these considerations in the Methods section.

      (2) The authors mentioned at L463 that all three models use 6 features. From L445 to L447, it seems that model 3 has 7 unknown parameters. How can 6 features be used to estimate 7 unknowns?

      Thank you for pointing out the lack of clarity regarding the parameters to estimate in this section. Model 3 is a three-compartment model whose parameters of interest are the signal fraction and diffusivity of water diffusing in the neurite space (fn and Dn), the neurite orientation dispersion index (ODI), the signal fraction in cell bodies (fs), a proxy for soma radius and diffusivity (Cs), and the signal fraction and diffusivity in the extracellular space (fe and De). The signal fractions are constrained by the relationship fn + fs + fe = 1; hence, fe is calculated from the estimated fn and fs. This leaves us with 6 parameters to estimate: fn, Dn, ODI, fs, Cs, De. We clarified this in the revised version of the paper.
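      The reduction from seven model parameters to six estimated ones follows directly from the constraint fn + fs + fe = 1. A minimal sketch, in which the helper name and the parameter values are hypothetical (no units implied):

```python
def extracellular_fraction(f_n, f_s):
    """Derive f_e from the constraint f_n + f_s + f_e = 1."""
    f_e = 1.0 - f_n - f_s
    if f_e < 0.0:
        raise ValueError("f_n + f_s must not exceed 1")
    return f_e

# The six free parameters estimated for Model 3 (illustrative values only).
free = {"fn": 0.5, "Dn": 2.0, "ODI": 0.3, "fs": 0.2, "Cs": 5.0, "De": 1.0}
# fe is not estimated; it is derived, giving the full seven-parameter set.
full = dict(free, fe=extracellular_fraction(free["fn"], free["fs"]))
print(len(free), len(full))  # 6 7
```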

      (3) L471, Rician noise is not a proper term. Rician distribution is the distribution of pixel intensities observed in the presence of noise. And Rician distribution is the result of magnitude reconstruction. See "Noise in magnitude magnetic resonance images" published in 2008. I assume that real-valued Gaussian noise is added to simulated data. 

      We apologize for the confusion. We added Gaussian noise to the real and imaginary parts of the simulated signals and then used the magnitude of this noisy complex signal for our experiments. We rephrased the sentence for more clarity.
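      The noise procedure described above (Gaussian noise added to the real and imaginary channels, then the magnitude taken) can be sketched as follows. The sketch assumes that the SNR is defined relative to a unit b = 0 signal, so that sigma = s0 / SNR; that convention is an assumption, not taken from the manuscript. For a zero signal the resulting magnitude is Rayleigh-distributed, which is the zero-signal special case of the Rician distribution the reviewer refers to.

```python
import numpy as np

def add_magnitude_noise(signal, snr, rng, s0=1.0):
    """Add i.i.d. Gaussian noise to the real and imaginary channels of a
    noise-free, real-valued signal and return the magnitude.

    Assumes sigma = s0 / snr (SNR relative to the b=0 signal s0).
    """
    sigma = s0 / snr
    real = signal + rng.normal(0.0, sigma, size=np.shape(signal))
    imag = rng.normal(0.0, sigma, size=np.shape(signal))
    return np.hypot(real, imag)  # magnitude -> Rician-distributed values

rng = np.random.default_rng(0)
noise_floor = add_magnitude_noise(np.zeros(100_000), snr=50, rng=rng)
# Zero signal: Rayleigh distribution with mean sigma * sqrt(pi / 2).
print(noise_floor.mean(), (1 / 50) * np.sqrt(np.pi / 2))
```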

      (4) L475: why is thinning not used in MCMC? In Figure 3, the MCMC results are more biased than those of µGUIDE; is this related to the absence of thinning in MCMC?

      We followed the recommendations by Harms et al. (2018) for the MCMC experiments. They analysed the impact of thinning (among other parameters) on the estimated posterior distributions. Their findings indicate that thinning is unnecessary and inefficient, and they recommend using more samples instead. For further details, we refer the reviewer to their publication, along with the theoretical works they cite. We have now added this note in the Methods section.

      (5) Did the authors try model-fitting methods with different initializations to get a distribution of the parameters? Like the paper "Degeneracy in model parameter estimation for multi‐compartmental diffusion in neuronal tissue". For the in vivo data, it is informative to see the model-fitting results.

      No, we did not try model-fitting methods with different initializations because such methods provide only a partial description of the solution landscape, which can be interpreted as a partial posterior distribution. Although this approach can help to highlight the problem of degeneracy, it does not provide a complete description of all potential solutions. In contrast, MCMC estimates the full posterior distribution, offering a more accurate and precise characterization of degeneracies and uncertainties compared to model-fitting methods with varying initializations. Hence, we decided to use MCMC as benchmark. We have now added these considerations to the Discussion section. 

      Reviewer #2 (Public Review): 

      Summary: 

      The authors improve the work of Jallais et al. (2022) by including a novel module capable of automatically learning feature selection from different acquisition protocols inside a supervised learning framework. Combining the module above with an estimation framework for estimating the posterior distribution of model parameters, they obtain rich probabilistic information (uncertainty and degeneracy) on the parameters in a reasonable computation time. 

      The main contributions of the work are: 

      (1) The whole framework allows the user to avoid manually defining summary statistics, which may be slow and tedious and affect the quality of the results. 

      (2) The authors tested the proposal by tackling three different biophysical models for brain tissue and using data with characteristics commonly used by the diffusion-MR microstructure research community.

      (3) The authors validated their method well against the state of the art.

      The main weakness is: 

      (1) The methodology was tested only on scenarios with a signal-to-noise ratio (SNR) equal to 50. It would be interesting to show results with lower SNR and without noise, to demonstrate that the method can detect the model's inherent degeneracies and how degeneracy increases when strong noise is present. I suggest expanding the Figure in Appendix 1 to include this information.

      The authors showed the utility of their proposal by computing complex parameter descriptors automatically in an achievable time for three different and relevant biophysical models. 

      Importantly, this proposal promotes tackling, analysing, and considering the degenerated nature of the most used models in brain microstructure estimation. 

      We thank the reviewer for these positive remarks. 

      Concerning the main weakness highlighted by the reviewer: in our submitted work, we presented results both without noise and with a signal-to-noise ratio (SNR) equal to 50 (similar to the SNR in the experimental data analysed). Figure 5 shows exemplar posterior distributions obtained in a noise-free scenario, and Table 1 reports the number of degeneracies for each model on 10,000 noise-free simulations. These results highlight that the presence of degeneracies is inherent to the model definition. Figures 3, 6 and 7 present results considering an SNR of 50. We acknowledge that results with lower SNR were not included in the initial submission. To address this, we added a figure in the appendix illustrating the impact of noise on the posterior distributions. Specifically, Figure 1A of Appendix 2 shows posterior distributions estimated from signals generated using an exemplar set of model parameters with varying noise levels (no noise, SNR=50 and SNR=25). Figure 1B presents uncertainty values obtained on 1000 simulations for each noise level. We observe that, as the SNR decreases, uncertainty increases. Noise in the signal contributes to irreducible variance; confidence in the estimates therefore decreases as the noise level increases.

      Reviewer #2 (Recommendations For The Authors):  

      Some suggestions: 

      Panel A of Figure 2 may deserve a better explanation in the Figure's caption. 

      We agree that the description of panel A of figure 2 was succinct and added more explanation in the figure’s caption.  

      The caption of Figure 3 should mention that the panel's titles are the parameters of the used biophysical models. 

      We added in the caption of figure 3 that the names of the model parameters are indicated in the titles of the panels. We apologise for the confusion it may have created.

      In equation (3), the authors should indicate the summation index. 

      We apologise for not putting the summation index in equation 3. We added it in the revised version.

      In line 474, the authors should discuss whether the systematic use of the maximum likelihood estimator as an initializer for the sampling biases the computed results.

      Concerning the MCMC estimations, we followed the recommendations of Harms et al. (2018). They investigated the use of the maximum likelihood estimator (MLE) as a starting point. They concluded that starting from the MLE allows the sampler to start in the stationary distribution of the Markov chain, removing the need for some burn-in. Additionally, they showed that initializing the sampling from the MLE has the advantage of removing salt-and-pepper-like noise from the resulting mean and standard deviation maps. We have now added this note in the Methods section.
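      The effect of initialising at the MLE can be illustrated on a toy one-parameter problem: a Gaussian mean with known sigma and a flat prior, whose posterior is N(mean(data), sigma²/n). This stands in for the diffusion models and is not the sampler used by Harms et al. (2018). Because the chain starts at the mode, every sample is kept, with no burn-in and no thinning.

```python
import numpy as np

def log_posterior(mu, data, sigma):
    # Flat prior on mu, Gaussian likelihood with known sigma.
    return -0.5 * np.sum((data - mu) ** 2) / sigma**2

def metropolis_from_mle(data, sigma, n_samples, step, rng):
    """Random-walk Metropolis started at the MLE of mu (the sample mean)."""
    mu = data.mean()                      # MLE: start in the high-density region
    logp = log_posterior(mu, data, sigma)
    chain = np.empty(n_samples)
    for i in range(n_samples):
        proposal = mu + step * rng.normal()
        logp_prop = log_posterior(proposal, data, sigma)
        if np.log(rng.uniform()) < logp_prop - logp:
            mu, logp = proposal, logp_prop
        chain[i] = mu                     # keep every sample: no burn-in, no thinning
    return chain

rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=50)
chain = metropolis_from_mle(data, sigma=1.0, n_samples=20_000, step=0.3, rng=rng)
print(chain.mean(), data.mean())          # chain mean vs analytic posterior mean
print(chain.std(), 1.0 / np.sqrt(50))     # chain sd vs analytic posterior sd
```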

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off-state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we will change the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in the absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We will make the point clearer in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on / off switch of CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorder 2022) with the loss of pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the manuscript so that there is no misinterpretation of our findings in the current work.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrated that this burst-dependent pause relied on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behaving animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of tonically active neurons (TANs) do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behaving animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests that it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these points in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, it does not exclude a participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1G-J, after observations by Matsumoto et al., 2001- PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.
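      The normalised pause mentioned above (the interval after the elicited spikes divided by the mean spontaneous interspike interval) can be sketched as follows. Every choice here is a hypothetical assumption for illustration, not the authors' analysis: spontaneous ISIs come from a pre-stimulus window, evoked spikes are those within a fixed window after stimulation, and the pause runs from the last evoked spike to the next spike. The toy spike train is synthetic and was constructed so that the ratio equals 1.4.

```python
import numpy as np

def normalized_pause(spike_times, stim_time, evoked_window, baseline_start):
    """Return (n_evoked, pause / mean spontaneous ISI) for one trial.

    Hypothetical definitions: spontaneous ISIs from [baseline_start, stim_time);
    evoked spikes within `evoked_window` s after stim_time; pause = interval
    from the last evoked spike to the next spike.
    """
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    baseline = spikes[(spikes >= baseline_start) & (spikes < stim_time)]
    mean_isi = np.diff(baseline).mean()
    evoked = spikes[(spikes >= stim_time) & (spikes < stim_time + evoked_window)]
    if evoked.size == 0:
        raise ValueError("no evoked spikes in the window")
    last_evoked = evoked[-1]
    later = spikes[spikes > last_evoked]
    if later.size == 0:
        raise ValueError("no spike after the evoked burst")
    return evoked.size, (later[0] - last_evoked) / mean_isi

# Synthetic spike train (s): 5 Hz spontaneous firing, one evoked spike at 0.905 s.
spikes = [0.0, 0.2, 0.4, 0.6, 0.8, 0.905, 1.185, 1.385]
n_evoked, ratio = normalized_pause(spikes, stim_time=0.9, evoked_window=0.02,
                                   baseline_start=0.0)
print(n_evoked, ratio)
```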

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016) we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCIN throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are including a phrase to make this clearer in the manuscript.

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we will add a statement discussing the potential contribution of receptors beyond D1/D5.

    1. Author response:

      We thank the editor and reviewers for their feedback. We believe we can address the substantive criticisms in full, first, by providing a more explicit theoretical basis for the method. Then, we believe criticism based on assumptions about phase consistency across time points are not well founded and can be answered. Finally, in response to some reviewer comments, we will improve the surrogate testing of the method.

      We will enhance the theoretical justification for the application of higher-order singular value decomposition (SVD) to the problem of irregular sampling of the cortical area. The initial version of the manuscript was written to allow informal access to these ideas (if possible), but the reviewers find a more rigorous account appropriate. We will add an introduction to modern developments in the use of functional SVD in geophysics, meteorology & oceanography (e.g., empirical orthogonal functions) and quantitative fluid dynamics (e.g., dynamic mode decomposition) and computational chemistry. Recently SVD has been used in neuroscience studies (e.g., cortical eigenmodes). To our knowledge, our work is the first time higher-order SVD has been applied to a neuroscience problem. We use it here to solve an otherwise (apparently) intractable problem, i.e., how to estimate the spatial frequency (SF) spectrum on a sparse and highly irregular array with broadband signals.

      We will clarify the methodological strategy in more formal terms in the next version of the paper. But essentially SVD allows a change of basis that greatly simplifies quantitative analysis. Here it allows escape from estimating the SF across millions of data-points (triplets of contacts, at each sample), each of which contains multiple overlapping signals plus noise (noise here defined in the context of SF estimation) and are inter-correlated across a variety of known and unknown observational dimensions. Rather than simply averaging over samples, which would wash out much of the real signal, SVD allows the signals to be decomposed in a lossless manner (up to the choice of number of eigenvectors at which the SVD is truncated). The higher-order SVD we have implemented reduces the size of the problem to allow quantification of SF over hundreds of components, each of which is guaranteed certain desirable properties, i.e., they explain known (and largest) amounts of variance of the original data and are orthonormal. This last property allows us to proceed as if the observations are independent. SF estimates are made within this new coordinate system.
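      The change of basis described above can be sketched with the textbook higher-order SVD (Tucker form): an ordinary SVD of each mode-n unfolding gives an orthonormal factor matrix per mode, and projecting the data onto those factors gives a core tensor. This is a generic NumPy illustration, not the authors' code; the three axes and their sizes are placeholders. At full rank the decomposition is lossless; truncating the ranks is the only source of loss.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (shape new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: returns (core, factor matrices)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])       # orthonormal columns, largest variance first
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_dot(core, u.T, mode)  # project onto the new basis
    return core, factors

def reconstruct(core, factors):
    out = core
    for mode, u in enumerate(factors):
        out = mode_dot(out, u, mode)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5, 6))            # placeholder axes, e.g. contacts x time x trials
core, factors = hosvd(X, ranks=(4, 5, 6))
print(np.allclose(reconstruct(core, factors), X))  # True: lossless at full rank
core2, factors2 = hosvd(X, ranks=(2, 2, 2))        # truncated: compressed basis
err = np.linalg.norm(reconstruct(core2, factors2) - X) / np.linalg.norm(X)
print(core2.shape, err)
```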

      We will also more concretely formalise the relation between Fourier analysis and previous observations of eigenvectors of phase that are smooth gradients.
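      The relation between smooth eigenvectors and Fourier analysis rests on a standard fact: when the covariance between sites depends only on their separation and the sites are regularly spaced on a ring, the covariance matrix is circulant, and circulant matrices are diagonalised by the discrete Fourier transform. The sketch below checks this numerically. The ring size and exponential kernel are illustrative assumptions; on sparse, irregular arrays the matrix is no longer circulant and the correspondence is only approximate.

```python
import numpy as np

n = 64
positions = np.arange(n)
# Translation-invariant covariance on a ring: depends only on circular distance.
diff = np.abs(positions[:, None] - positions[None, :])
dist = np.minimum(diff, n - diff)
cov = np.exp(-dist / 5.0)                 # illustrative exponential decay (5 samples)

eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]         # largest variance first
eigvecs = eigvecs[:, order]

def dominant_frequency(v):
    """Return (spatial-frequency bin with most power, share of total power)."""
    power = np.abs(np.fft.rfft(v)) ** 2
    return int(np.argmax(power)), power.max() / power.sum()

for k in range(5):
    print(k, dominant_frequency(eigvecs[:, k]))
```

Each leading eigenvector concentrates its power in a single spatial-frequency bin, and variance falls as spatial frequency rises for this smooth kernel.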

      We will very briefly review Fourier methods designed to deal with non-uniform sampling. The problems these methods are designed for fall into the non-uniform part of the spectrum from uniform–non-uniform–irregular–highly-irregular–noise. They are highly suited to, for example, interpolating between EEG electrodes to produce a uniform array for application of the fast Fourier transform (Alamia et al., 2023). However, a survey across a range of applied maths fields suggests that no method exists for the degree of irregular sampling found in the sEEG arrays at issue here. In particular, the sparseness of the contact coverage presents an insurmountable hurdle to standard methods. While there exist methods for sparse samples (e.g., Margrave & Fergusen, 1999; Ying, 2009), these require well-defined oscillatory behavior, e.g., for seismographic analysis. Given the problems of highly irregular sampling, sparseness of sampling and broadband, nonstationary signals, we have attempted a solution via the novel methods introduced in the current manuscript. We were able to leverage previous observations regarding the relation between eigenvectors of cortical phase and Fourier analysis, as we outline in the manuscript.

      We will extend the current 1-dimensional surrogate data to better demonstrate that the method does indeed correctly detect the ordinal relations in power on different parts of the SF spectrum. We will include the effects of a global reference signal. Simulations of cortical activity are an expensive way to achieve this goal. While the first author has published in this area, such simulations are partly a function of the assumptions put into them (i.e., spatial damping, boundary conditions, parameterization of connection fields). We will therefore use surrogate signals derived from real cortical activity to complete this task.

      Some more specific issues raised:

      (1) Application of the method to general neuroscience problems:

      The purpose of the manuscript was to estimate the SF spectrum of phase in the cortex, in the range where it was previously not possible. The purpose was not specifically to introduce a new method of analysis that might be immediately applicable to a wide range of available data-sets. Indeed, the specifics of the method are designed to overcome an otherwise intractable disadvantage of sEEG (irregular spatial sampling) in order to take advantage of its good coverage (compared to ECoG) and low volume conduction compared to extra-cranial methods. On the other hand, the developing field of functional SVD would be of interest to neuroscientists, as a set of methods to solve difficult problems, and therefore of general interest. We will make these points explicit in the next version of the manuscript. In order to make the method more accessible, we will also publish code for the key routines (construction of triplets of contacts, Morlet wavelets, calculation of higher-order SVD, calculation of SF).

      (2) Novelty:

      We agree with the third reviewer: if our results can convince, then the study will have an impact on the field. While there is work that has been done on phase interactions at a variety of scales, such as from the labs of Fries, Singer, Engels, Nauhaus, Logothetis and others, it does not quantify the relative power of the different spatial scales. Additionally, the research of Freeman et al. has quantified only portions of the SF spectrum of the cortex, or used EEG to estimate low SFs. We would appreciate any pointers to the specific literature the current research contributes to, namely, the SF spectrum of activity in the cortex.

      (3) Further analyses:

      The main results of the research are relatively simple: SF-power falls monotonically with SF, and this effect occurs across the range of temporal frequencies. We provide each individual participant's curves in the supplementary figures. By visual inspection, it can be seen that the main result of the example participant is recapitulated in every participant. One is rarely in this position in neuroscience research, and we will make this explicit in the text.

      The research stands or falls by the adequacy of the method to estimate the SF curves. For this reason most statistical analyses and figures were reserved for ruling out confounds and exploring the limits of the methods. However, for the sake of completeness, we will now include the SF vs. SF-power correlations and significance in the next version, for each participant at each frequency.
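      As a minimal sketch of the per-participant, per-frequency summary promised above, the snippet below computes a Spearman rank correlation between SF and SF power for each (participant, temporal frequency) curve. The dictionary layout, the function names and the no-ties ranking are illustrative assumptions, not the authors' code; `scipy.stats.spearmanr` would additionally return a p-value.

```python
import numpy as np


def ranks(values):
    """Ranks 0..n-1 of a 1D array (assumes no ties, a simplifying assumption)."""
    values = np.asarray(values)
    order = np.argsort(values)
    out = np.empty(len(values), dtype=float)
    out[order] = np.arange(len(values))
    return out


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


def sf_power_correlations(curves):
    """curves: {(participant_id, temporal_freq_hz): (sf, sf_power)} -> {key: rho}.

    The layout is hypothetical; rho = -1 means SF power falls monotonically with SF.
    """
    return {key: spearman_rho(sf, power) for key, (sf, power) in curves.items()}


# Hypothetical example: one synthetic, monotonically falling curve.
sf = np.linspace(0.01, 0.5, 20)          # illustrative SF values
curves = {("P01", 10.0): (sf, sf ** -1.5)}
rhos = sf_power_correlations(curves)
```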

      Since the main result was uniform across participants, and since we did not expect anything of special significance about the delayed free recall task, we conclude that more participants or more tasks would not add to the result. As we point out in the manuscript, each participant is a test of the main hypothesis. The result is also consistent with previous attempts to quantify the SF spectrum, using a range of different tasks and measurement modalities (Barrie et al., 1996; Ramon & Holmes, 2015; Alexander et al., 2019; Alexander et al., 2016; Freeman et al., 2003; Freeman et al., 2000). The search for those rare sEEG participants with larger coverage than the maximum available here is of interest to us, but will be left for a future study.

      (4) Sampling of phase and its meaningfulness:

      The wavelet methods used in the present study have excellent temporal resolution but poor frequency resolution. We additionally oversample the frequency range to produce visually informative plots (usually in the context of time-by-frequency plots; see Alexander et al., 2006; 2013; 2019). However, it is not correct that the methods for estimating phase assume a narrow frequency band. Rather, the poor frequency resolution of short time-series Morlet wavelets means the methods are robust to the exact shape of the waveforms: the signal need only be approximately sinusoidal, that is, rise and fall. We use methods with excellent resolution in the time domain because previous work (Alexander et al., 2006; Patten et al., 2012) has shown that traveling wave events can last only one or two cycles; that is, they are not oscillatory in the strict sense but are non-stationary events. So while short time-window Morlet wavelets have a disadvantage in terms of frequency resolution, this same property means they do not assume narrow-band sinusoidal waveforms in the signal. We strongly disagree that our analysis requires very strong assumptions about oscillations (see last point in this section).
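      The point about short Morlet wavelets can be illustrated with a minimal sketch: a complex Morlet wavelet spanning only a few cycles (here `n_cycles=3`, an illustrative choice rather than the value used in the study) is convolved with the signal, and the phase is the angle of the complex coefficients. The function names and parameters are hypothetical; the Gaussian envelope is truncated at plus or minus 5 standard deviations.

```python
import numpy as np


def complex_morlet(freq_hz, fs_hz, n_cycles=3.0):
    """Complex Morlet wavelet; a small n_cycles gives a short window (good time resolution)."""
    sigma_t = n_cycles / (2 * np.pi * freq_hz)        # SD of the Gaussian envelope (s)
    half = int(np.ceil(5 * sigma_t * fs_hz))          # truncate the envelope at +/- 5 SD
    t = np.arange(-half, half + 1) / fs_hz            # odd length, centred on t = 0
    return np.exp(2j * np.pi * freq_hz * t) * np.exp(-t ** 2 / (2 * sigma_t ** 2))


def instantaneous_phase(signal, freq_hz, fs_hz, n_cycles=3.0):
    """Phase (rad) of `signal` near freq_hz, one value per sample."""
    coeffs = np.convolve(signal, complex_morlet(freq_hz, fs_hz, n_cycles), mode="same")
    return np.angle(coeffs)


fs = 1000.0                                            # hypothetical sampling rate (Hz)
t = np.arange(2000) / fs
phase = instantaneous_phase(np.cos(2 * np.pi * 10 * t), 10.0, fs)
```

      For a 10 Hz cosine sampled at 1 kHz, the phase at t = 1.0 s is close to 0 and at t = 1.025 s close to pi/2. Because the short envelope has broad frequency support, the same call returns a phase for any waveform that rises and falls near 10 Hz, at the cost of frequency resolution.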

      Our hypothesis was about the SF spectrum of the phase. When the measurement of phase is noise-like at some location, frequency and time, this noise will not contribute substantially more to the low-SF parts of the spectrum than to the high-SF parts. Our hypothesis also concerned whether it was reasonable to interpret the existing literature on low-SF waves in terms of cortically localised waves or small numbers of localised oscillators. This required us to show that low SFs dominate, and therefore that this signal must dominate any extra-cranial measurements of apparent low-SF traveling waves. It does not require us to demonstrate that the various parts of the SF spectrum are meaningful in the sense of being functionally significant. This has been shown elsewhere (see references to traveling waves in the manuscript, to which we will also add a brief survey of research on phase dynamics).
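      The claim that noise-like phase does not favour low SFs can be checked numerically under a simplifying assumption: contacts on a regular 1D grid, with the discrete Fourier transform standing in for the SVD-based SF estimate used for irregular sEEG sampling. For independent, uniformly random phases the expected power is the same at every SF, so the share of power below a cutoff matches the share of SF bins below it. Function names, grid size and cutoff are hypothetical.

```python
import numpy as np


def sf_power(phases):
    """SF power spectrum of unit-length phasors exp(i*phase) on a regular 1D array."""
    return np.abs(np.fft.fft(np.exp(1j * np.asarray(phases)))) ** 2


def low_sf_share(n_contacts=64, n_trials=500, cutoff=0.125, seed=0):
    """Mean share of SF power at |SF| <= cutoff for random phases, and the share of bins."""
    rng = np.random.default_rng(seed)
    sf = np.abs(np.fft.fftfreq(n_contacts))            # cycles per contact spacing
    low = sf <= cutoff
    total = np.zeros(n_contacts)
    for _ in range(n_trials):
        total += sf_power(rng.uniform(-np.pi, np.pi, n_contacts))
    return total[low].sum() / total.sum(), low.mean()


power_share, bin_share = low_sf_share()
wave = sf_power(2 * np.pi * 2 * np.arange(64) / 64)    # coherent wave: 2 cycles across the array
```

      By contrast, a coherent wave puts essentially all of its power into a single low-SF bin.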

      The calculation of phase can be bypassed altogether to achieve the initial effect described in the introduction to the methods (Fourier-like basis functions from SVD). The observed eigenvectors, which increase in spatial frequency as the eigenvalues decrease, can be reproduced by applying Gaussian windows to the raw time-series (D. Alexander, unpublished observation). For example, an SVD of the raw time-series windowed over 100 ms reproduces much the same spatial eigenvectors (except that they come in pairs, recapitulating the real and imaginary parts of the signal) as first estimating the phase at 10 Hz using Morlet wavelets and then applying the SVD to the unit-length complex phase values.
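      The rank argument behind this observation can be sketched for an idealised case (an assumption, not the sEEG geometry): a single plane wave on a regular 1D array of contacts, sampled over 100 ms at 1 kHz. Unit-length complex phase values form a rank-1 matrix, so one spatial eigenvector carries all the variance, whereas the real-valued raw signal is rank 2 and yields a pair of equal singular values, corresponding to the cosine and sine (real and imaginary) components. The Gaussian window is omitted here for simplicity, and all sizes are hypothetical.

```python
import numpy as np

n_contacts, n_samples, fs = 32, 100, 1000.0       # hypothetical array size; 100 ms at 1 kHz
x = np.arange(n_contacts) / n_contacts            # contact positions along a unit-length line
t = np.arange(n_samples) / fs
k = 2 * np.pi * 2                                  # 2 spatial cycles across the array
omega = 2 * np.pi * 10                             # 10 Hz
arg = k * x[:, None] - omega * t[None, :]          # contacts x time

s_phase = np.linalg.svd(np.exp(1j * arg), compute_uv=False)   # unit-length phase values
s_raw = np.linalg.svd(np.cos(arg), compute_uv=False)          # raw real-valued signal
```

      Here the first two singular values of `s_raw` are equal (both sqrt(16 x 50), about 28.3), and the corresponding spatial eigenvectors span cos(kx) and sin(kx), the pair noted in the text.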

      (5) Other issues to be addressed and improved:

      Clarity on which experiments were analyzed (starting in the abstract); discussion of frequencies above 60 Hz, with caution in interpretation due to spike-waveform artefacts or their potential role as an index of multi-unit spiking; and discussion of whether the ad hoc, quasi-random sampling achieved by sEEG contacts somehow inflates the low-SF estimates.

      References (new)
      Patten TM, Rennie CJ, Robinson PA, Gong P (2012) Human cortical traveling waves: dynamical properties and correlations with responses. PLoS ONE 7(6): e38392. https://doi.org/10.1371/journal.pone.0038392
      Margrave GF, Ferguson RJ (1999) Wavefield extrapolation by nonstationary phase shift. Geophysics 64(4): 1067-1078.
      Ying L (2009) Sparse Fourier transform via butterfly algorithm. SIAM Journal on Scientific Computing 31(3): 1678-1694.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation times (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K+. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1, supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed, with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to those in free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line, and similar but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data are inconsistent with the conclusions, but other controls, not included here, would be necessary to conclude that it is membrane potential that drives these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, for which they provide definitive proof. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and that light transmittance increases. The time course of these changes is quite slow, on the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce exactly the same time course as the KCl depolarization changes (and vice versa for hyperosmotic changes, which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for the T2 changes, as they have not measured it. It is, however, well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions of the two-compartment relaxation model for this case are well established (see Menon and Allen, Magn. Reson. Med. 20:214-227, 1991). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes the rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work are completely consistent with the Stroman work and with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we will additionally discuss the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and to address Reviewer #2’s suggestion, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

      Finally, when [K+]-induced membrane potential changes are involved, factors other than cell volume changes also appear to influence T2 changes. Our ongoing study shows differences in T2 changes (for the same volume change) between two situations: pure osmotic volume changes and [K+]-induced volume changes (e.g., hypoosmotic swelling vs. depolarization). Furthermore, this study suggests that mechanisms such as changes in free (primarily intracellular) and bound water within a voxel play an important role in generating this T2 difference. Our group is preparing a manuscript on this follow-up study and will report on it shortly.

      So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in the Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T2 and PSR) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      There are a few smaller issues that should be addressed.

      (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      We appreciate the reviewer’s suggestion regarding imaging sequences. We would like to clarify that dictionaries were used for fitting in vivo T2 decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T2 maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      The T2 decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. This phenomenon is inherent to the pulse sequence rather than being artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T2 decay curve using the technique developed by McPhee and Wilman (2017).
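      The effect described here can be reproduced with a minimal extended phase graph (EPG) simulation in the spirit of Hennig (1991). The sketch below is not the McPhee and Wilman (2017) fitting procedure; it only shows that with an ideal 180° refocusing pulse the CPMG echo train is exactly monoexponential, whereas a reduced effective flip angle (as occurs across a slice-selective refocusing profile) mixes in stimulated echoes, so that the second echo can exceed the first. All function names, flip angles and timings are hypothetical.

```python
import numpy as np


def rf_rotation(alpha, phi):
    """EPG transition matrix for an RF pulse (flip alpha, phase phi, rad) acting on [F+, F-, Z]."""
    c2 = np.cos(alpha / 2) ** 2
    s2 = np.sin(alpha / 2) ** 2
    sa = np.sin(alpha)
    return np.array([
        [c2, np.exp(2j * phi) * s2, -1j * np.exp(1j * phi) * sa],
        [np.exp(-2j * phi) * s2, c2, 1j * np.exp(-1j * phi) * sa],
        [-0.5j * np.exp(-1j * phi) * sa, 0.5j * np.exp(1j * phi) * sa, np.cos(alpha)],
    ])


def relax(states, tau, t1, t2):
    """T2 decay of F states, T1 decay and recovery (M0 = 1) of Z states over time tau."""
    e1, e2 = np.exp(-tau / t1), np.exp(-tau / t2)
    states = states * np.array([[e2], [e2], [e1]])
    states[2, 0] += 1.0 - e1
    return states


def dephase(states):
    """One unit of gradient dephasing: F+ states move to k+1, F- states to k-1."""
    states = np.concatenate([states, np.zeros((3, 1), dtype=complex)], axis=1)
    states[0, 1:] = states[0, :-1].copy()
    states[1, :-1] = states[1, 1:].copy()
    states[1, -1] = 0.0
    states[0, 0] = np.conj(states[1, 0])
    return states


def cpmg_echoes(refocus_angle, t1, t2, esp, n_echoes):
    """Echo amplitudes (F0 states) of a CPMG train after a 90 degree excitation."""
    states = np.zeros((3, 1), dtype=complex)
    states[2, 0] = 1.0                                     # equilibrium magnetization
    states = rf_rotation(np.pi / 2, np.pi / 2) @ states    # excitation
    refocus = rf_rotation(refocus_angle, 0.0)              # 90 degree phase shift (CPMG)
    echoes = []
    for _ in range(n_echoes):
        states = dephase(relax(states, esp / 2, t1, t2))
        states = refocus @ states
        states = dephase(relax(states, esp / 2, t1, t2))
        echoes.append(states[0, 0])
    return np.array(echoes)


esp, t2 = 0.01, 0.05                                            # hypothetical spacing and T2 (s)
ideal = np.abs(cpmg_echoes(np.pi, 1.0, t2, esp, 10))
reduced = np.abs(cpmg_echoes(2 * np.pi / 3, 1.0, t2, esp, 10))  # 120 degree refocusing
```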

      (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently far from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of the extracellular media above the cell pellet. We will clearly describe the imaging slice in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We will clarify this point in the revised manuscript to avoid any misunderstanding.
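      The reviewer's concern can be made concrete with a sketch: if two compartments in slow exchange contribute to a voxel, a monoexponential fit to the summed decay returns an apparent T2 lying between the two compartmental values, at a position that depends on their populations (and on the echo times sampled); with a single compartment, the fit recovers its T2 exactly. The compartment T2s, population fraction and the 50 evenly spaced echo times below are hypothetical values chosen for illustration.

```python
import numpy as np


def biexponential(te, frac_a, t2_a, t2_b):
    """Slow-exchange two-compartment decay (M0 = 1); frac_a is the population of compartment a."""
    return frac_a * np.exp(-te / t2_a) + (1 - frac_a) * np.exp(-te / t2_b)


def fit_monoexponential_t2(te, signal):
    """Log-linear least-squares fit of S = S0 * exp(-TE / T2); returns T2."""
    slope, _ = np.polyfit(te, np.log(signal), 1)
    return -1.0 / slope


te = np.linspace(0.01, 0.5, 50)                   # 50 hypothetical echo times (s)
t2_mixed = fit_monoexponential_t2(te, biexponential(te, 0.7, 0.05, 2.0))
t2_pure = fit_monoexponential_t2(te, biexponential(te, 1.0, 0.05, 2.0))
```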

      (4) Delta T1 and T2 values are presented for the pellets in wells, but I could find no absolute values for either the pellets or the KCl solutions.

      As requested by the reviewer, we will include the absolute values in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K+ and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper remarkably suggests that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals on which most functional MRI (fMRI) experiments rely. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we have already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we will additionally discuss the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and to consider the reviewer’s suggestion, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Molnar, Suranyi and colleagues have probed the genomic stability of Mycobacterium smegmatis in response to several anti-tuberculosis drugs as monotherapy and in combination. Unlike the study by Nyinoh and McFadden (http://dx.doi.org/10.1002/ddr.21497), which should be cited, the authors use a sub-lethal dose of antibiotic. While this is motivated by sound technical considerations, the biological and therapeutic rationale could be further elaborated.

      In the mutation accumulation experiments, we needed to ensure continuous and reproducible growth of a small number of colonies across multiple passages. This technical requirement necessitated the use of sublethal drug concentrations. However, sublethal doses also have biological relevance. Noncompliance with prescribed antibiotic regimens and the presence of antibiotic residues in food due to the extensive use of antibiotics in agricultural mass production are two obvious sources of prolonged exposure to sublethal antibiotics.
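      In a mutation accumulation design of the kind described above, the per-site mutation rate is commonly estimated as the number of mutations detected by sequencing divided by the number of lines, the generations per line and the genome length. The sketch below assumes that each passage starts from a single cell and that generations per passage equal log2 of the colony size (synchronous division, no cell death); these assumptions, the function names and all numbers are illustrative, not the study's values. For M. smegmatis the genome length is roughly 7 Mb.

```python
import math


def generations_per_passage(colony_size, founder_cells=1):
    """Generations from founder to colony, assuming synchronous binary fission and no death."""
    return math.log2(colony_size / founder_cells)


def per_site_mutation_rate(n_mutations, n_lines, n_passages, colony_size, genome_length_bp):
    """Mutations per base pair per generation, pooled over all mutation accumulation lines."""
    generations_per_line = n_passages * generations_per_passage(colony_size)
    return n_mutations / (n_lines * generations_per_line * genome_length_bp)


# Hypothetical numbers, for illustration only.
rate = per_site_mutation_rate(n_mutations=10, n_lines=5, n_passages=20,
                              colony_size=2 ** 25, genome_length_bp=7_000_000)
```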

      The results the authors obtain are in line with papers examining the genomic mutation rate in vitro and from patient samples in Mycobacterium tuberculosis, in vitro in Mycobacterium smegmatis and in vitro in Mycobacterium tuberculosis (although the study by HL David (PMID: 4991927) is not cited). The results are confirmatory of previous studies.

      The two cited studies, along with several others, did not distinguish between genetic mutations and phenotypic responses to drug exposure (the fluctuation test alone is not suitable for this). Therefore, their objectives are not comparable to ours, which specifically investigated whether resistant colonies carry adaptive mutations. Nevertheless, we acknowledge the relevance of these studies and have now cited them in the appropriate sections in the text.

      It is therefore puzzling why the authors propose the opposite hypothesis in the paper (i.e., that antibiotic exposure should increase mutation rates) merely to tear it down later. This straw-man style is entirely unnecessary.

      The phenomenon of stress-inducible mutagenesis in bacterial evolution remains a topic of heated debate. The emergence of genetically encoded resistance may stem from either microevolution or the dissemination of pre-existing variants from polyclonal infections under drug pressure. We believe that the Introduction presents both of these hypotheses in a balanced manner to elucidate the rationale behind our mutation accumulation investigations.  

      The results on the nucleotide pools are interesting, but the statistically significant data is difficult to identify as presented, and therefore the new biological insights are unclear.

      We now indicate statistical significance in the figure, in addition to the detailed statistical analysis of all dNTP measurements provided in Table S5.

      Finally, the authors show that a fluctuation assay generates mutations with higher frequencies than the genetic stability assays, confirming the well-known effect of phenotypic antibiotic resistance.

      What we show is that the fluctuation assay generated bacteria that tolerated the applied antibiotic without developing mutations. Conclusions about mutation rates are often drawn from fluctuation assays without confirming changes at the genetic level, even though these assays capture both phenotypic and genotypic alterations. By combining genome sequencing with fluctuation assays, our approach emphasizes the importance of distinguishing between these changes. While fluctuation assays remain valuable, inexpensive and simple tools for evaluating the response of bacterial populations to various selective environments, they should not be considered definitive indicators of genetic changes.
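      To illustrate why a fluctuation assay alone cannot separate genetic from phenotypic resistance, the classic Luria-Delbrück null-class (p0) estimator is sketched below. It infers the expected number of mutation events per culture from the fraction of cultures with no resistant colonies, m = -ln(p0), and divides by the final number of cells per culture, a common approximation. Every resistant colony is implicitly counted as a mutant, so tolerant but unmutated colonies inflate the estimate. This textbook estimator is used for illustration and is not necessarily the one used in the study; the function name and numbers are hypothetical.

```python
import math


def p0_mutation_rate(n_cultures, n_cultures_without_resistant, cells_per_culture):
    """Luria-Delbruck null-class estimate: m = -ln(p0) events per culture, divided by N_t."""
    p0 = n_cultures_without_resistant / n_cultures
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 method needs some, but not all, cultures without resistant colonies")
    m = -math.log(p0)                 # expected number of mutation events per culture
    return m / cells_per_culture      # every resistant colony is treated as a mutant


rate = p0_mutation_rate(n_cultures=20, n_cultures_without_resistant=10, cells_per_culture=1e8)
```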

      Recommendations For The Authors:

      The quality of the figures can be significantly improved. In Figure 1, cell lengths can be shown on separate histograms or better still as violin plots to enable better comparisons.

      Thank you for the suggestion. We have revised the data presentation accordingly.

      Details for statistical tests should be provided in the figure legend.  

      Statistical details are now added in the figure legend.

      In Figure 2, the number of data points is not mentioned.

      Statistical information is now added to the new Figure 2, which has been revised extensively based on suggestions from all Referees.

      The data in Figure 3 would be much easier to comprehend as a heatmap.  

      The figure we provided is a color gradient table representing different gene expression levels, along with numerical data and statistical significance indicated within the color boxes, expanding the information content of a traditional heatmap. In response to the Referee's suggestion, we also prepared a hierarchical clustering heatmap, demonstrating that the grouping of rows and columns based on functional information in the original figure is consistent with the clustering pattern observed in the heatmap (Figure S5). As the original figure is more informative and better structured, we have included the new figure in the supplementary materials.

      No statistical tests are provided for Figure 4.

      We now indicate statistical significance in the figure and describe the statistical analysis in the figure legend, as suggested. Additionally, Table S5 is dedicated to the statistical analysis of the dNTP data.  

      Reviewer #2 (Public Review):

      In this study, the authors assess whether selective pressure from drug chemotherapy influences the emergence of drug resistance through the acquisition of genetic mutations or phenotypic tolerance. I commend the authors on their approach of utilizing the mutation accumulation (MA) assay as a means to answer this and whole genome sequencing of clones from the assay convincingly demonstrates low mutation rates in Mycobacteria when exposed to sub-inhibitory concentrations of antibiotics. Also, quantitative PCR highlighted the upregulation of DNA repair genes in Mycobacteria following drug treatment, implying the preservation of genomic integrity via specific repair pathways.

      Even though the findings stem from M. smegmatis exposure to antibiotics under in vitro conditions, this is still relevant in the context of the development of drug resistance so I can see where the authors' train of thought was heading in exploring this. However, I think important experiments to perform to more fully support the conclusion that resistance is largely associated with phenotypic rather than genetic factors would have been to either sequence clones from the ciprofloxacin tolerance assay (to show absence/ minimal genetic mutations) or to have tested the MIC of clones from the MA assay (to show an increase in MIC).

      Thank you for acknowledging the value of the manuscript and for the insightful suggestions for improvement. We agree on the necessity of directly connecting the mutation accumulation experiments with the tolerance assay, and we have performed both suggested additional experiments.

      (1) We repeated the ciprofloxacin tolerance assay (Figure S6) using a large number of plates to gather enough cells for genomic DNA extraction and whole genome sequencing. The sequencing confirmed the absence of mutations in bacteria grown in both 0.3 and 0.5 µg/ml ciprofloxacin. We integrated this result into the revised manuscript text, and the sequencing data are available at the European Nucleotide Archive (ENA) under project number PRJEB71590.

      (2) We resuscitated three different clones from the MA assays stored at -80°C and tested the MIC of the respective drugs. The results are presented in Figure 2C. Except for EMB, we observed an increase in MIC values across the treatments.

      There seems to be a disconnect in drawing these conclusions from experiments conducted under different conditions, or perhaps the authors can clarify why this was done.  

      Molecular biology analysis methods are not easily compatible with long-term mutation accumulation experiments, or at least we could not establish the necessary conditions. When DNA or RNA extraction was required, we had to adjust the experimental scale for further analysis, which could be done in liquid culture. We believe that the suggested critical back-and-forth control experiments have significantly improved the comparability of the results.

      With regards to the sub-inhibitory drug concentration applied, there is significant variation in the viability as calculated by CFUs following the different treatments and there is evidence that cell death greatly affects the calculation of mutation rate (PMCID: PMC5966242). For instance, the COMBO treatment led to 6% viability whilst the INH treatment led to 80% cell viability. Are there any adjustments made to take this into account?

      We agree with and have been aware of the notion that cell death affects the calculation of the mutation rate. We included treatment optimization data on agar plates (Table 1 and Figure S2), which now demonstrate that the applied subinhibitory drug concentrations resulted in ≤10% viability across all treatments in the MA assay. This minimizes the potential discrepancy in the mutation rate calculation caused by variable cell death.  

      It would also be useful to the reader to include a supplementary table of the SNPs detected from the lineages of each treatment - to determine if at any point rifampicin treatment led to mutations in rpoB, isoniazid to katG mutations, etc.  

      Overall, while this study is tantalizingly suggestive of phenotypic tolerance playing a leading role in drug resistance (and perhaps genetic mutations a sub-ordinate role) a more substantial link is needed to clarify this.

      The SNPs identified from the lineages of each treatment are compiled in the 'unique_muts.xls' file within the Figshare document bundle that was originally enclosed with the manuscript. In response to your suggestion, we have now added a simplified version of this data set in Table S2, listing the detected SNPs. Notably, no confirmed adaptive mutation developed in our experiments; rifampicin treatment did not result in mutations in rpoB, nor did isoniazid lead to mutations in katG.

      Recommendations For The Authors:

      I would suggest moving Figure 1 to the supplementary material. It shows that cell wall-targeting drugs cause cell shortening and DNA replication-targeting drugs cause cell elongation, as would be expected, and this is simply a secondary observation, not one that is central to the paper.  

      We agree that this is not a novel or unexpected observation. However, we used it as an indicator of drug effectiveness, particularly for bacteriostatic cell wall-targeting drugs in liquid culture that induced moderate cell death. Following Reviewer 1's suggestions, we extensively revised the figure to better convey our intended message. We believe the updated version now more clearly demonstrates the drugs' impact, and for this reason, we have opted to keep it in the main text.

      Figure 2 and Table 2 show the same data, so they could be combined into a paneled figure or one of them moved to the supplementary material. It would be useful to include a diagram of how the MA assay was conducted, similar to the CIP tolerance assay figure.

      Thank you for the suggestions. We have added a diagram to Figure 2 explaining the MA assay (Figure 2A), as well as the MIC experiment conducted on the MA cells (Figure 2C). To avoid redundancy, Table 2 has been removed.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes how antibiotics influence genetic stability and survival in Mycobacterium smegmatis. Prolonged treatment with first-line antibiotics did not significantly impact mutation rates. Instead, adaptation to these drugs appears to be mediated by upregulation of DNA repair enzymes. While this study offers robust data, findings remain correlative and fall short of providing mechanistic insights.

      Strengths:

      The strength of this study is the use of genome-wide approaches to address the specific question of whether or not mycobacteria induce mutagenic potential upon antibiotic exposure.

      Weaknesses:

      The authors suggest that the upregulation of DNA repair enzymes ensures a low mutation rate under drug pressure. However, this suggestion is based on correlative data, and there is no mechanistic validation of their speculations in this study.

      Furthermore, as detailed below, some of the statements made by the authors are not substantiated by the data presented in the manuscript.

      Finally, some clarifications are needed for the methodologies employed in this study. Most importantly, reduced colony growth should be demonstrated on agar plates to indicate that the drug concentrations calculated from liquid culture growth can be applied to agar surface growth. Without such validations, the lack of induced mutation could simply be due to the fact that the drug concentrations used in this study were insufficient.

      Thank you for appreciating the manuscript's merits and for the instructive suggestions. We agree that demonstrating reduced colony growth on agar plates is important to validate the relevance of the drug concentrations used in the study. In response, we have added the treatment optimization data on agar plates in Figure S2 and reorganized Table 1 to show the decrease in CFU achieved with the applied subinhibitory drug concentrations.

      We acknowledge that the observed upregulation of DNA repair enzymes and the low mutation rates under drug pressure represent correlative data. We removed the reference to mechanism from the abstract and avoided presenting the qPCR results as a mechanistic explanation in the text. We have only raised the possibility that correlation could be a causal relationship: "The observed upregulation of the relevant DNA repair enzymes might account for the low mutation rate even under drug pressure." We recognize the necessity for a new series of targeted experiments to provide mechanistic explanations. We added the following text to the Discussion:

      “The observed activation of DNA repair processes likely mitigates mutation pressure, ensuring genome stability. However, to confirm this hypothesis, these investigations should be conducted using genetically modified DNA repair mutant strains.”

      In the current manuscript, we aim to convincingly demonstrate that long-term antibiotic pressure did not induce the occurrence of new adaptive mutations.

      Recommendations For The Authors:

      Additional specific comments are:

      Page 2. Do not italicize "Mycobacteria", which is not considered a scientific name.

      Corrected.

      Page 4. "Bacto pepcone" is a typo.

      Corrected.

      Page 6. "Quiagen" is a typo.

      Corrected.

      Page 9. In Table 1, RIF being described as a protein synthesis inhibitor is misleading.

      Corrected.

      Page 9. The statement "Specifically, following RIF, CIP, and MMC treatments, we observed cells elongating by more than twofold, whereas INH and EMB treatments led to a reduction in cell length." cannot be justified by Figure 1, as the cell length information is not conveyed in this figure.

      Thank you for pointing this out, the revised Figure 1 conveys the cell length information.

      Page 10. If the experiment shown in Figure S1 was done in an acidic growth condition, the figure legend should clearly indicate the fact. Additionally, the assay condition should be described in detail in the Methods section.

      Thank you, the required information is now included in both the figure legend and the Methods section.

      Page 10. If PZA does not work against M. smegmatis, it seems pointless to add it to the COMBO treatment. Please clarify why it was included in the drug combination experiment.

      We added the following text to clarify the use of PZA: “Regardless of its inefficacy as a monotherapy, we included PZA in the combination treatment, as we could not rule out the possibility that PZA interacts with the other three drugs or that PZA elimination mechanisms are equally active in M. smegmatis under this regimen.”

      Page 10. Generation times calculated from liquid culture cannot be applied to colony growth on an agar plate. The growth behaviors on a solid surface will be totally different from planktonic suspension growth. The numbers of generations indicated here will be inaccurate.

      You are absolutely right. We conducted an experiment to calculate the number of generations on plates under the same conditions as used in the MA assay. We found, indeed, a different (doubled) generation time from what was determined in liquid culture. We have adjusted the mutation rates accordingly.
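
      To make the effect of the corrected generation time concrete: in a mutation accumulation design the per-site, per-generation rate is commonly estimated as mu = m / (n x T x N), that is, mutations divided by (number of lines x generations per line x analysed sites). When the growth period per transfer is fixed, a doubled generation time halves T and therefore doubles the estimate. The sketch below illustrates only this arithmetic; the function names and all input numbers are hypothetical placeholders, not values or code from the manuscript.

```python
def generations(elapsed_hours: float, generation_time_hours: float) -> float:
    """Generations accumulated over a fixed growth period."""
    return elapsed_hours / generation_time_hours


def ma_mutation_rate(mutations: int, lines: int, gens_per_line: float, sites: int) -> float:
    """Per-site, per-generation MA estimate: mu = m / (n * T * N)."""
    return mutations / (lines * gens_per_line * sites)


# Hypothetical inputs, for illustration only (not data from the study)
elapsed = 1000.0      # total growth time per MA line, hours (placeholder)
sites = 7_000_000     # analysed genome positions (placeholder)
mutations, lines = 12, 10

# Same raw mutation count, but generation time taken from liquid culture
# (assumed 3 h) versus a doubled on-plate generation time (assumed 6 h)
mu_liquid = ma_mutation_rate(mutations, lines, generations(elapsed, 3.0), sites)
mu_plate = ma_mutation_rate(mutations, lines, generations(elapsed, 6.0), sites)

# Halving the number of generations doubles the per-generation estimate
print(round(mu_plate / mu_liquid, 6))
```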

      Page 12. Was the experiment shown in Figure 3 done in a liquid culture? If so, the transcriptional profile could be different from the experiment shown in Figure 2, which was done on an agar plate.

      Yes, the experiment shown in Figure 3 was conducted in liquid culture. We acknowledge that the transcriptional profile could differ from the experiment shown in Figure 2, which was performed on an agar plate. However, technical limitations required us to use liquid cultures for these experiments.

      Page 14. Regarding the statement "INH and EMB coincided with a decreased concentration of these [dCTP and dTTP] nucleotides", on examining Table S5, I do not see any statistically significant reductions in dCTP and dTTP levels.

      Thank you for bringing this to our attention. We have made the necessary corrections to ensure that the text and data are now aligned.

      Page 14. Similarly to the comment above, the statement "RIF, CIP and MMC treatments promoted an increase in the dCTP and dTTP pools" is misleading as each drug seems to increase either dCTP or dTTP, not both.

      Same as above.

      Page 14. The authors state, "a larger overall dNTP pool size coincides with a larger cell size and vice versa (Figure 4H)". Please indicate the unit of the pool size for the graph shown in Figure 4H. According to the legend, I assume that it refers to the concentration. The term "pool size" may be misleading as it implies quantity rather than concentration.

      Page 15. Figure 4H is impossible to understand. The left y-axis label looks as if it is a ratio of cell length to volume. There is no point in having these three data on a single graph. Please separate them into individual graphs. Also, what is the spacing between the tick marks? The data also seem inconsistent with the values given in Table S1. For example, the mean volume of COMBO is larger than the control (according to Table S1), and yet the graph in Figure 4H indicates that COMBO's relative length is less than 1.

      Thank you for your feedback. We have corrected these and created what we hope is a clearer figure.

      Figure S1. Clarify what the gray shade in the graph represents.

      The gray shade was unnecessary, so we removed it when recoloring the figure to ensure a more coherent color scheme across the different treatments.

      Figure S1. Relative viability cannot be determined by OD600. CFU needs to be determined to assess cell viability.

      Thank you. We changed the incorrect term viability to growth inhibition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes the induction of SIV-specific NAb responses in rhesus macaques infected with SIVmac239, a neutralization-resistant virus. Typically, host NAb responses are not detected in animals infected with SIVmac239. In this work, seventy SIVmac239-infected macaques were retrospectively screened for NAb responses, and a subset of nine animals was identified as NAb inducers. The viral genomes from 7/9 animals that induced NAb responses were found to encode a nonsynonymous mutation in the Nef gene (amino acid G63E). In contrast, the Nef G63E mutation was found in only 2/19 NAb non-inducers, implying that the Nef G63E mutation is selected in NAb inducers. Measurement of Nef G63E frequencies in plasma viruses suggested that Nef G63E selection preceded NAb induction. The Nef G63E mutation was found to mediate escape from Nef-specific CD8+ T-cell responses. To examine the functional phenotype of the Nef G63E mutant, its effect on downmodulation of Nef-interacting host proteins was examined. Infection of rhesus and cynomolgus macaque CD4+ T cell lines with WT or Nef G63E mutant SIV suggested that the Nef mutant reduces S473 phosphorylation of AKT. Using a flow cytometry-based proximity ligation assay, it was shown that the Nef G63E mutation reduced binding of Nef to PI3K p85/p110 and to the mTORC2 components GβL/mLST8 and MTOR, the kinase complexes responsible for AKT-S473 phosphorylation. In vitro B-cell Nef invasion and in vivo imaging/flow cytometry-based assays were employed to suggest that Nef from infected cells can target Env-specific B cells. In addition, it was determined that NAb inducers have significantly higher Env-specific B-cell responses after Nef G63E selection when compared with NAb non-inducers. Finally, a corollary was drawn between the Nef G63E-associated B-cell/NAb induction phenotype and activated PI3K delta syndrome (APDS), which is caused by activating GOF mutations in PI3K, to suggest that Nef G63E-mediated induction of NAb responses is reciprocal to APDS.

      Strengths:

      This study aims to understand the virus-host interaction that governs NAb induction in SIVmac239-infected macaques; this could enable identification of determinants important for the induction of NAb responses against hard-to-neutralize tier-2/3 HIV variants. The finding that SIV-specific B-cell responses are induced following selection of the Nef G63E CD8+ T-cell escape mutant argues for an evolutionary trade-off between CTL escape and NAb induction. Exploitation of such a cellular-humoral immune axis could be important for HIV/AIDS vaccine efforts.

      Although more validation and mechanistic basis are needed, the corollary between PI3K hyperactive signaling during autoimmune disorders and Nef-mediated abrogated PI3K signaling could help identify novel targets and modalities for targeting immune disorders and viral infections.

      We are grateful for the supportive and insightful comments. The work did seem to unintentionally highlight a conceptual link between extrinsic and intrinsic immune perturbations. We will keep working on both fronts, aiming to identify synergies.

      Weaknesses:

      Although the paper does have strengths in principle, its weakness is that the mechanistic basis of Nef-mediated induction of NAb responses is not directly examined. For example, it remains unclear whether SIVmac239 with an engineered G63E mutation in Nef would induce faster and more potent NAb responses. A macaque challenge study is needed to address this point.

      We appreciate the point. We face limitations in the availability of macaques for de novo experiments. As partially discussed in the first version, the identified Nef phenotype selected post-acute infection confers an enhanced CD4+ T cell-killing effect (revised Fig 4F), and de novo infection with the mutant would likely redirect the course of infection toward rapid disease/AIDS progression with generalized immune failure by boosting acute-phase CD4 destruction. In other words, de novo infection with the mutant would not necessarily serve as a direct reconstitution experiment. Understanding the mutant in vitro on an immunosignaling basis appears equally critical, and in the current work we have focused on depicting this as the first step. We will work on reconstitution experiments, with an emphasis on pharmacology, in future studies.

      As presented, the central premise of the paper involves infected cell-generated Nef (WT or G63E mutant) being targeted to adjacent Env-specific B cells. However, it remains unclear how this transfer takes place. Direct evidence that CD4+ T cell-associated and/or cell-free Nef is transferred to B cells is needed to address this concern.

      We appreciate the point, which was also raised by Reviewers 2 and 3. We have performed three sets of in vitro reconstitution experiments graphically and functionally addressing how Nef transfer from CD4+ T cells to B cells can be modulated (new Fig 6), and edited the text accordingly.

      The interaction between Nef and PI3K signaling components (p85, p110, GβL/mLST8, and MTOR) has been explored using a PLA assay; however, this requires validation using additional biochemical and/or immunoprecipitation-based approaches. For example, is Nef (WT or mutant form) sufficient to affect PI3K-induced phosphorylation of Akt in an in vitro kinase assay? Moreover, details regarding the binding of WT vs mutant Nef to PI3K signaling components are lacking in this study. Lastly, it is unclear whether the interaction of Nef with PI3K signaling components is a conserved function of all primate lentiviruses or an SIV-specific phenotype.

      We appreciate the point. Co-immunoprecipitation analysis via pulldown of the mTORC2-intrinsic cofactor Sin1 (revised Fig 4E), showing decreased G63E-Nef binding, should add robustness to the statement in combination with the initial manipulation results (Fig 4C). As Sin1 is intrinsic to mTORC2 but not mTORC1, this further strengthens the result. Phosflow is now a standard readout for pAkt itself. Regarding sequence variation, conservation will be addressed in future studies; we briefly mention this in the revision (Lines 390-391).

      It has been previously reported that the region of Nef encoding glycine at position 63 is not conserved in HIV-1 (Schindler et al, Journal of Virology 2004). Does HIV-1 Nef therefore also function in the induction of NAb responses in humans, or is the observed phenotype specific to SIV?

      We appreciate the point, and do not have an answer at the moment. We will explore in our HIV-1-infected patient cohort (Hau et al, AIDS 2022) and in other settings whether corresponding phenotypes exist. We have mentioned this point in the revised manuscript (Lines 392-393).

      Reviewer #2 (Public Review):

      It is well known that human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses, but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain. They identified a subgroup of animals that showed significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9), but in only a minority of the control group (2/19), SIVmac variants containing a CD8+ T-cell escape mutation of G63E/R in the viral Nef gene emerged. They further show that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signaling. The authors propose that this SIVmac239 nAb induction is reciprocal to the antibody dysregulation caused by a previously identified human PI3K gain-of-function (Ref). Altogether, the results suggest that PI3K signaling plays a key role in B-cell maturation and the generation of effective nAb responses.

      Strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known effect of the interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. Weaknesses are that only G63E, and not G63R, which also emerged in most animals, was examined in most functional assays. Some effects of the G63E mutation seem modest, and comparison with a grossly nef-defective SIVmac construct would be desirable to better assess the impact of the mutation on Nef-mediated stimulation of PI3K. While the impact of this Nef mutation on PI3K and its association with improved nAb responses are largely convincing, the results on the potential impact of soluble Nef on neighboring B cells are much less clear. SIVmac239 infects and manipulates helper CD4 T cells, and these are essential for the activation and differentiation of B cells into antibody-producing plasma cells and for effective humoral immune responses. Without additional functional evidence that Nef indeed specifically targets and manipulates B cells, these results should be interpreted, and the conclusions drawn, with much greater caution. Finally, the presentation of the results and conclusions is at times very convoluted and difficult to comprehend. Editing to improve clarity is highly recommended.

      We are very grateful for the supportive and visionary review and suggestions. Experiments have been performed to address the points raised. This work inevitably involved interdisciplinary factors even to arrive at the schematic (NAbs, B cells, CD4+ T, CD8+ T, viral escape, immunosignaling, IEI as extrapolation, and microscopy implementations), and some sections were likely convoluted as a result. We have streamlined certain portions and edited the writing throughout, and hope that it is now more straightforward.

      Reviewer #2 (Recommendations For The Authors):

      As outlined in the public review, I found the results potentially very interesting, but parts of the manuscript are much more complex and confusing than necessary. In addition, the methods addressing the potential impact of soluble Nef on neighboring B cells in vivo were difficult to assess, and altogether this part was not convincing. I have the following specific suggestions:

      We are very grateful for the scholarly review and for the encouraging and suggestive comments on this orphan work. In the revision, we designed experiments to address the properties of Nef transfer to support interpretation of the in vivo B-cell data. The recommendations have been addressed as follows.

      (1) Title: "AIDS virus-neutralizing antibody induction reciprocal to a PI3K gain-of-function disease". I think this title hardly reflects the data; SIVmac causes simian AIDS and is not "the AIDS virus", and the second part is more appropriate for the Discussion than for the title (and the abstract).

      We appreciate the point. The original intent of the title was to conceptually bridge two differing fields of virus-host interaction and inborn errors of immunity/immunosignaling on an original article basis. Certain papers (Mudd et al, Nature 2012 etc) do utilize the term AIDS virus, and we similarly chose the term for simplification to non-virologists at initial submission.

      That being said, we understand the scholarly point raised, and feel that the initial aim can be well attained by retaining the key host effector PI3K in the title, as in the revised submission titled “SIV-specific neutralizing antibody induction following selection of a PI3K drive-attenuated nef variant”.

      (2) Abstract and throughout: As the authors show, SIVmac is not generally "neutralization resistant"; difficult to neutralize is more appropriate and should be used throughout. Also, the abstract and other parts are more complicated than necessary.

      We appreciate the point. HIV/SIV Env immunology work utilizes “neutralization-resistant” for SIVmac239 (e.g., Mason et al, PLoS Pathog 2016), and autologous titer positivity of ~10% at this size of examination does appear low amongst lentiviruses. Nevertheless, as recommended, “difficult-to-neutralize” better describes the nature, and we have switched the term accordingly.

      In line with the title modification, we reflected the comment in the abstract structure: we replaced the main introductory sentence (Here we…) with a more data-based one instead of one depicting extrapolation, and modified the phrasing in the latter half.

      (3) The intro seems a bit biased. Immune evasion due to mutations and proviral integration that play key roles in viral persistence are not mentioned. nAbs are not known to efficiently control HIV or SIV replication in vivo (not even in the present study). Thus, a more "balanced" presentation of the role of nAbs in vivo is desirable.

      We agree with the comment. The Introduction in the first submission was condensed to display only examples of humoral immune perturbation across persistence-prone viral infections, and it would indeed be much better to lay out the multiscale strategies by which lentiviruses establish viral persistence. We have added two passages, one on the fundamental integrating retroviral life cycle and another on the wide spectrum of accessory protein-driven perturbation. As pointed out, the endogenous induction observed here is of course not early enough to exert a suppressive impact on replication, as passive infusion of exogenous Abs does. We have adjusted the text accordingly to improve the balance.

      (4) Lines 73-76: rephrase for clarity.

      We acknowledge the comment and have rephrased accordingly.

      (5) Line 92: "linked with sustained Env-specific B-cell responses after the mutant Nef selection". After or during in one case; the time frame varies enormously and this should be discussed.

      We appreciate the comment. The six Nef-G63E mutant-selecting NAb inducers subjected to B-cell analysis were the ones that showed precedence in Fig 2D (mutant before induction). That being said, we modified the text as suggested (Line 104 in the revised uploaded text). Text related to temporal deviation has been appended (Lines 378-383 in the revised uploaded text).

      (6) The authors should discuss G63R and include it in the functional analyses.

      We appreciate the comment. Discussion on Nef-G63R in ver1 submission was kept minimal because statistical significance for selection was marginal. We generated a Nef-G63R mutant and results are appended in Fig 4-Figure Supplement 2.

      (7) Lines 124/5: conservation only applies to SIVsmm/mac Nefs and this region is also frequently deleted/length-variable in primary HIV-1 Nefs.

      We appreciate the comment. We modified description of the region accordingly (Lines 139-141 in revised text).

      (8) Lines 153-155: Statement doesn't seem to make sense. The triple mutant Nef SIVmac construct was not attenuated for replication but specifically disrupted in CD3 down-modulation.

      We acknowledge the comment. We meant that the consequent plasma viral load showed a trend toward decrease (as in the Graphical Abstract of that work), which should (in a simplistic view) influence antigenicity for humoral immune responses. Yet it is true that virological replicative capacity was comparable to wild type, as in Fig. 1. We have removed the related text and rephrased it (the reference remains cited in the Introduction).

      (9) Lines 178/9: levels in PI3K gain-of-function mice "with full disease phenotype (Avery et al., 2018)". This needs more information, e.g. what disease exactly are they talking about?

      We are grateful for the correction, and have appended text introducing the mentioned congenital disease earlier, in the Introduction section. A detailed description is also appended in the Discussion section.

      (10) Lines 186/7: "Env-stimulating high-MOI infection also accelerated phenotype appearance, with enhanced 50% reduction (Figure 4C, right)". Modify text and corresponding figure for clarity.

      We acknowledge the comment. We revised as: “A high-MOI SIV infection, comprising higher initial concentration of extracellular Env stimuli, also accelerated phenotype appearance from day 3 to day 1 post-infection with stronger pAkt reduction”.

      (11) The validity of the results described in the section "Targeting of lymph node Env-specific B cells by Nef in vivo" was difficult to assess. Altogether, however, I did not find them convincing, especially since a negative control (e.g. macaques infected with nef-deleted SIVmac) is missing.

      We acknowledge the comment. As a pure experimental control, whole-Nef deletion may help establish a subtracted baseline. Within this work, the staining per se should at least be highly specific (the mAb has been verified repeatedly in other applications, and the cytometry panel was designed for minimal spillover into the AF488 channel). In vivo, direct comparison may be complicated by the fact that the loss of other pleiotropic Nef effects appears to dominate upon Nef deletion, including reduced viremia, robust CD8 responses, killer CD4 responses and increased binding Ab titers (Johnson et al, J Virol 1997, Gauduin et al, J Exp Med 2006, Fukazawa et al, Nat Med 2012, Adnan et al, PLoS Pathog 2016 etc), leading to an altered trajectory. We will work on refining the methodology in future studies.

      (12) Lines 309-319: This paragraph made little sense to me (as did lines 328-331).

      We acknowledge the comment and have edited both sections.

      Reviewer #3 (Additional Reviewer):

      In this manuscript, Hiroyuki Yamamoto et al examined virus-specific antibody responses and identified a subgroup of nine individuals, out of seventy rhesus macaques of Burmese origin infected with SIVmac239, that develop neutralizing antibodies (NAb). The authors propose the emergence of a nef mutant (Nef-G63E) that impacts B cell maturation, resulting in PI3K gain-of-function.

      My major concerns are:

      The authors addressed the role of the emergence of the Nef-G63E mutant in individuals developing NAb from different angles. The manuscript is confusing and the rationale is not always clearly stated. This reflects the two aspects of the manuscript: (1) NAb identification in a subgroup of macaques and (2) the identification of this nef mutation.

      We are grateful for the comprehensive and scholarly comments. As pointed out, the work did need to confront potential bifurcation of the influence of the obtained viral immunosignaling phenotype for CD4-intrinsic (which might be your specialty) and B-cell-intrinsic impact. Based on your suggestions we have acquired additional data and revised the manuscript as attached.

      The authors used both males (n=57) and females (n=13). However, there is no indication of sex for NAb inducers versus non-NAb inducers. The notion of "highly pathogenic" is certainly not correct (see the Introduction). Pathogenicity also depends on monkey origin. Thus, cynomolgus macaques are less sensitive to SIVmac239 or SIVmac251 than rhesus macaques (Ling B Aids 2002; Reimann KA, J Virol 2005; Cumont MC, J Virol 2008) or pigtailed macaques used in the US. Indeed, the authors used Burmese macaques, and therefore the dynamics of pathogenicity differ from those in rhesus macaques of Indian origin housed in the US. How many of the 61 animals were sacrificed? Here, the animals survive longer (more than one year), and therefore the notion of "highly pathogenic" should be tempered.

      We appreciate the comment. We have accordingly appended sex information (M/F: 8/1 versus 49/12 in NAb inducers vs non-inducers, p > 0.99 by Fisher’s exact test) in the Methods section. As pointed out, there are differences in the frequency and rate of AIDS progression among macaques of differing origin, but we have also previously reported reproducible AIDS progression dependent on MHC-I genotypes in the Burmese rhesus macaques used here (Nomura, Yamamoto et al., J Virol 2012). Following this advice, we have softened the term to “pathogenic” in the revised manuscript and appended one reference showing gradation of pathogenesis from a cell-death perspective (Cumont 2008).
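
      As an arithmetic check of the sex-balance comparison above, the two-sided Fisher's exact test on the 2x2 table (NAb inducers 8 male/1 female; non-inducers 49 male/12 female) can be reproduced with a short, dependency-free sketch. This is our own illustration of the standard hypergeometric formulation (summing the probabilities of all tables with the same margins that are no more likely than the observed one), not the authors' analysis code; the function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Conditions on the row and column totals and sums the hypergeometric
    probabilities of every table that is no more likely than the observed one.
    """
    row1 = a + b          # e.g. number of NAb inducers
    col1 = a + c          # e.g. number of males
    n = a + b + c + d     # total animals
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # P(top-left cell == x) under the hypergeometric null
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so floating-point ties with p_obs are counted
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Rows: NAb inducers, non-inducers; columns: male, female
p = fisher_exact_two_sided(8, 1, 49, 12)
print(f"two-sided p = {p:.4f}")  # prints: two-sided p = 1.0000
```

      Where SciPy is available, `scipy.stats.fisher_exact` can be used instead of the hand-written function.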

      Furthermore, no data are provided on CD4 or CD8 T cell dynamics. In particular, the extent of T cell immunodeficiency may compromise the humoral response, so these data need to be shown. Indeed, previous reports have indicated that early CD4 T cell depletion is associated with a defective humoral response. Furthermore, Tfh cell depletion has been reported in several immune tissues that are essential for the B cell response, such as the spleen. This should therefore be discussed as an alternative mechanism for the absence of NAb. Indeed, the authors found higher and more persistent Env-specific plasmablast responses in NAb inducers than in non-inducers (Figure 6). Why were twelve of the 61 individuals selected for assessing the anti-Env response (Supplemental S3 for Figure 1, panel 1), and only eleven for western blots? No explanation is given in the text; this needs to be clearly stated. See lines 108-110.

      We appreciate the comment. As in other sections, this study used the cryopreserved samples available from a retrospective cohort, in which data acquisition was heterogeneous over time. We acknowledge that some supplemental data are particularly limited, which is also why they are presented in the SI. We considered it a priority to secure samples from Nef-G63E-selecting NAb inducers versus viremic non-inducers, of which we obtained six and twelve, respectively, for the B-cell analysis.

      We (Nakane et al., PLoS ONE 2013) and others (Hirsch et al., J Virol 2004) have previously reported, based on western blotting, that SIV-infected rapid progressors tend to show serological failure (impaired binding-antibody western blot bands). Therefore, to compare quantitative traits at this basal stage (Fig 1), we judged that NAb inducers should be compared with non-rapid-progressing (>60 wk survival) non-inducers. We have stated this in the revised manuscript (Results/Methods). Additionally, we have updated the immunoblotting result to include one more non-inducer (n = 12). Please note that there is lot-to-lot variation in the strip-coated antigens (e.g., gp160), but the results are comparable (now covering 12 of the 13 animals with >60-wk survival).

      The authors show the frequencies of the Nef-G63E mutant in Figure 2C. However, the legend does not state the number of NAb non-inducers used to calculate this frequency. At line 127, the authors state "only in two of the nineteen NAb non-inducers, including one rapid progressor". Thus, different numbers of individuals are used throughout the manuscript. For readers, this clearly needs to be clarified, stating which group each number refers to. The numbers are not consistent across the text and the analyses performed.

      We appreciate the comment and have added the number to the revised Fig 2C. As mentioned above, the heterogeneity of sample numbers across sections is indeed a limitation of the work, and we have noted this in the Discussion.

      Please clarify the rationale for the sentence at lines 140-142: "NAb induction is not associated with these MHC-I genotypes (P = 0.25 by Fisher's exact test, data not shown) but with the Nef-G63E mutation itself".

      We appreciate the comment. We have rephrased it as:

      “Ten of nineteen NAb non-inducers also had either of these alleles (Figure 1-figure supplement 1). This did not significantly differ with the NAb inducer group (P = 0.25 by Fisher’s exact test, data not shown), indicating that NAb induction was not simply linked with possession of these MHC-I genotypes but instead required furthermore specific selection of the Nef-G63E mutation.” (Lines 159-162).

      In Supplemental Figure 3, only 7 individuals were tested, whereas the authors state that "Ten of nineteen NAb non-inducers also had either of these alleles". Why only seven? In NAb-inducing Burmese monkeys, the authors show specific T cells that recognize the WT Nef peptide but not the G63E mutant peptide. Thus, Nef is immunogenic in vivo and generates T cells despite being mutated.

      In contrast, NAb non-inducers lack Nef-specific T cells (Supplemental Figure 3A, except R01-011). Although the authors propose a CD8 T cell escape mutant, this is associated neither with an absence of immunogenicity nor with a difference in viral load compared with NAb inducers (panel C). The conclusions should therefore be revised, as this part of the manuscript is confusing. Please clarify the rationale for linking NAb and Nef-specific CD8 T cells.

      We appreciate the comment. Seven of the eight non-inducers that were positive for the allele but did not select the Nef-G63E mutant were available for analysis. Among the many CTL responses induced, this single Nef62-70 epitope-specific response is presumed not to have a large impact on viral control. This is discussed in a previous paper (Nomura, Yamamoto et al., J Virol 2012), which rather suggests a correlation between MHC-I haplotype and plasma viral load. We assume that the CTL pressure-driven selection of the Nef-G63E mutant acted as a relatively pure immunosignaling trigger under persistent viremia. We have added this to the revised text (Line 172).

      In the next part of the manuscript, the authors assess the function of this Nef-G63E mutant. The rationale for introducing ferritin in this part is not clear to the reader. Furthermore, only a subgroup of each group (NAb+ versus NAb-) is shown: 4 NAb- versus 6 NAb+ animals.

      We appreciate the point. As noted in the Introduction, Swingler et al. (Cell Host Microbe 2008) reported that ferritin derived from HIV-infected macrophages is a potential B cell-disrupting factor. In that paper, viral load, ferritin and binding antibody titers were positively correlated. Our data show that SIVmac239-specific NAb induction already departs from such kinetics with respect to viral load (Fig 3-Supplement 1C), and ferritin levels were measured in some available samples simply for confirmation. We have added three more available samples to the NAb- group. (The six NAb+/G63E animals correspond to those with B-cell data in Figure 7.) The statistical results appear unaffected and robust, as shown in this version. The revised manuscript also includes an added explanation of the rationale for measuring ferritin.

      Similarly, whereas the authors observed an effect of the Nef mutant on pAkt Ser473 (lower induction than with WT), they suggest that this may affect T cell survival.

      We appreciate the point. In the first submission we showed a decrease in peripheral memory Tfh cells, although it is true that this is indirect evidence. In the current revision we have examined apoptotic cell death, which we show increases with the Nef-G63E mutation (Figure 4F).

      The rationale for analyzing CXCR3-CXCR5+PD-1+ memory follicular helper T (Tfh) cells is not clear. Moreover, the references are not cited appropriately: these papers show an expansion. See the literature on depletion (Xu H, J Immunol. 2015; Moukambi F, PLoS Pathog. 2015; Yamamoto T, Sci Transl Med. 2015; Xu H, J Immunol. 2018; Moukambi F, Mucosal Immunol. 2019).

      We appreciate these points on in vivo CD4+ T cells.

      Peripheral memory Tfh cells were reported to correlate with antibody cross-reactivity in a human cohort (Locci et al., Immunity 2013), so we briefly examined this subset in the context of NAb induction. We have stated this in the revised manuscript.

      Moukambi F et al. (PLoS Pathog 2015; Mucosal Immunol 2019) are demonstrative studies of acute-phase destruction. We have cited the suggested non-neonatal/vaccine-related references, including these two, in the revised manuscript. The biphasic dysregulation of Th cells (acute-phase destruction and adverse chronic-phase hyper-expansion) may indeed play a particular role in the current phenotype, which is beyond the aim of the current analysis. We have briefly mentioned this in the Discussion.

      Next, the authors assess the potential B-cell-intrinsic influence of the G63E-Nef phenotype. The rationale here is clearly stated and consistent with Figure 1, and this part is clearer. The dot plots should be revised and the markers used stated more clearly. The authors indicate that Nef invasion upregulates pAkt Ser473, implying aberrant PI3K/mTORC2 signaling. What is the impact of the Nef-G63E mutant on pAkt Ser473 in an in vitro transfer model? This comparison is not addressed.

      We appreciate these remarks and suggestions, which were also raised by Reviewers 1 and 2. We have performed three sets of in vitro reconstitution experiments that address, visually and functionally, how Nef transfer to B cells can be modulated (new Fig 6), and have edited the text accordingly.

      Minor points are:

      - the presence of references in the legend.

      - some antibody clones, such as CD38 and CD138, are listed in the table but not used; these are well known not to be valid B cell markers in monkeys.

      We appreciate the suggestions.

      References have been removed from the legends (Fig. 1, Fig. 3) and moved to the corresponding Methods section (Fig. 1).

      We were aware of this in advance (CD38/CD138) and included them in the memory B-cell panel only to check whether they showed any specific pattern. As expected, no notable pattern was observed in these NAb inducers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the effects of NFKB2 mutations on pituitary gland development through hypothalamic-pituitary organoids. The evidence supporting the main conclusions is solid, although analysis of additional clones to exclude inter-clone variability would strengthen the conclusions. Insight into the mechanism of action of NFKB2 during pituitary development is incomplete. This work will be of interest to endocrinologists and biologists working on pituitary gland development and disease.

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we share the view that reproducing the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame, both for the reason mentioned above (unavailability of the main research engineer experienced in the challenging methods involved in organoid differentiation) and because of the long turnaround time of this kind of experiment (3 months for the whole differentiation starting from iPSCs). We therefore decided to publish results from a single clone, while still aiming to reproduce them on at least a second one, and hope to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      Revised text: “Conversely, a limitation of this model is the long duration of the differentiation period (approximately 3 months) and the fact that not all hiPSC clones lead to full differentiation of hypothalamo-pituitary organoids despite similar conditions of culture. For these reasons, we could not include confirmation of our results on an independent clone in the present paper.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      NFKB mutations are thought to be one of the causes of pituitary dysfunction, but until now their effects could not be reproduced in mice and their pathomechanism was unknown. The authors used the differentiation of hypothalamic-pituitary organoids from human pluripotent stem cells to recapitulate the disease in human iPS cells carrying the NFKB mutation.

      Strengths:

      The authors achieved their primary goal of recapitulating the disease in human cells. In particular, the differentiation of the pituitary gland is closely linked to the adjacent hypothalamus in embryology, and the authors have again shown that this method is useful when the hypothalamus is suspected to be involved in pituitary abnormalities caused by genetic mutations.

      Weaknesses:

      On the other hand, the pathomechanism is still not fully understood. This study provides some clues to the pathomechanism, but further analysis of NFKB expression and experiments investigating the relevant factors in more detail may help to clarify it further.

      We thank this reviewer for acknowledging that we have reached our primary objective, in particular the fact that the HPO (hypothalamo-pituitary organoid) model allows recapitulation of the disease in human cells, including hypothalamic-pituitary interactions. We must admit that the pathophysiological mechanism of the disease remains incompletely understood. However, we have analysed more samples by RT-qPCR and further analysed RNA-seq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may act. We have now provided several additional figures derived from these analyses, including a synthetic figure summarizing the most relevant observed effects (Fig. 14).

      Reviewer #2 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript and for the valuable comments, suggestions and questions, which are addressed point by point below.

      Summary:

      DAVID syndrome is a rare autosomal dominant disorder characterized by variable immune dysfunction and variable ACTH deficiency. Nine different families have been reported, and all have heterozygous mutations in NFKB2. The mechanism of NFKB2 action in the immune systems has been well-studied, but nothing is known about its role in the pituitary gland.

      The DAVID mutations cluster in the C-terminus of NFKB2 and interfere with cleavage and nuclear translocation. The mutations likely act as dominant negatives by affecting dimer function. ACTH deficiency can be life-threatening in neonates and adults; thus, understanding the mechanism of NFKB2 action in pituitary development and/or function is important.

      The authors use CRISPR/Cas gene editing of human iPSC-derived pituitary-hypothalamic organoids to assess the function of NFKB2 and TBX19 in pituitary development. Mutations in TBX19 are the most common, known cause of pituitary ACTH deficiency, and the mechanism of action has been studied in mice, which phenocopy the human condition. Thus, the TBX19 organoids can serve as a positive control. The Nfkb2<Lym1/Lym1> mouse model has a p.Y868* mutation that impairs cleavage of NFKB2 p100, and the immune phenotype mimics the patients with DAVID mutations, but no pituitary phenotype was evident. Thus, a human organoid model might be the only approach suitable to discover the etiology of the pituitary phenotype.

      Overall, the authors have selected an important problem, and the results suggest that the pituitary insufficiency in DAVID syndrome is caused by a developmental defect rather than an autoimmune hypophysitis condition. The use of gene editing in human iPSC-derived hypothalamic-pituitary organoids is significant, as there is only one example of this previously, namely studies on OTX2. Only a few laboratories have demonstrated the ability to differentiate iPSC or ES cells to these organoids, and the authors have improved the efficiency of differentiation, which is also significant.

      The strength of the evidence is excellent. However, the two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones makes the conclusions less compelling. Since the authors obtained two independent clones for NFKB2 it is not clear why only one clone was studied.

      We experienced difficulties obtaining an hiPSC population devoid of spontaneous differentiation while purifying this second clone, and did not want to delay the start of the experiments. This clone will be analysed in a follow-up study.

      Finally, the effect of TBX19 on early pituitary fate markers is somewhat surprising given the phenotype of the knockout mice and patients with mutations. Thus, the use of a single clone for that study is also worrisome.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but one that may still have functional DNA-binding ability. This mutation had a similar, even more severe, effect on LHX3 and PITX1 as the TBX19 KI mutation. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. Unlike in the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines lacking TBX19 entirely should help clarify these questions.

      Strengths:

      The authors make mutations in TBX19 and NFKB2 that exist in affected patients. The TBX19 p.K146R mutation is recessive and causes isolated ACTH deficiency. Mutations in this gene account for 2/3 of isolated ACTH deficiency cases. The NFKB2 p.D865G mutation is heterozygous in a patient with recurrent infections and isolated ACTH deficiency. NFKB2 mutations are a rare cause of ACTH deficiency, and they can be associated with the loss of other pituitary hormones in some cases. However, all reported cases are heterozygous.

      The developmental studies of organoid differentiation seem rigorous in that 200 organoids were generated for each hiPSC line, and 3-10 organoids were analyzed for each time point and genotype. Differentiation analysis relied on both RNA transcript measurements and immunohistochemistry of cleared organoids using light sheet microscopy. Multiple time points were examined, including seven time points for gene expression at the RNA level and two time points in the later stages of differentiation for IHC.

      TBX19-deficient organoids exhibit reduced levels of PITX1, LHX3, and POMC (ACTH precursor) expression at the RNA and IHC level, and there are fewer corticotropes in the organoids, as ascertained by POMC IHC.

      The NFKB2 deficient organoids have a normal expression of the early pituitary transcription factor HESX1, but reduced expression of PITX2, LHX3, and POMC. Because there is no immune component in the organoid, this shows that NFKB2 mutations can affect corticotrope differentiation to produce POMC. RNA sequencing analysis of the organoids reveals potential downstream targets of NFKB2 action, including a potential effect on epithelial-to-mesenchymal-like transition and selected pituitary and hypothalamic transcription factors and signaling pathways.

      Weaknesses:

      There could be variation between individual iPSC lines that is unrelated to the genetically engineered change. While the authors check for off-target effects of the guide RNA at predicted sites using WGS, a better control would be to have independently engineered clones or to correct the engineered clone to wild type and show that the phenotypic effects are reversed.

      All NFKB2 patients are heterozygous for what appear to be dominant negative mutations that affect protein cleavage and nuclear localization of the processed protein as homo- or heterodimers. The organoids are homozygous for this mutation. Supplemental Figure 4 indicates that one heterozygous clone and two homozygous mutant clones were obtained. Analysis of these additional clones would give more strength to the conclusions, showing reproducibility and the effect of mutant gene dosage.

      The main goal of this work was to evaluate whether and how the NFKB2 D865G mutation affects hypothalamic-pituitary organoid development, in order to determine whether these organoids would constitute a valuable model to study DAVID syndrome.

      We thank this reviewer for noting that we addressed an important question using appropriate, novel and not widely used methods, including CRISPR/Cas9 genome editing of iPSCs and disease modelling in iPSC-derived HPOs, which had not previously been reported by any team other than the one that initially described them. These methods allowed us to confirm our working hypothesis that DAVID syndrome is caused by a developmental defect rather than by autoimmune hypophysitis. We also agree that analysing more clones, generated from the same or different hiPSC lines and carrying homozygous, heterozygous or corrected mutations, will be necessary in the future.

      Reviewer #3 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript and for the valuable comments, suggestions and questions, which are addressed point by point below.

      Summary:

      This manuscript by Mac et al. addresses the causes of pituitary dysfunction in patients with DAVID syndrome, which is caused by mutations in the NFKB2 gene and leads to ACTH deficiency. The authors seek to determine whether the mutation directly leads to altered pituitary development, as opposed to an autoimmune defect, by mutating human iPSCs and then establishing organoids that differentiate into pituitary tissue. They first seek to validate the system using a well-characterised mutation of the transcription factor TBX19, which also results in ACTH deficiency in patients. They then characterise altered pituitary cell differentiation in mutant NFKB2 organoids and show that these lack corticotrophs, which would lead to ACTH deficiency.

      Strengths:

      The conclusion of the paper that ACTH deficiency in DAVID syndrome is independent of an autoimmune input is strong.

      Weaknesses:

      (1) The authors correctly emphasise the importance of establishing the validity of an iPSC-based model in being able to recapitulate in vivo dysfunctional pituitary development through characterisation of a TBX19 knock-in mutation. Whilst this leads to the expected failure of functional corticotroph differentiation, other aspects of the normal pituitary differentiation pathway upstream of corticotroph commitment seem to have been affected in surprising ways. In particular, the loss of LHX3 and PITX1 in TBX19 mutant organoids compared with wild type requires explanation, especially as the mutant protein would only be expected to be expressed in a small proportion of anterior pituitary lineage cells.

      If the developmental expression profile of key transcription factors in mutant organoids does not recapitulate that which occurs in vivo, any interpretation of the relevance of expression differences in the NFKB2 organoids to the mechanism(s) leading to corticotroph function in vivo has to be questionable.

      See response to Reviewer #2

      It is notable that the manipulation of iPSC cells used to generate mutants through CRISPR/Cas9 editing is not applied to the control iPSC line. It is possible that these manipulations lead to changes to the iPSC cells that are independent of the mutations introduced and this may change the phenotype of the cells. A better control would have been an iPSC line with a benign knock-in (such as GFP into the ROSA26 locus).

      We agree that the issue of off-target mutations should be addressed. However, we performed whole genome sequencing on the TBX19 KI line and did not observe any pathogenic variants other than the intended edit. We also checked that clones isolated during the screening procedure that turned out to be unedited were still able to generate pituitary cells. We chose to use the isogenic original hiPSC line as the control because it could be compared with both TBX19 KI and NFKB2 KI simultaneously, thereby reducing the workload and cost of the experiments. Any other knock-in, such as GFP into the ROSA26 locus, would carry the same risk of off-target mutations, presumably at other sites in the genome.

      (2) In the results section of the manuscript the authors acknowledge that hypothalamic tissue in the NFKB2 mutant organoid may be having an effect on the development of pituitary tissue. However, in the discussion the emphasis is entirely on pituitary autonomous mechanisms such as pituitary HESX1 expression or POMC gene regulation; in the conclusion of the abstract, a direct role for NFKB2 in pituitary differentiation is described. Whilst the data here may suggest a non-immune mediated alteration in pituitary function in DAVID syndrome, if this is due to alteration of the developing hypothalamus then this is not direct. A fuller discussion of the potential hypothalamic contribution and/or further characterisation of this aspect is warranted.

      We agree with this reviewer that the contributions of both the developing hypothalamic and pituitary tissues should be taken into account. We performed additional experiments and analysed the effect of both mutations on hypothalamic growth factor expression. These results are displayed in new Figure 10. The role of the hypothalamus is now clearly stated and highlighted in the Discussion.

      (3) qRT-PCR data presented in Figure 6A shows negligible alteration of HESX1 expression at all time points in NFKB2 mutant organoids. This is not consistent with the 2-fold increase in HESX1 expression described in day 48 organoids found by bulk RNA sequencing.

      How do the authors reconcile these results and why is one result focused on in the discussion where a potential mechanism for a blockade of normal pituitary cell differentiation is suggested? Further confirmation of HESX1 expression is required.

      In the previous version of the manuscript, the HESX1 fold-change ratio between NFKB2 KI and WT at d48 was 2.06 (p=0.22). However, the representation used for expression kinetics (values relative to the expression peak in WT) and the scale made it difficult to see. In the new version of the manuscript, we analysed more samples from the same experiments, and the new figure (now 6B) shows a significant increase in HESX1 expression (Fc = 2.46, p=0.019) in NFKB2 KI.

      Also, the RT-qPCR results come from at least two different experiments, whereas the RNA-seq results come from a single one. For RT-qPCR, 6 HPOs per genotype were picked and analysed. As we found that only 60-70% of organoids show signs of pituitary cell differentiation, we preselected organoids based on RT-qPCR expression of selected markers (SOX2, HESX1, PITX1, LHX3, TBX19, POU1F1 and POMC) to avoid sending "empty" HPOs for bulk RNA-seq. We compared the HESX1 expression ratios obtained by the two techniques on the same samples (those used for RNA-seq) and found values of 2.19 (p=0.03) and 1.83 (p=0.061) for RNA-seq and RT-qPCR, respectively. This is illustrated in Supplementary Figure 7. Our new results thus clearly demonstrate the increase in HESX1 expression in NFKB2 KI from d27 to d75.

      (4) Throughout, the authors focus on POMC gene expression and ACTH immunopositivity as indicators of corticotroph cell identity. In the human fetal pituitary, melanotrophs are present and most ACTH antibodies are unable to distinguish these cells from corticotrophs. Is the antibody used specific for ACTH rather than other products of the POMC gene? It is unlikely that all the ACTH-positive cells are melanotrophs; nevertheless, it is important to know the proportions of the two POMC-positive cell types. These could be distinguished by looking for the expression of NeuroD1, which would also define whether corticotrophs are committed but not fully differentiated in the NFKB2 mutant organoids. In support of an effect on corticotrophs, it is notable that CRHR1 expression (which would be expected to be restricted to this cell type) is reduced by 84% in bulk RNA-seq data (Table 1), and this may be an indicator of the loss of corticotrophs in the model.

      The antibody we used is directed against ACTH. In HPOs, PAX7 expression was barely detected throughout the experiment. Moreover, although PCSK2 transcripts were observed, their expression started very early (d27) and remained constant, suggesting that this gene is expressed in hypothalamic rather than pituitary cells. All these observations suggest that melanotrophs are very unlikely to be present in HPOs.

      (5) Notwithstanding the caveats about whether the organoid model recapitulates in vivo pituitary differentiation (see 1 above) and whether the bulk RNAseq accurately reflects expression levels (see 3 above), there are potentially some extremely interesting changes in gene expression shown in Table 1 which warrant further discussion. For example, there is a 25-fold reduction in POU1F1 expression which may be expected to reflect a loss of somatotrophs in the organoid (and possibly lactotrophs) and highlights the importance of characterising the effect of NFKB2 on other anterior pituitary cell types within the organoid. If somatotrophs are affected, this may be relevant to the organoids as a model of DAVID syndrome as GH deficiency has been described in some individuals with NFKB2 mutations. The huge increase in CGA expression may reflect a switch in cell fate to gonadotrophs, as has been described with a loss of TPIT in the mouse. These are examples of the changes that warrant further characterisation and discussion.

      We performed a more in-depth analysis of other pituitary lineages (mainly somatotrophs). We confirmed the strong reduction in PROP1 and POU1F1 expression in NFKB2 KI organoids. Although the strong increase in CGA expression in the mutant may raise the possibility of a redirection towards gonadotroph lineage, the lack of change in NR5A1 expression may suggest otherwise.

      These results are now illustrated in figure 12 and discussed in a full paragraph.

      (6) How do the authors explain the lack of effect of NFKB2 mutation on global NFKB signalling?

      The most likely explanation is that p100/p52 is not involved in controlling the expression of other members of the NFKB signalling pathway. Therefore, the absence of global alteration of the NFKB signalling pathway shows that the mutant p100/p52 protein is directly responsible for the observed phenotype.

      Recommendations for the authors:

      Reviewing editor summary of recommendation to authors:

      The use of hypothalamic-pituitary organoids can provide a fundamental understanding of pituitary gland development and differentiation. Their use to study human pituitary insufficiency is important for gaining insight into the aetiology of disease and whether it implicates the hypothalamus or the anterior pituitary. To this end, there is only one other example of their use in the literature, in which Matsumoto et al. (2019) used OTX2-mutant hypothalamic-pituitary organoids to understand the aetiology of pituitary hypoplasia driven by OTX2 mutations. This being the second example of gene editing in human iPSC-derived hypothalamic-pituitary organoids, these studies have improved the efficiency of differentiation previously published by Suga et al. (2011) for ES cells and Matsumoto et al. (2019) for iPS cells. In addition, they confirm that this method is useful, especially for studying hypothalamic involvement in human pituitary anomalies, owing to the concerted development of these two structures.

      The reviewers recognise the valuable insight provided into the mechanism of NFKB2 action during pituitary development and how this human organoid model might be one of the few or only approaches suitable to discover the aetiology of the pituitary phenotype.

      The reviewers agree that both the evidence provided from the organoid model, as well as the characterisation of the phenotype are incomplete. In particular, the strength of evidence would be improved by analysing additional independent clones for both NFKB2 as well as TBX19 gene-edited iPSCs. Additionally, analysis of NFKB2 expression both in vivo and in the organoids, as well as analysis for the NFKB2 targets put forward, would be a lot more informative to help understand this phenotype.

      The main recommendations discussed are summarised here and the reviewers have elaborated on these points in their individual reviews:

      The two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones, unrelated to the mutation, makes the conclusions less compelling. Two independent homozygous clones were obtained for NFKB2 but only one was used, so analysis of the second clone would strengthen the findings. A heterozygous clone was also obtained and given all NFKB2 patients are heterozygous for what appears to be dominant negative mutations, the heterozygous clone ought to be analysed. Analyses of these additional clones would give more strength to the conclusions, showing reproducibility and the effect of mutant gene dosage. The reviewers provide excellent suggestions for alternative controls for the engineered iPSC lines in their specific comments.

      The effect of TBX19 mutation on early pituitary fate markers LHX3 and PITX1 is surprising given the phenotype of the knockout mice and patients with mutations. If the developmental profile of essential transcription factors does not recapitulate the in vivo expression in this well-characterised mutant, this brings the organoid model into question. Thus, analysis of a further clone for the study of mutant TBX19 would be crucial. The validity of this control affects the interpretations relying on expression differences in the NFKB2-mutant organoids.

      The study has implicated NFKB2 in pituitary development, but more insight is needed to fully understand disease pathogenesis. The authors presented potential downstream targets of NFKB2 action, including transcription factors and key signalling pathway components; further analyses of NFKB2 expression and experiments investigating the relevant factors in more detail will help elucidate this point.

      Discerning between the hypothalamus and pituitary tissue is fundamental to interpreting phenotypes: (i) To pinpoint the primary tissue affected by NFKB2 deficiency, staining for NFKB2 during development in vivo will determine whether it is expressed in both the developing hypothalamus and the anterior pituitary gland or in only one of these tissues. (ii) Using markers of hypothalamus and pituitary to discern between these two tissues in organoids will provide a lot of valuable information where expression changes are presented. This would help discern the contribution of the developing hypothalamus, as this is still unclear and has not been discussed. Knowing which tissue compartments express NFKB2 in the organoids would also be of great value.

      The organoids provide an opportunity to characterise the effects of NFKB2 on other pituitary cell types, since the bulk RNA-seq presents intriguing changes indicating that corticotrophs may not be the only cells affected. This may be of relevance to patients, who can have additional pituitary hormone deficiencies. If NFKB2 is expressed in the pituitary, demonstrating expression in the different cell types in vivo as well as in the organoids would help interpret the phenotype. Is it expressed only in corticotrophs/corticotroph precursors, or in additional endocrine cells?

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we share the view that reproducing the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame, for the reason mentioned above (unavailability of the main research engineer knowledgeable in the challenging methods involved in organoid differentiation) and because of the long turnaround time of this kind of experiment (3 months for the whole differentiation starting from hiPSCs). We therefore decided to publish results from a single clone while we still aim to reproduce them on at least a second one, and we hope to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      We have analysed more samples by RT-qPCR and further analysed RNA-seq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may play a role. Specifically, we now show the effect of the NFKB2 mutation on hypothalamic growth factors and pituitary progenitor differentiation (figure 10), on different stages of corticotroph maturation (figure 11) and on PROP1/POU1F1-dependent lineages (figure 12). We compared our results with publicly available ChIP-seq data concerning p52 transcriptional targets (figure 13). We have now provided several additional figures derived from these analyses, including a synthetic figure summarizing the most relevant observed effects (Fig. 14).

      Reviewer #1 (Recommendations For The Authors):

      In organoids, it is essential to stain for NFKB: is it the hypothalamus or the pituitary that expresses NFKB, and if the pituitary, is it the corticotroph itself or the surrounding cells? If immunostaining is not available, FISH or RNAscope can be used to look at expression.

      Figure 7 shows stronger expression of p100/p52 in pituitary progenitors, and some expression in the hypothalamic part of the organoid. Owing to the current lack of biological material and the length of the experimental procedure, we could not yet determine which differentiated cell types express p100/p52, but this is clearly something we will look at in further experiments.

      Regarding Figure 7, NFKB2 (D865G/D865G) shows no LHX3 expression already at day 48. It would be better to look at expression including PITX1 at an earlier time point to see at what point differentiation is impaired.

      RT-qPCR results show no statistically significant changes in PITX1 (Fc = 0.58; p = 0.25) or LHX3 (Fc = 0.15; p = 0.22) expression at d27, although there was a tendency towards downregulation.

      Is it really just a species difference that NFKB2-deficient mice do not have abnormal pituitary function? This needs to be discussed in the manuscript.

      Nfkb2 Lym1/Lym1 mice and the NFKB2 KI model carry different but functionally very similar mutations, as both lead to abnormal processing of p100 and a strong reduction of p52 content. In mice, these mutations are more severe than the complete absence of the Nfkb2 gene product, and they have been called “super repressors”. It is therefore surprising that no pituitary phenotype has been observed in mice. In our opinion, this constitutes a strong argument in favour of an inter-species difference, at least for the pathogenicity of this type of mutation.

      This point is now addressed in the Discussion.

      Just looking at changes in gene expression by qPCR and bulk RNA-seq does not give enough information about localisation. We wish the cells had at least been separated by FACS before RNA-seq. For example, FACS can separate the anterior pituitary and hypothalamus by EpCAM positivity/negativity (PMID: 35903276), so we would like to see gene expression in such separated samples.

      This is a pertinent suggestion. We are aware of these techniques and hope to include them in future studies.

      For Figures 2 and 6, just looking at changes in gene expression by qPCR does not provide localisation information, so either (1) immunostaining for LHX3 and NKX2.1 should be shown in each aggregate as in Fig. S3, or (2) qPCR should be performed on the FACSed cells.

      PITX1, LHX3 (as confirmed by our immunofluorescence data) and HESX1 are only expressed in non-neural tissue. TBX19 could be expressed in the hypothalamic part of the organoid, but we observed very little immunostaining outside the outermost layers of organoids (i.e. pituitary tissue). The antibody we used to detect corticotrophs only recognizes ACTH, and therefore only marks pituitary cells.

      In addition, pathway and gene ontology analyses should be performed.

      Pathway and gene ontology analyses have been performed. However, as organoids consist of two different tissues, the analysis of over 4800 differentially expressed genes did not give very informative results, apart from an impairment of retinoic acid signalling that we are currently investigating.

      Reviewer #2 (Recommendations For The Authors):

      The differentiation of iPSC to organoids could be variable. The authors indicate that 200 organoids were analyzed for each line, and 3-10 organoids were analyzed per time point, genotype, and assay. Is it clear that 100% of the organoids differentiate to produce corticotropes? Please clarify.

      In our experiments, almost 90% of organoids give rise to non-neural ectoderm, as demonstrated by PITX1 expression. However, depending on experiments, only 60-70% of organoids give rise to pituitary progenitors (LHX3+) and subsequently to corticotropes. This has been clarified in the text.

      For TBX19, it seems surprising that there is an effect on PITX1 and LHX3 expression, since TBX19 expression is normally activated after these genes are expressed. An effect of TBX19 on EMT would also be surprising as the knockout mice do not have dysmorphology of the stem cell niche. The only evidence for an effect is the reduced IHC for E-cadherin. If this is an important point, the authors should examine other EMT markers such as Zeb2. The TBX19 knockout mice appear to form corticotropes based on the expression of NeuroD1, even though they lack TBX19 and POMC expression. It would be reassuring to see that NeuroD1 is normally expressed in the TBX19 mutant organoids.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but with potentially still functional DNA-binding ability. This mutation had a similar effect on LHX3 and PITX1 as the TBX19 KI mutation, although it was even more severe. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. Unlike in the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines with a total absence of TBX19 should help clarify these questions.

      Apart from the lack of change in ZEB2 expression in TBX19 KI (Fc = 1.15; p = 0.35), we did not look further for changes in EMT markers in TBX19 KI. However, we added a more detailed analysis of EMT marker expression in NFKB2 KI based on RNA-seq results (see table 2).

      Due to lack of material, we could not confirm NEUROD1 expression by immunostaining. However, RT-qPCR showed no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64).

      NFKB2 IHC was markedly reduced in NFKB2 D865G/D865G organoids. Based on previous experiments, the mutant protein should be expressed but not activated by proteolytic cleavage. It is possible that the antibody has a different affinity for the mutant protein and/or the uncleaved protein may be unstable. Can this be clarified? The mRNA for mutant NFKB2 appears unchanged in Table 1.

      This is puzzling indeed. We did not notice any change in NFKB2 from d27 to d105, nor any significant change between WT and NFKB2 KI. Although the antibody we used recognizes both p100 and p52, we cannot rule out the possibility that p100/p52 is degraded by pathways other than the proteasome. Another possibility is that p100 interactions with other proteins may decrease the accessibility of the epitope to the antibody.

      The RNA sequencing data from the NFKB2 organoids is intriguing. It suggests that the NFKB2 mutation may have a modest effect on Tbx19 transcription but not Neurod1. It also suggests there are hypothalamic effects, i.e. altered expression of hypothalamic markers in mutant organoids. Is NFKB2 expressed in the developing hypothalamus? Can normal NEUROD1 IHC be confirmed? It is also intriguing that there may be an effect on EMT. However, there seem to be some discrepancies in the direction of effect on these markers. Please clarify.

      This is related to the point just above. p100/p52 is described as a ubiquitously expressed protein. We think that it is expressed in the hypothalamic part of the organoids, but at a lower level than in pituitary progenitors.

      As mentioned before, we could not yet confirm NEUROD1 expression by immunostaining, but RT-qPCR clearly showed there was no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64) or NFKB2 KI (Fc = 0.88; p = 0.5). However, we investigated other markers of different stages of corticotroph differentiation (see figure 11) and found that the later stages are most affected.

      Concerning the EMT, we also found changes in the expression of other markers that are shown in Table 2 and discussed further in the text.

      Cytokines have been proposed to play important roles in pituitary differentiation, i.e. IL6. Is there any evidence for an altered cytokine or chemokine expression in the NFKB2 organoids?

      We did not see any change in IL6 expression in NFKB2 KI (Fc = 2.34; p = 0.55), but RNA-seq shows a strong increase in IL6R (Fc = 8.89; p = 2.13e-09). At this point, however, the relevance of these observations remains elusive.

      Minor:

      Some patients with DAVID syndrome have pituitary hypoplasia. The authors measure organoid size and find no differences based on genotype. However, each organoid probably has a variable amount of tissue differentiated to pituitary and hypothalamic fates, therefore, the volume of the whole organoid may not be a good proxy for the amount of pituitary tissue.

      We are aware of this issue. However, for most pituitary genes measured by RT-qPCR (PITX1, LHX3, TBX19), the deltaCt values did not drastically vary for a given time point/genotype, suggesting a stable pituitary/hypothalamic ratio.

      Figure 9 shows whole transcriptome data for the NFKB2 organoids, and Table 1 lists the data for selected genes. There appears to be disagreement between the significance cut-offs used in the figure and the table. Please adjust.

      We removed the fold-change cut-offs to improve clarity.

      elife120868_0_supp_2945725_rxl2z4. "haft" appears several times, but it should be "half".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript provides strong evidence that the molecular function of SLC35G1, an orphan human SLC transporter, is citrate export at the basolateral membrane of intestinal epithelial cells. Multiple lines of evidence, including radioactive transport experiments, immunohistochemical staining, gene expression analysis, and siRNA knockdown are combined to deduce a model of the physiological role of this transporter.

      Strengths:

      The experimental approaches are comprehensive, and together establish a strong model for the role of SLC35G1 in citrate uptake. The observation that chloride inhibits uptake suggests an interesting mechanism that exploits the difference in chloride concentration across the basolateral membrane.

      Weaknesses:

      Some aspects of the results would benefit from a more thorough discussion of the conclusions and/or model.

      For example, the authors find that SLC35G1 prefers the dianionic (singly protonated) form of citrate, and rationalize this finding by comparison with the substrate selectivity of the citrate importer NaDC1. However, this comparison has weaknesses when considering the physiological pH for SLC35G1 and NaDC1. NaDC1 binds citrate at a pH of ~5.4 (the pKa of citrate is 5.4, so there is a lot of dianionic citrate present under physiological circumstances). SLC35G1 binds citrate under pH conditions of ~7.5, where a very small amount of dianionic citrate is present. The data clearly show a pH dependence of transport, and the authors rule out proton coupling, but the discrepancy between the pH dependence and the physiological expectations should be addressed/commented on.

      Thank you for your insightful comment. Citrate exists mostly in its trianionic form under near neutral pH conditions in biological fluids, as you pointed out. Its dianionic form represents only a small portion (about 1/100) of total citrate due to the pKa. However, significant SLC35G1-specific uptake was observed under near neutral pH conditions (Figure 1G). Therefore, although SLC35G1-mediated citrate transport is less efficient under physiologically relevant near neutral pH conditions, it could still play a role particularly in the intestinal absorption process, in which the concentration gradient of dianionic citrate could be maintained by continuous supply by NaDC1-mediated apical uptake.
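      The "about 1/100" figure follows from the Henderson-Hasselbalch relation. The sketch below is a minimal illustration, assuming citrate behaves as a single acid-base pair (dianion/trianion) with the pKa of 5.4 quoted by Reviewer #1 and neglecting the other protonation states; the function name is hypothetical and not taken from the manuscript.

```python
def dianionic_fraction(ph: float, pka: float = 5.4) -> float:
    """Fraction of citrate present as the dianion (HCit2-) relative to
    HCit2- + Cit3-, using Henderson-Hasselbalch:
        [Cit3-] / [HCit2-] = 10 ** (pH - pKa)
    Assumption: only this one acid-base pair is considered; the
    monoanionic and fully protonated forms are neglected."""
    ratio_tri_to_di = 10 ** (ph - pka)
    return 1.0 / (1.0 + ratio_tri_to_di)


# Compare the acidic, intermediate and near-neutral conditions discussed above.
for ph in (5.4, 6.5, 7.4, 7.5):
    print(f"pH {ph}: dianionic fraction = {dianionic_fraction(ph):.4f}")
```

      At pH 7.4 this gives 1/101, i.e. roughly 1/100 as stated above. Well above the pKa the fraction scales roughly as 10^(pKa - pH), so a one-unit change in the assumed pKa changes the estimate about tenfold.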

      The rationale for the series of compounds tested in Figure 1F, which includes metabolites with carboxylate groups, a selection of drugs including anion channel inhibitors and statins, and bile acids, is not described. Moreover, the lessons drawn from this experiment are vague and should be expanded upon. It is not clear what, if anything, the compounds that reduce citrate uptake have in common.

      Thank you for highlighting the need for clarity regarding the compounds tested in Figure 1F. The tested compounds were TCA cycle intermediates (fumarate, α-ketoglutarate, malate, pyruvate, and succinate) as substrate candidate carboxylates analogous to citrate, diverse anionic compounds (BSP, DIDS, probenecid, pravastatin, and taurocholate) as those that might be substrates or inhibitors, and diverse cationic compounds (cimetidine, quinidine, and verapamil) as those that are least likely to interact with SLC35G1. Among them, certain anionic compounds significantly reduced SLC35G1-specific citrate uptake, suggesting that they may interact with SLC35G1. However, we could not identify any structural features commonly shared by these compounds, except that they have anionic moieties. We acknowledge that it requires further elaboration to clarify such structural features. We have revised the relevant section on p. 3 (line 25 - 32) to include these.

      The transporter is described as a facilitative transporter, but this is not established definitively. For example, another possibility could involve coupling citrate transport to another substrate, possibly even chloride ion.

      Thank you for your insightful comment regarding the nature of SLC35G1's transport mechanism. While we have described SLC35G1 as a facilitative transporter based on our current data, we acknowledge that this has not been definitively proven, as you pointed out, and we cannot exclude the possibility that its sensitivity to extracellular Cl⁻ might imply its operation as a citrate/Cl⁻ exchanger. To examine the possibility, we would need to manipulate the chloride ion gradient across the plasma membrane. Particularly, generating an outward Cl⁻ gradient to see if it could enhance citrate uptake could be a potential strategy. However, current techniques do not allow us to effectively generate the Cl⁻ gradient, thus preventing us from conclusively verifying this possibility. We recognize the importance of further investigating this aspect in future studies. Your suggestion highlights an important area for additional research to fully understand the transport mechanism of SLC35G1. We have additionally commented on this issue on p. 4 (line 1 – 3).

      Reviewer #2 (Public Review):

      Summary:

      The primary goal of this study was to identify the transport pathway that is responsible for the release of dietary citrate from enterocytes into blood across the basolateral membrane.

      Strengths:

      The transport pathway responsible for the entry of dietary citrate into enterocytes was already known, but the transporter responsible for the second step remained unidentified. The studies presented in this manuscript identify SLC35G1 as the most likely transporter that mediates the release of absorbed citrate from intestinal cells into the serosal side. This fills an important gap in our current knowledge of the transcellular absorption of dietary citrate. The exclusive localization of the transporter in the basolateral membrane of human intestinal cells and the human intestinal cell line Caco-2 and the inhibition of the transporter function by chloride support this conclusion.

      Weaknesses:

      (i) The substrate specificity experiments have been done with relatively low concentrations of potential competing substrates, considering the relatively low affinity of the transporter for citrate. Given that NaDC1 brings in not only citrate as a divalent anion but also other divalent anions such as succinate, it is possible that SLC35G1 is responsible for the release of not only citrate but also other dicarboxylates. The substrate specificity studies show that the dicarboxylates tested did not compete with citrate, implying that SLC35G1 is selective for citrate (2-), but this conclusion might be flawed because of the low concentration of the competing substrates used in the experiment.

      Thank you for your valuable comment on our substrate specificity experiments. As you pointed out, we cannot rule out the possibility that dicarboxylates might be recognized by SLC35G1 with low affinity as the tested concentration was relatively low. However, at the concentration of 200 μM, competing substrates with an affinity comparable to that of citrate could inhibit SLC35G1-specific citrate uptake by about 30%. Therefore, it is likely that the compounds that did not exhibit significant effect have no affinity or at least lower affinity than citrate to SLC35G1. Further studies should explore a broader range of concentrations for potential substrates including those with lower affinity. It would help clarify the substrate recognition characteristics of SLC35G1 and if it indeed has a unique preference for citrate over dicarboxylates. We have additionally mentioned that on p. 3, line 32 – 35.
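      The expected inhibition of about 30% can be checked with the Michaelis-Menten rate law for a competitive inhibitor. The sketch below assumes competitive inhibition, a competitor Ki equal to the citrate Km of about 0.5 mM quoted by Reviewer #2, a competitor concentration of 0.2 mM, and a hypothetical tracer citrate concentration well below Km; none of these assumptions, nor the function name, comes from the manuscript itself.

```python
def competitive_inhibition(i_conc: float, ki: float, s_conc: float, km: float) -> float:
    """Fractional inhibition of uptake by a competitive inhibitor.

    Michaelis-Menten with competitive inhibition:
        v_i / v_0 = (Km + S) / (Km * (1 + I / Ki) + S)
    Returns 1 - v_i / v_0. All concentrations must share one unit (mM here)."""
    v_ratio = (km + s_conc) / (km * (1 + i_conc / ki) + s_conc)
    return 1.0 - v_ratio


# Hypothetical illustration values, not measured data:
KM_CITRATE_MM = 0.5    # citrate Km as quoted by Reviewer #2
TRACER_CITRATE_MM = 0.01  # assumed tracer concentration, well below Km
COMPETITOR_MM = 0.2    # competitor concentration used in the screen

inhibition = competitive_inhibition(
    i_conc=COMPETITOR_MM, ki=KM_CITRATE_MM, s_conc=TRACER_CITRATE_MM, km=KM_CITRATE_MM
)
print(f"Expected inhibition for Ki = Km: {inhibition:.0%}")
```

      With a negligible tracer concentration the expected inhibition reduces to 1 - 1/(1 + 0.2/0.5), about 29%, in line with the roughly 30% stated above. A competitor with a tenfold higher Ki (5 mM) would be expected to reduce uptake by only about 4% under the same assumptions, which is why the lack of a significant effect cannot exclude low-affinity recognition.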

      (ii) The authors have used MDCK cells for assessment of the transcellular transfer of citrate via SLC35G1, but it is not clear whether this cell line expresses NaDC1 in the apical membrane as the enterocytes do. Even though the authors expressed SLC35G1 ectopically in MDCK cells and showed that the transporter localizes to the basolateral membrane, the question as to how citrate actually enters the apical membrane for SLC35G1 in the other membrane to work remains unanswered.

      Thank you for highlighting this important aspect of our study. The mechanism of apical citrate entry in MDCKII cells is unknown, although NaDC1 or a similar transporter may be involved. However, this set of experiments has successfully demonstrated the basolateral localization of SLC35G1 and its operation for citrate efflux. Attempts to clarify the apical entry mechanism may need to be included in future studies for a more detailed characterization of the model system using MDCKII cells. This would help in fully understanding the transcellular transport system for citrate. Investigations using Caco-2 cells, or MDCKII cells double-transfected with NaDC1 and SLC35G1, would also need to be included in future studies to gain more definitive insights into the transcellular transport mechanism for citrate in the intestine, delineating the suggested cooperative role of NaDC1 and SLC35G1. We would be grateful for your understanding of our handling of this issue.

      (iii) There is one other transporter that has already been identified for the efflux of citrate in some cell types in the literature (SLC62A1, PLoS Genetics; 10.1371/journal.pgen.1008884), but no mention of this transporter has been made in the current manuscript.

      Thank you for bringing up the relevance of SLC62A1, which has recently been identified as a citrate efflux transporter in some cell types (PLoS Genet, 16, e1008884, 2020). We have now included comments on this transporter in Introduction (p. 2).

      Reviewer #3 (Public Review):

      Summary:

      Mimura et al describe the discovery of the orphan transporter SLC35G1 as a citrate transporter in the small intestine. Using a combination of cellular transport assays, they show that SLC35G1 can mediate citrate transport in small intestinal cell lines. Furthermore, they investigate its expression and localization in both human tissue and cell lines. Limited evidence exists to date on both SLC35G1 and citrate uptake in the small intestine, therefore this study is an important contribution to both fields. However, the main claims by the authors are only partially supported by experimental evidence.

      Strengths:

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      Further confirmation and clarification are required to claim that the SLC indeed exports citrate at the basolateral membrane as concluded by the authors. Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect. Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Thank you for highlighting these important points. We used the Cl⁻-rich medium in transcellular transport studies, as stated in the relevant section in Materials and Methods (p. 6, line 2 – 5). The Cl⁻ concentration (144 mM) was comparable to the physiological concentration in extracellular body fluids. To clarify that experimental condition, we have additionally noted it in the text (p. 4, line 9) and the legends of Figs. 1K and 1L. The results indicate that basolaterally localized SLC35G1 can mediate citrate export effectively under the Cl⁻-rich extracellular condition. The mechanism by which Cl⁻ regulates transport is unclear, and it is difficult to clarify it further at this time. We recognize the importance of further investigating this aspect in future studies, including the possibility that SLC35G1 might be a citrate/Cl⁻ exchanger, as pointed out by Reviewer #1 (3rd comment).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The figures are very tiny and difficult to see. The inset in Figure 1C is much too small to be readable. I suggest enlarging the panels.

      Thank you for your feedback. As advised, we have enlarged the panels to improve visibility.

      Line 74: "certain anionic compounds signficantly inhibited SLC35G1-specific citrate uptake, indicating they are also recognized by SLC35G1." This sentence should be reworded since the mechanism is not clear. The word "reduced" would be a better option than "inhibited." Are there other interpretations besides SLC35G1 binding to explain the observations?

      Thank you for your suggestion. We have reworded the sentence to improve clarity (p. 3, line 30). It may be possible to speculate that they interact with SLC35G1, but the mechanisms are not clear yet.

      The manuscript is vague about how the transporter was discovered. If a screen of orphan transporters was performed to identify a citrate transporter, this should be described.

      Thank you for pointing out the need for more details regarding the discovery of the transporter. We have added some detailed description at the beginning of Results and Discussion (p. 3).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors:

      (1) For transcellular transport of citrate and the role of SLC35G1, it would be better to use Caco-2 cells cultured on Transwells because these cells express NaDC1 in the apical membrane and the authors have shown that SLC35G1 is expressed in the basolateral membrane in this cell line. The mechanism for the entry of citrate into the MDCK cells used in the present manuscript is not known. If the authors prefer to use MDCK cells because of their superior polarization, they can use a double transfection (NaDC1 and SLC35G1) to differentially express the two transporters in the apical versus basolateral membrane and then use the cells for transcellular transport of citrate.

      Please refer to our reply to your second review comment.

      (2) The substrate specificity experiments should use concentrations higher than 0.2 mM for competing dicarboxylates because the Km for citrate is only 0.5 mM. It is likely that NaDC1 brings in citrate and other dicarboxylates into enterocytes and then SLC35G1 mediates the efflux of these metabolic intermediates into blood.

      Please refer to our reply to your first review comment.

      (3) One major aspect of the transport function of this newly discovered citrate efflux transporter that has not been explored is the role of membrane potential in the transport function. The transporter is not coupled to Na or K or even H; so then the transport of citrate via this transporter must be electrogenic. Of course, this would be perfect for the transporter to function in the efflux of citrate because of the inside-negative membrane potential, but the authors need to show that the transporter is electrogenic. This can be examined through Caco-2 cells and/or MDCK cells expressing SLC35G1 and examining the impact of changes in membrane potential (valinomycin and K) on the transport of citrate.

      Thank you for your suggestion. As shown in Figure 1D, the use of K-gluconate in place of Na-gluconate, which induces plasma membrane depolarization, had no impact on the specific uptake of citrate, suggesting that SLC35G1-mediated citrate transport is independent of membrane potential. We have additionally mentioned this on p. 3 (line 21 – 24).

      (4) The localization studies mention Na/K ATPase component as a basolateral membrane marker, but the text describes it as BCRP. This needs to be corrected.

      Thank you for pointing out the mistake. We have corrected that. The marker was ATP1A1.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake under the physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it is indeed an exporter. However, in this experiment the applied chloride concentration did not match the proposed model (120 mM at the basolateral side). Why was this chloride concentration not mimicked in the Transwell assay?

      (2) The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect.

      (3) Related to the chloride sensitivity mentioned above, it is unclear how the proposed model works if SLC35G1 faces high chloride concentrations under physiological conditions even though it is inhibited by chloride.

      Please refer to our reply to your review comments.

      Related to the localization of SLC35G1:

      (4) The polyclonal antibody against SLC35G1 should be validated to prove the specificity. This should be relatively straightforward given the authors have SLC35G1 knockdown cells.

      Thank you for your suggestion. To validate the specificity of the polyclonal antibody against SLC35G1, we prepared HEK293 cells transiently expressing SLC35G1 or SLC35G1 tagged with a FLAG epitope at the C-terminus (SLC35G1-FLAG). In the immunostained images, only SLC35G1-FLAG was stained with the anti-FLAG antibody, whereas both SLC35G1 and SLC35G1-FLAG were stained with the anti-SLC35G1 antibody, indicating that the anti-SLC35G1 antibody recognizes SLC35G1. In addition, the localization patterns of SLC35G1-FLAG observed with the two antibodies were consistent, further indicating that the anti-SLC35G1 antibody recognizes SLC35G1 specifically. Taken together, these results validate the specificity of the anti-SLC35G1 antibody.

      Author response image 1.

      (5) To strengthen the data on the localization of SLC35G1, the cell lines should be co-stained with a plasma membrane marker as well, not just in tissue with ATP1A1. In polarized cells co-staining with apical and basolateral markers should be applied.

      Judging from its position in the cell images, SLC35G1 was localized to the basolateral membrane in both polarized MDCKII and Caco-2 cells. This finding is consistent with its colocalization with ATP1A1 in human small intestinal sections, which also indicates basolateral localization. We consider these results sufficient to support the basolateral localization of SLC35G1.

      General points:

      (6) In the abstract the authors mention that they focus on highly expressed orphan transporters in the small intestine as candidates. However, no other candidates are mentioned or discussed in the study. Consequently, this should be rephrased.

      Thank you for the advice. Also taking into consideration the third recommendation by Reviewer #1, we have added a more detailed description at the beginning of the Results and Discussion (p. 3).

      (7) As far as we are aware, there is exactly one other publication on SLC35G1 (10.1073/pnas.1117231108). The authors should discuss this publication, the only one with functional data on SLC35G1, in more detail. How do the authors integrate their findings with the existing knowledge? For example, why did the authors not investigate the impact of Ca2+ on SLC35G1 transport?

      Thank you for your suggestion. In the earlier study, in which SLC35G1 was tagged with GFP, SLC35G1 was reported to be localized mainly to the endoplasmic reticulum (ER). One possibility is that the GFP tagging used in that study caused SLC35G1 to be mistargeted to the ER. We have additionally mentioned this possibility in the relevant section (p. 3, lines 9-11). We have also revised a relevant sentence on p. 3 (line 5).

      With regard to the other finding of that study, that GFP-tagged SLC35G1 interacts with STIM1, we additionally examined the effect of STIM1 on SLC35G1-mediated citrate uptake. As shown in the accompanying figure, coexpression of HA-tagged STIM1 did not affect the elevated citrate uptake induced by FLAG-tagged SLC35G1, indicating that STIM1 has no impact on the citrate transport function of SLC35G1 at the plasma membrane.

      Author response image 2.

      (A) Effect of the coexpression of HA-tagged STIM1 on [14C]citrate (1 μM) uptake by FLAG-tagged SLC35G1 transiently expressed in HEK293 cells. The uptake was evaluated for 10 min at pH 5.5 and 37°C. Data represent the mean ± SD of three biological replicates. Statistical differences were assessed using ANOVA followed by Dunnett’s test. *, p < 0.05 compared with the control (gray bar). (B) Western blot analysis was conducted by probing for the HA and FLAG tags, using the whole-cell lysate samples (10 µg protein aliquots) prepared from cells expressing HA-STIM1 and/or FLAG-SLC35G1. The blots of β-actin are shown for reference.

      (8) Generally, the introduction could provide more background.

      In response to your suggestion and also to the third review comment from Reviewer #2, we have now additionally included comments on SLC62A1, which has recently been reported as a citrate efflux transporter in some cell types, in the Introduction.

      Minor points:

      (9) There is a typo in Figure 1D: manniotol instead of mannitol.

      Thank you for pointing that out. We have corrected the typo in Figure 1D.

      (10) Figure 1J: The resolution is low and the localization to the basolateral membrane is not conclusive based on this image. The protein rather appears to be localized throughout the plasma membrane and intracellularly as well.

      Thank you for your feedback. We have enhanced the resolution of the image and also enlarged it to improve clarity and make the basolateral membrane localization more discernible.

      (11) Figure 1K: Please clarify whether the experiment was performed in the Transwell plate. Based on the results of the pH titration experiment, no uptake is expected at pH 7.4. Therefore, this experiment does not seem to provide additional evidence for, or support, the conclusions drawn regarding cellular polarization.

      Please refer to our reply to your review comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Galanti et al. present an innovative new method to determine the susceptibility of large collections of plant accessions towards infestations by herbivores and pathogens. This work resulted from an unplanned infestation of plants in a greenhouse that was later harvested for sequencing. When these plants were extracted for DNA, associated pest DNA was extracted and sequenced as well. In a standard analysis, all sequencing reads would be mapped to the plant reference genome and unmapped reads, most likely originating from 'exogenous' pest DNA, would be discarded. Here, the authors argue that these unmapped reads contain valuable information and can be used to quantify plant infestation loads.

      For the present manuscript, the authors re-analysed a published dataset of 207 sequenced accessions of Thlaspi arvense. In this data, 0.5% of all reads had been classified as exogenous reads, while 99.5% mapped to the T. arvense reference genome. In a first step, however, the authors repeated read mapping against other reference genomes of potential pest species and found that a substantial fraction of 'ambiguous' reads mapped to at least one such species. Removing these reads improved the results of downstream GWA studies, and this filtering step is in itself an interesting tool that should be adopted more widely.

      The exogenous reads were primarily mapped to the genomes of the aphid Myzus persicae and the powdery mildew Erysiphe cruciferarum, from which the authors concluded that these were the likely pests present in their greenhouse. The authors then used these mapped pest read counts as an approximate measure of infestation load and performed GWA studies to identify plant gene regions across the T. arvense accessions that were associated with higher or lower pest read counts. In principle, this is an exciting approach that extracts useful information from 'junk' reads that are usually discarded. The results seem to support the authors' arguments, with relatively high heritabilities of pest read counts among T. arvense accessions, and GWA peaks close to known defence genes. Nonetheless, I do feel that more validation would be needed to support these conclusions, and given the radical novelty of this approach, additional experiments should be performed.

      A weakness of this study is that no actual aphid or mildew infestations of plants were recorded by the authors. They only mention that they anecdotally observed differences in infestations among accessions. As systematic quantification is no longer possible in retrospect, a smaller experiment could be performed in which a few accessions are infested with different quantities of aphids and/or mildew, followed by sequencing and pest read mapping. Such an approach would have the added benefit of allowing causally linking pest read count and pest load, thereby going beyond correlational associations.

      On a technical note, it seems feasible that mildew-infested leaves would have been selected for extraction, but it is harder to explain how aphid DNA would have been extracted alongside plant DNA. Presumably, all leaves would have been cleaned of live aphids before they were placed in extraction tubes. What then is the origin of aphid DNA in these samples? Are these trace amounts from aphid saliva and faeces/honeydew that were left on the leaves? If this is the case, I would expect there to be substantially more mildew DNA than aphid DNA, yet the absolute read counts for aphids are actually higher. Presumably read counts should only be used as a relative metric within a pest organism, but this unexpected result nonetheless raises questions about what these read counts reflect. Again, having experimental data from different aphid densities would make these results more convincing.

      We agree with the reviewer that additional aphid counts at the time of (or prior to) sequencing would have been ideal, but unfortunately we do not have these data. However, compared to such counts one strength of our sequencing-based approach is that it (presumably) integrates over longer periods than a single observation (e.g. if aphid abundances fluctuated, or winged aphids visited leaves only temporarily), and that it can detect pathogens even when invisible to our eyes, e.g. before a mildew colony becomes visible. Moreover, the key point of our study is that we can detect variation in pest abundance even in the absence of count data, which are really time consuming to collect.

      Conducting a new experiment, with controlled aphid infestations and continuous monitoring of their abundances, to test for correlation between pest abundance and the number of detected reads would require resequencing at least 30-50% of the collection for the results to be reliable. It would be a major experimental study in itself.

      Regarding the origin of aphid reads and the differences in read counts between, e.g., aphids and mildew, we believe this should not be of concern. DNA contamination is very common in all kinds of samples, but these reads are simply discarded in other studies. For example, although we collected and handled samples using gloves, MG-RAST detected human reads (Hominidae, S2 Table), possibly from handling the plants during transplanting or phenotyping 1-2 weeks before sequencing. Therefore, although we did remove aphids from the leaves at collection, aphid saliva or the aphids' temporary presence on the leaves must have been enough to leave detectable DNA traces. Additionally, the fact that the M. persicae load strongly correlates with the Buchnera aphidicola load (R² = 0.86, S6 Table) is reassuring. This obligate aphid symbiont is expected to be found in high amounts when sequencing aphids (see e.g. The International Aphid Genomics Consortium (2010)).

      The higher number of aphid reads compared to mildew reads can probably be explained by aphids having expanded more than mildew at the time of plant collection. Most importantly, as already mentioned by the reviewer, the read counts were meant to compare plant accessions rather than pests to one another: we are interested in relative, not absolute, values. Comparisons between pest species are challenging because they can be influenced by several factors, such as the availability of sequences in the MG-RAST database and the DNA extraction kit used, which is plant-specific and might be biased towards certain groups. None of these potential biases is a concern when comparing different plants, as all plants are equally subject to them.

      Reviewer #2 (Public Review):

      Summary:

      Galanti et al investigate genetic variation in plant pest resistance using non-target reads from whole-genome sequencing of 207 field lines spontaneously colonized by aphids and mildew. They calculate significant differences in pest DNA load between populations and lines, with heritability and correlation with climate and glucosinolate content. By genome-wide association analyses they identify known defence genes and novel regions potentially associated with pest load variation. Additionally, they suggest that differential methylation at transposons and some genes are involved in responses to pathogen pressure. The authors present in this study the potential of leveraging non-target sequencing reads to estimate plant biotic interactions, in general for GWAS, and provide insights into the defence mechanisms of Thlaspi arvense.

      Strengths:

      The authors ask an interesting and important question. Overall, I found the manuscript very well-written, with a very concrete and clear question, a well-structured experimental design, and clear differences from previous work. Their important results could potentially have implications and utility for many systems in phenotype-genotype prediction. In particular, I think the use of unmapped reads for GWAS is intriguing.

      Thank you for appreciating the originality and potential of our work.

      Weaknesses:

      I found that several of the conclusions are incomplete or not well supported by the data, and/or that some methods and results require additional details before they can be judged. I believe these analyses and/or additional clarifications should be considered.

      Thank you very much for the supportive and constructive comments. They helped us to improve the manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The authors address an interesting and significant question, with a well-written manuscript that outlines a clear experimental design and distinguishes itself from previous work. However, some conclusions seem incomplete, lacking sufficient support from the data, or requiring additional methodological details for proper evaluation. Addressing these limitations through additional analyses or clarifications is recommended.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      - So far it is not clear to me how read numbers were normalised and quantified. For instance, Figure 1C only reports raw read numbers. In L149: "Prior to these analyses, to avoid biases caused by different sequencing depths, we corrected the read counts for the total numbers of deduplicated reads in each library and used the residuals as unbiased estimates of aphid, mildew and microbe loads". Was library size considered? Is the load the ratio between exogenous and non-exogenous reads? It is described in L461, but according to this, read counts were normalised and duplicated reads were removed. Why were read counts used, as opposed to total coverage or base counts? I cannot follow how variation in sequencing quality was considered. I would expect samples with higher sequencing depth to tend to have more exogenous reads (simply because of higher resolution and more power to detect something present in a lower proportion).

      Correcting for sequencing depth/library size is indeed very important. As the reviewer noted, we had explained how we did this in the methods section (L464), and we now also point to it in the results (L151):

      “Finally, we log transformed all read counts to approximate normality, and corrected for the total number of deduplicated reads by extracting residuals from the following linear model, log(read_count + 1) ∼ log(deduplicated_reads), which allowed us to quantify non-Thlaspi loads, correcting for the sequencing depth of each sample.”

      We showed the uncorrected read-counts only in Fig 1 to illustrate the orders of magnitude but used the corrected read-counts (also referred to as “loads”) for all subsequent analyses.

      In our view, the theoretically best metric for correcting the number of reads of a specific contaminant organism is the total number of DNA fragments captured. Importantly, this is not well reflected by the total number of raw reads, because of PCR and optical duplicates arising during library preparation and sequencing. For this reason, we estimated the total number of reads captured by multiplying the total raw reads (after trimming) by the deduplication rate obtained from FastQC (methods L409-411). This metric reflects the amount of DNA fragments sampled better than the raw reads do. It also better reflects MG-RAST metrics, as this software also deduplicates reads (Author response image 1 below). We also removed duplicates in our strict mappings to the M. persicae and B. aphidicola genomes.
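      The depth correction described above can be sketched as follows. This is a minimal Python/NumPy illustration, not the authors' actual script: the function names are hypothetical, and `dedup_fraction` is assumed to be the fraction of reads remaining after deduplication as reported by FastQC, expressed between 0 and 1. The model is the one quoted above, log(read_count + 1) ∼ log(deduplicated_reads), fitted by ordinary least squares.

```python
import numpy as np


def deduplicated_reads(trimmed_reads, dedup_fraction):
    """Approximate the number of distinct DNA fragments sampled per library.

    trimmed_reads: total raw reads per sample after trimming.
    dedup_fraction: assumed fraction (0-1) of reads remaining after deduplication.
    """
    return np.asarray(trimmed_reads, dtype=float) * np.asarray(dedup_fraction, dtype=float)


def depth_corrected_load(read_counts, dedup_reads):
    """Residuals of the OLS fit log(read_count + 1) ~ log(deduplicated_reads)."""
    y = np.log(np.asarray(read_counts, dtype=float) + 1.0)
    x = np.log(np.asarray(dedup_reads, dtype=float))
    design = np.column_stack([np.ones_like(x), x])  # intercept and slope
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef  # positive = more reads than expected for this depth


# Hypothetical example with four libraries
dedup = deduplicated_reads([2.0e7, 3.5e7, 1.8e7, 4.1e7], [0.82, 0.76, 0.88, 0.71])
loads = depth_corrected_load([1200, 150, 4300, 90], dedup)
print(loads.round(3))
```

      Because the model includes an intercept, the residuals are centred on zero across samples, so they are only meaningful as relative loads between plant lines, which is how they are used in the analyses.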

      Coverage is not a good option for correction, because it is defined for a specific reference genome and many of the read-counts output by MG-RAST do not have a corresponding full assembly. Moreover, coverage and base counts are influenced by read size, which depends on library prep and is not included in the read-counts produced by MG-RAST.

      Author response image 1.

      Linear correlations between the number of MG-RAST reads post-QC and either total (left) or deduplicated (right) reads from fastq files of four full samples (not only unmapped reads).

      - The general assumption is that plants with different origins will have genetic variants or epigenetic variations associated with pathogen resistance, which can be tracked in a GWAS. However, plants from different regions will also have all variants associated with their origin (isolation by state as presented in the manuscript). In line 169: "Having established that our method most likely captured variation in plant resistance, we were interested in the ecological drivers of this variation". It is not clear to me how variation in plant resistance is differentiated from geographical variation (population structure). in L203: "We corrected for population structure using an IBS matrix and only tested variants with Minor Allele Frequency (MAF) > 0.04 (see Methods).". However, if resistant variants are correlated with population structure as shown in Table 1, how are they differentiated? In my opinion, the analyses are strongly limited by the correlation between phenotype and population structure.

      The association of any given trait with population structure is surely a very important aspect in GWAS studies and when looking at correlations of traits with environmental variables. If a trait is strongly associated with population structure, then disentangling variants associated with population structure vs. the ones associated with the trait can indeed be challenging, a good example being flowering time in A. thaliana (e.g. Brachi et al. 2013).

      In our case, although the pest and microbiome loads are associated with population structure to some extent, this association is not very strong. This can be observed for example in Fig. 1C, where there is no clear separation of samples from different regions. This means that we can correct for population structure (in both GWAS and correlations with climatic variables) without removing the signals of association. It is possible that other associations were missed if specific variants were indeed strongly associated with structure, but these would be unreliable within our dataset, so it is prudent to exclude them.

      - Similarly, in L212: "we still found significant GWA peaks for Erysiphales but not for other types of exogenous reads (excluding isolated, unreliable variants) (Figure 3A and S3 Figure)." In a GWA analysis, multiple variants will constitute an association peak (as shown for instance in main Figure 3A) only when the peak is accentuated by linkage disequilibrium around the region under selection (or, in this case, around the variant explaining phenotypic variation). However, in this case, I suspect there is a strong component of population structure (which still needs to be corroborated as suggested in the previous comment). But if variants are filtered by population structure, the only variants considered are those polymorphic within populations. In this case, I do not think clear peaks are expected, since most of the signal correlated with population has been removed. Under this scenario, I wonder how informative the analyses are.

      As mentioned above, the traits we analyse (aphid and mildew loads) are only partially associated with population structure. This is evident from Fig. 1C (see answer above) but also from the SNP-based heritability (Table 1, last column) which measures indeed the proportion of variance explained by genetic population structure. Although some variance is explained (i.e. the reviewer is correct that there is some association) there is still plenty of leftover variance to be used for GWAS and correlations with environmental variables. The fact that we still find GWAS peaks confirms this, as otherwise they would be lost by the population structure correction included in our mixed model.

      - How were heritability values calculated? Were related individuals filtered out? I suggest adding more detail in both the inference of heritability and the kinship matrix (IBS matrix). Currently missing in methods (for heritability I only found the mention of an R package in the caption of Table 1).

      We somehow missed this in the methods and thank the reviewer for noticing. We have now added this paragraph to the chapter “Exogenous reads heritability and species identification”:
      “To test for variation between populations we used a general linear model with population as a predictor. To measure SNP-based heritability, i.e. the proportion of variance explained by kinship, we used the marker_h2() function from the R package heritability (Kruijer and Kooke 2019), which uses a genetic distance matrix as predictor to compute REML-estimates of the genetic and residual variance. We used the same IBS matrix as for GWAS and for the correlations with climatic variables.”
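      The same IBS matrix is used for heritability, GWAS and the climate correlations. As a hedged sketch only (the authors' pipeline may compute it differently, for example inside a dedicated GWAS tool), an identity-by-state similarity between two accessions can be taken as the average proportion of alleles shared per site, using biallelic genotypes coded 0/1/2 and skipping sites missing in either sample. The function name is illustrative.

```python
import numpy as np


def ibs_matrix(genotypes):
    """Pairwise identity-by-state similarity.

    genotypes: (n_samples, n_sites) array of alternative-allele counts (0, 1, 2);
    np.nan marks a missing call.
    """
    g = np.asarray(genotypes, dtype=float)
    n = g.shape[0]
    ibs = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            called = ~np.isnan(g[i]) & ~np.isnan(g[j])  # sites genotyped in both
            if not called.any():
                ibs[i, j] = ibs[j, i] = np.nan
                continue
            # per-site allele sharing: 1 (identical), 0.5 (one allele), 0 (none)
            share = 1.0 - np.abs(g[i, called] - g[j, called]) / 2.0
            ibs[i, j] = ibs[j, i] = share.mean()
    return ibs


G = np.array([[0, 1, 2], [0, 1, 2], [2, 1, 0]])
print(ibs_matrix(G))
```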

      We also added the reference to the R package heritability to the Table 1 caption.

      - Figure 2C. In line 188: "Although the baseline levels of benzyl glucosinolates were very low and probably sometimes below the detection level, plant lines where benzyl glucosinolate was detected had significantly lower aphid loads (over 70% less reads) in the glasshouse (Figure 3C)". It is not clear to me how to see these values in Figure 2C. From the boxplot, aphid loads seem lower in lines where benzyl glucosinolate was detected than in lines where it was not, but from the distributions it is not clear how this difference is statistically significant. It rather seems like a sampling bias (a lot of non-detected vs. low detected values). Is the difference still significant when random subsampling of groups is considered?

      Here the “70% less reads” refers to the uncorrected read-counts directly (difference in means between samples where benzyl-GS were detected vs. not). We agree with the reviewer that this is confusing when referred to figure 2C which depicts the corrected M. persicae load (residuals). We therefore removed that information.

      Regarding the significance of the difference, we re-calculated the p value with the Welch's t-test, which accounts for unequal variances, and with a bootstrap t-test. Both tests still found a significant difference. We now report the p value of the Welch’s t-test.
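      For reference, the Welch statistic and one common form of bootstrap t-test are sketched below in plain Python. This is an illustration, not the exact procedure behind the reported p value: the resampling scheme (shifting both groups to the pooled mean to impose the null hypothesis) and the number of resamples are assumptions. In practice, the parametric Welch p value is usually obtained with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic; does not assume equal variances in the two groups."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def bootstrap_t_pvalue(a, b, n_boot=10_000, seed=0):
    """Two-sided bootstrap p value for Welch's t.

    Each group is shifted to the pooled mean, so resampling happens under the
    null hypothesis of equal means while each group keeps its own spread.
    """
    rng = random.Random(seed)
    t_obs = welch_t(a, b)
    pooled = statistics.mean(list(a) + list(b))
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    a0 = [x - mean_a + pooled for x in a]
    b0 = [x - mean_b + pooled for x in b]
    extreme = valid = 0
    for _ in range(n_boot):
        ra = [rng.choice(a0) for _ in a0]
        rb = [rng.choice(b0) for _ in b0]
        try:
            t = welch_t(ra, rb)
        except ZeroDivisionError:  # both resamples constant: t undefined, skip
            continue
        valid += 1
        extreme += abs(t) >= abs(t_obs)
    return (extreme + 1) / (valid + 1)
```

      The `+ 1` in numerator and denominator keeps the estimated p value above zero when no resample is as extreme as the observed statistic.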

      - I think the information on read statistics needs to be improved. At the moment, some sections are difficult to follow. I found this information mainly in Supplementary Table 1. I could not follow the distinction, in the manuscript and the supplementary materials, between read (read count), fragment, ambiguous fragments, target fragments, etc. I did not find information on mean coverage per sample or on relative plant vs. parasite coverage. This lack of clarity led to some confusion. For instance, in L207: "We suspected that this might be because some non-Thlaspi reads were very similar to these highly conserved regions and, by mapping there, generated false variants only in samples containing many non-Thlaspi reads". I find it difficult to follow how non-Thlaspi reads would interfere with genotyping. I think the fact that the large peak is lost after filtering reads is already quite insightful. However, in principle I would expect the coverage of non-Thlaspi relative to Thlaspi reads to be rather low in all cases, I would say below 1%. Thus, genotyping should be relatively accurate for the plant variants for the most part, particularly considering that genotyping was done with GATK, where low-frequency variants (in terms of relative coverage) should normally be called as the reference allele.

      We agree with the reviewer that some clarification over these points is necessary! We modified Supplementary Table 1 to include coverage information for all samples before and after removal of ambiguous reads and explained thoroughly how each value in the table was obtained. Regarding reads and fragments, we define each fragment as having two reads (R1 and R2). The classification into Target, Ambiguous and Unmapped reads was based on fragments, so we used that term in the table, but referring to reads has the same meaning in this context as for example an unmapped read is a read whose fragment was classified as unmapped.
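      To make the terminology concrete, the fragment-level classification can be sketched as below. This is a hypothetical reconstruction, not the authors' code: it assumes that a fragment counts as hitting a reference if either of its two reads maps there, that 'Ambiguous' means mapping to the Thlaspi reference and to at least one other (e.g. pest) reference, and that 'Unmapped' means not mapping to Thlaspi at all. The reference labels are placeholders.

```python
from collections import Counter

HOST = "thlaspi"  # placeholder label for the T. arvense reference


def classify_fragment(r1_hits, r2_hits, host=HOST):
    """Return 'target', 'ambiguous' or 'unmapped' for one read pair (R1, R2).

    r1_hits / r2_hits: names of the references each read aligned to.
    """
    hits = set(r1_hits) | set(r2_hits)  # assumed rule: either mate counts
    if host not in hits:
        return "unmapped"
    if hits - {host}:  # also maps to at least one non-host genome
        return "ambiguous"
    return "target"


def summarize(fragments, host=HOST):
    """Count fragment classes for a library given (r1_hits, r2_hits) pairs."""
    return Counter(classify_fragment(r1, r2, host) for r1, r2 in fragments)


library = [
    ({"thlaspi"}, {"thlaspi"}),
    ({"thlaspi"}, {"myzus_persicae"}),
    ({"erysiphe_cruciferarum"}, set()),
    (set(), set()),
]
print(summarize(library))
```

      Under this rule, only 'Target' fragments would be kept for genotyping, so that reads from conserved regions shared with pest genomes cannot introduce false plant variants.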

      We did not include the pest coverage specifically, because this cannot be calculated for any of the read counts obtained with MG-RAST as this tool is mapping to online databases where genome size is not necessarily known. What is more meaningful instead are the read counts, which are in Supplementary tables 2 and 6. Importantly as mentioned in other answers, if different taxa are differently represented in the databases this does not affect the comparison of read counts across different samples, but only the comparison of different taxa which was not used for any further analyses.

      Regarding the ambiguous reads causing unreliable variants, these occur only in very few regions of the Thlaspi genome that are highly conserved in evolution or of very low complexity. In these regions, reads derived from either plant or, for instance, aphid DNA can map, but the aphid-derived reads might contain variants when mapped to the Thlaspi reference genome (L207 and L300). The reviewer is right that removing those ambiguous reads makes only a very small difference in average coverage (~1X, S1 Table), but this is not true for the few regions where coverage changes massively when ambiguous reads are removed, as shown on the right-hand Y axes of S2 Figure. These unreliable variants are therefore not low-frequency and are consequently not removed by GATK.

      - L215. I am not very convinced by the enrichment analyses, justified with a reference (52). For instance, how many of the predicted peaks are not close to resistance genes? How was the randomisation done? At the moment, the manuscript reads rather anecdotally, describing only those peaks that effectively are "close" to resistance genes. For instance, if random windows (let's say 20 kb windows) are sampled along the genome, how often do they contain resistance genes, and how does this random sampling compare with the observed peaks (windows)?

      Enrichment is by definition an increase in the proportion of true positives (observed frequency: proportion of significant SNPs located close to a priori candidate genes) compared to the background frequency (number of all SNPs located close to a priori candidate genes). So the background likelihood of SNPs to fall into a priori candidate SNPs (i.e. the occurrence of a priori candidate genes in randomly sampled windows, as suggested by the reviewer) is already taken into account as the background frequency. We now explained more extensively how enrichment is calculated in the relevant methods section (L545-549), but it is an extensively used method, established in a large body of literature, so it can be found in many papers (e.g. Atwell et al. 2010, Brachi et al. 2010, Kawakatsu et al. 2016, Kerdaffrec et al. 2017, Sasaki et al. 2015-2019-2022, Galanti et al. 2022, Contreras-Garrido et al. 2024).
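      A minimal sketch of this enrichment ratio, assuming per-variant GWAS p values and a per-variant flag for whether the variant lies within the chosen distance of an a priori candidate gene (the flag and function name are illustrative):

```python
import math


def candidate_enrichment(pvalues, near_candidate, threshold):
    """Observed over background frequency of a priori candidates.

    pvalues: per-variant GWAS p values.
    near_candidate: per-variant booleans, True if close to an a priori candidate gene.
    threshold: significance cutoff on the -log10(p) scale.
    """
    flags = list(near_candidate)
    significant = [c for p, c in zip(pvalues, flags) if -math.log10(p) >= threshold]
    if not significant:
        return float("nan")  # undefined without significant variants
    observed = sum(significant) / len(significant)  # among significant variants
    background = sum(flags) / len(flags)  # among all tested variants
    return observed / background
```

      A value above 1 means that candidate genes are over-represented among significant variants relative to the genome-wide background.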

      Although we had already calculated an upper bound for the FDR based on the a priori candidates, as in previous literature, we now further calculated the significance of the enrichment for the Bonferroni-corrected -log(p) threshold for Erysiphales. Calculating significance requires adopting a genome rotation scheme that preserves the LD structure of the data, as described in the previously mentioned literature (eg. Kawakatsu et al. 2016, Sasaki et al. 2022). Briefly, we calculated a null distribution of enrichments by randomly rotating the p values and a priori candidate status of the genetic variants within each chromosome, for 10 million permutations. We then assessed significance by comparing the observed enrichment to the null distribution. We found that the enrichment at the Bonferroni corrected -log(p) threshold is indeed significant for Erysiphales (p = 0.016). We added this to the relevant methods section and the code to the github page.
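      The rotation scheme can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code deposited on GitHub: within each chromosome, the vector of p values is circularly shifted by a random offset relative to the fixed candidate flags, which keeps the correlation (LD) structure among neighbouring variants intact while breaking their link to candidate genes. The default number of rotations is far smaller than the 10 million used in the analysis, and the function names are hypothetical.

```python
import math
import random


def enrichment(pvalues, flags, threshold):
    """Observed/background frequency of candidate-proximal variants."""
    sig = [c for p, c in zip(pvalues, flags) if -math.log10(p) >= threshold]
    if not sig:
        return float("nan")
    return (sum(sig) / len(sig)) / (sum(flags) / len(flags))


def rotation_test(pvals_by_chrom, flags_by_chrom, threshold, n_rot=1000, seed=0):
    """Enrichment and its permutation p value under within-chromosome rotation.

    pvals_by_chrom / flags_by_chrom: dict chromosome -> per-variant lists in genomic order.
    """
    rng = random.Random(seed)
    chroms = list(pvals_by_chrom)
    flags = [f for c in chroms for f in flags_by_chrom[c]]
    observed = enrichment([p for c in chroms for p in pvals_by_chrom[c]], flags, threshold)
    as_extreme = 0
    for _ in range(n_rot):
        rotated = []
        for c in chroms:
            ps = list(pvals_by_chrom[c])
            k = rng.randrange(len(ps))  # random circular offset per chromosome
            rotated.extend(ps[k:] + ps[:k])
        if enrichment(rotated, flags, threshold) >= observed:  # NaN compares False
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_rot + 1)
```

      Because the candidate flags stay fixed, the background frequency is identical in every rotation and only the frequency among significant variants changes.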

      In addition, many other genes very close (within a few kb) to significant SNPs were not annotated with the “defense response” GO term but still had functions related to it. Some examples are CAR8, involved in ABA signalling, PBL7 in stomatal closure and SRF3 in cell wall building and stress response (Fig 3D). This means that our enrichment is most likely underestimated compared to what a more complete functional annotation would yield.

      - L247. Additional information is needed regarding sampling. It is not clear to me why methylation analyses are restricted to 20 samples, contrary to whole genome analyses.

      The sampling is best described in the original paper on natural DNA methylation variation (Galanti et al. 2022), although the most important parts are repeated in the first chapter of the methods.
      The methylation analyses are not restricted to 20 samples. Only the DMR calling was restricted to the 20 vs. 20 samples with the most divergent pest loads, to identify regions of variation. This step served to narrow the genome down to regions potentially associated with pest presence, rather than to thoroughly test actual methylation variants associated with pest presence. The latter was done in the second step, EWAS, which was based on the whole dataset excluding samples with a high non-conversion rate. This left 188 samples for EWAS. We added this number in the new manuscript (L251 and L571).
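      The two-step sample selection can be sketched as follows, purely for illustration. The function names are hypothetical, and the non-conversion cutoff is a placeholder parameter because its value is not stated in this response; the methods section of the manuscript remains the authoritative description.

```python
def extreme_groups(loads, n=20):
    """Split samples into the n lowest- and n highest-load groups for DMR calling.

    loads: dict sample_id -> depth-corrected pest load.
    """
    if len(loads) < 2 * n:
        raise ValueError("need at least 2*n samples")
    ranked = sorted(loads, key=loads.get)  # sample IDs in ascending load order
    return ranked[:n], ranked[-n:]


def ewas_samples(nonconversion, max_rate):
    """Keep every sample whose non-conversion rate is at most max_rate (placeholder cutoff)."""
    return sorted(s for s, rate in nonconversion.items() if rate <= max_rate)
```

      DMR calling on the extremes only narrows the search space; the association itself is then tested across all retained samples in the EWAS.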

      To clarify, we made a few additions to the results (L250) and methods (last two subchapters) sections, where we explain the above.

      - No clear association with TEs: in L364: "Erysiphales load was associated with hypomethylated Copia TEs upstream of MAPKKK20, a gene involved in ABA-mediated signaling and stomatal closure. Since stomatal closure is a known defense mechanism to block pathogen access (21), it is tempting to conclude that hypomethylation of the MAPKKK20 promoter might induce its overexpression and consequent stomatal closure, thereby preventing mildew access to the leaf blade. Overall, we found associations between pathogen load and TE methylation that could act both in cis (e.g. Copia TE methylation in MAPKKK20 promoter) and in trans, possibly through transposon reactivation (e.g. LINE, Helitron, and Ty3/Gypsy TEs isolated from genes)." I find the whole discussion related to transposable elements, first, rather anecdotal, and second, very speculative. To claim "Overall, we found associations between pathogen load and TE methylation", I believe a more detailed analysis is needed. For instance, how often is there an association? In general, there are some rather anecdotal examples, several of which are presented as associations with pathogen load on the basis of being "in proximity" to a particular region/peak. The same regions contain multiple other genes and annotations, but the authors limit the discussion to the particular gene or TE concordant with the hypothesis. This applies to both the discussion and results sections.

      Here we are referring to associations in a purely statistical sense. The fact that “Overall, we found associations between pathogen load and TE methylation” is simply a conclusion drawn from Fig. 4b, without implying any causality. Some methylation variants are statistically associated with the traits (aphid or mildew loads), and whether they are true positives or causal is of course more difficult to assess.

      Regarding the methylation variants associated with mildew load in proximity of MAPKKK20, those are the only two significant ones, located close to each other and close to many other variants that, although not significant, have low P-values (Author response image 2 below), so it is the most obvious association warranting further exploration. The reviewer is correct that there are other genes flanking the large DMR that covers the TEs (Fig. 4D), but the DMR is downstream of these genes, so less likely to affect their transcription.

      Author response image 2.

      Regarding all other associations found with M. persicae load, we stated that these are not very reliable due to a skewed P-value distribution (L269, S5B Fig), but we think it is still worth reporting the nearby genes and TEs for future reference.

      We slightly changed the wording of the passage the reviewer cites above to make it clearer that we are only offering potential explanations for the associations we observe with TE methylation; by no means do we state that TE reactivation is definitely what is happening.

      - One conclusion in the manuscript is that DMRs are mostly the result of hypomethylation. This is shown for instance in supplementary Figure 4. However, no general statistic is shown for the methylation distribution (not only restricted to DMRs). Is the ratio of hypermethylation to hypomethylation constant along the genome? Does the finding in DMRs then depart from the genome-wide distribution, or are the DMRs just a random sample of the global distribution? The same applies to different annotated regions. For instance, I would expect coding regions in general to be less methylated (not restricted to DMRs).

      Complete analyses of the methylomes were already published in the original manuscript (Galanti et al 2022). However, the variation among these methylomes is complex and influenced by multiple factors, including genetic background and environment of origin, and discussing these factors is beyond the scope of our paper. Here, we simply took advantage of the existing methylome information to identify the few genomic regions that are consistently differentially methylated between samples with extreme pest loads. As for the GWAS, the phenotypes are only partially associated with population structure, so the 20 samples with the lowest and the 20 with the highest pathogen loads are not, e.g., all Swedish vs. all German but a mixture, which allowed us to correct for population structure by running the EWAS with a mixed model that includes a genetic distance matrix.

      In this study we called DMRs between two defined groups: samples with the lowest amounts of pathogen DNA (not infected; the “control” group) vs. samples with the highest amounts of pathogens (infected; the “treatment” group), so we could define a directionality (“hyper” vs. “hypo” methylation). This is not the case for population DMRs, which are called between many different combinations of populations. This is why the hyper- and hypomethylated regions found here cannot be compared to the genome-wide averages, which are influenced by factors other than the pathogens. Even with relaxed thresholds, we found very few DMRs associated with pathogen presence.

      Specifically about coding regions, the reviewer is correct that they are less methylated, especially because T. arvense has largely lost gene body methylation (Nunn et al. 2021, Galanti et al. 2022), but this is unrelated and was discussed in the original publication (Galanti et al. 2022).

      Minor comments:

      - Figure 1B: it would be good to also add percentage values.

      As the figure is already tightly packed, we would rather keep it simple. The chart already gives a good impression of the frequencies of different kingdoms and of several relevant groups. Also, as explained in a previous answer, comparisons between different taxonomic groups could be imprecise (as opposed to comparing the same group between different samples), so exact percentages seem unnecessary. If needed, the exact percentages can still be calculated from S2 Table.

      - L159: It is not clear to me what "enemy variation" is referring to here.

      We are referring to variation in enemy densities (attack rates) in the field, that could potentially be carried over to the greenhouse to cause the patterns of infection we observed. We changed it to “variation in enemy densities” to make it more clear.

      - L259: "In accordance with previous studies (8,9), most DMRs were hypomethylated in the affected samples, indicating that genes needed for defense might be activated through demethylation". Not clear to me what "affected samples" is referring to. Samples with lower load?

      Affected samples have a higher load of pathogen reads. We changed it to “infested” to make it more clear.

      - L336. Figure should be Fig 3E.

      We fixed it, thanks for noticing.

      ADDITIONAL CHANGES

      We updated reference 43 to point to the published paper rather than the preprint.

      We corrected the phenotype names in S3 Fig, to make them consistent with the rest of the manuscript and increased font size on the axes to make it more readable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      The contribution of individual residues is shown in Figure 3c, which highlights one of the strengths of this RBM implementation: it is interpretable in a physically meaningful way. However, there are several decisions here, the justification of which is not entirely clear.

      i) Some of the residues in Fig 3c are stated as "relevant" for aminoacylated PG production. But is this the only such hidden unit? Or are there others that are sparse, bimodal, and involve "relevant" AA?

      Thanks for bringing this important question to our attention. In fact, this was the only hidden unit involving the combination of positions 152 and 212. Although we do not know all the amino acids relevant to this catalytic process, the residues we uncovered have been shown experimentally to be critical for the catalytic function of two MprF variants; since our protein of interest performs this function, any domain that did not contain these residues was excluded. We cannot rule out that the excluded domains perform similar catalytic functions, but we consider it unlikely, because the amino acids found in the negative portion of the weights are chemically unlikely to form a complex with the amino acid lysine. We have clarified in the text that this selection is probably a subset of all important amino acids; nevertheless, it provided predictive power.

      ii) In order to filter the sequences for the second stage, only those that produce an activation over +2.0 in this particular hidden unit were taken. How was this choice made?

      The threshold of +2.0 was chosen because it split the bimodal distribution into two distinct distributions.
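      To make the filtering step concrete, here is a minimal sketch assuming aligned sequences over a 21-letter alphabet (20 amino acids plus gap) and a single hidden unit stored as a position-by-state weight matrix `w`. A sequence's activation is then the sum of `w[i, a_i]` over positions, and sequences above +2.0 are kept. The alphabet ordering and function names are assumptions for illustration.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed ordering: 20 amino acids + gap

def activation(seq, w):
    """Sum of the hidden unit's weights selected by each residue: sum_i w[i, a_i].

    w has shape (L, q) with L = alignment length and q = len(ALPHABET).
    """
    idx = [ALPHABET.index(a) for a in seq]
    return float(w[np.arange(len(seq)), idx].sum())

def filter_by_activation(seqs, w, threshold=2.0):
    """Keep sequences whose activation exceeds the threshold (+2.0 in the text)."""
    return [s for s in seqs if activation(s, w) > threshold]
```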

      iii) How many sequences are in the set before and after this filtering? On the basis of the strength of the results that follow I expect that there are good reasons for these choices, but they should be more carefully discussed.  

      We started with 11,507 sequences, and after filtering we had 7,890 to train our model with. We think this number still provides robust statistics. This is noted in the Dataset acquisition and pre-processing section of the Methods.

      iv) Do the authors think that this gets all of the aminoacylated PG enzymes? Or are some missed?

      This is an interesting question that prompted us to do further analysis, and we have added a new supplemental figure addressing it. Based on the UniProt-derived annotations and the Pfam domain-based analysis of these sequences, the large majority of excluded sequences were proteins that included the LPG_synthase_C domain but not the transmembrane flippase domain required by the MprF class of enzymes; instead, they were accompanied by different domains that seem less relevant to our enzyme of interest. It is true, though, and related to question (i), that variants which retain the function despite losing experimentally determined key catalytic residues could have been excluded by this method, but such sequences could still reasonably be excluded due to their dissimilarity with MprF from Streptococcus agalactiae.

      However, some similar criticisms from the last point occur here as well, namely the selection of which weights should be used to classify the enzymes' function. Again the approach is to identify hidden unit activations that are sparse (with respect to the input sequence), have a high overall magnitude, and "involve residues which could be plausibly linked to the lipid binding specificity."

      (i) Two hidden units are identified as useful for classification, but how many candidates are there that pass the first two criteria? Indeed, how many hidden units are there?

      We note in the Model training section of the Methods that our final model had 300 hidden units in total. As to the first part of your question, rather than systematically testing the predictive power of all other hidden units for this task, we used these weights because of their connection to a proposed lipid-binding pocket found through Autodock experiments. While another weight might provide predictive power, it might lack this critical secondary information. Moreover, the direction of our research required finding weights that first satisfied our lipid-binding-pocket plausibility criterion before using them to propose MprF variants to test for our novel functionality. Given the limited information we had early in the research process, working in reverse would have produced too many options for experimental testing with reduced mechanistic justification. We included a brief explanation of our rationale in the section “Restricted Boltzmann Machines can provide sensitive, rational guidance for sequence classification” of the updated manuscript.

      ii) The criterion "involve residues which could be plausibly linked to the lipid binding specificity" is again vague. Do all of the other candidate hidden units *not* involve significant contributions from substrate-binding residues? Maybe one of the other units does a better job of discriminating substrate specificity. (As indicated in Figure 8, there are examples of enzymes that confound the proposed classification.) Why combine the activations of two units for the classification, instead of 1 or 3 or...?

      In fact, it is true that the other hidden units do not involve significant contribution to substrate-binding residues, and we will clarify this. The weights found through this RBM methodology are biased to be probabilistically independent, meaning that the residues and amino acids implicated by each weight are not shared among the other weights through the design of the model. We will update the Model Weight selection section to clarify that the weights we chose had more significantly weighted residues overlapping with the residues near the lipid-binding region than the other weights we checked. We combined these two because they were the only ones which had both overlap with these residues and predictive power of lipid activity with the few sequences we had detailed knowledge of at the time of decision (Figure 5b).

      The Model Weight section reads as follows:

      “Weights were chosen which involved sequence coordinates implicated in our function of interest. Specifically, locations identified through Autodock (Hebecker et al., 2015) where the lipid was likely to interact, and a small radius around this region to select a small set of coordinates. We chose the only weights which had both overlap with multiple residues in this chosen radius and predictive power (separation) for the three examples we had to start with.”
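      One way to express this selection rule in code, as a sketch only: positions are taken within a chosen radius of a docked ligand coordinate, and a hidden unit is retained when several of its most strongly weighted positions (by per-position L1 norm) fall in that set. The coordinate inputs, radius, `top_k` and `min_overlap` values are hypothetical, and the subsequent check of predictive power on the characterised sequences is not shown.

```python
import numpy as np

def positions_within_radius(coords, ligand_xyz, radius):
    """Alignment positions whose representative atom lies within `radius` of the ligand."""
    d = np.linalg.norm(np.asarray(coords) - np.asarray(ligand_xyz), axis=1)
    return set(np.flatnonzero(d <= radius).tolist())

def units_overlapping_pocket(W, pocket, top_k=10, min_overlap=2):
    """Indices of hidden units with >= min_overlap of their top_k positions in `pocket`.

    W has shape (n_hidden, L, q); the per-position L1 norm sums |w| over the q states.
    """
    per_position = np.abs(W).sum(axis=2)  # shape (n_hidden, L)
    selected = []
    for mu, norms in enumerate(per_position):
        # Positions with the largest weight magnitude for this hidden unit.
        top = set(np.argsort(norms)[::-1][:top_k].tolist())
        if len(top & pocket) >= min_overlap:
            selected.append(mu)
    return selected
```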

      Author Recommendations:

      The manuscript will likely be read by many membrane biologists/biochemists, and they might like to better understand how the RBM might be useful in their own approach. Here are some suggestions along these lines. The overall goal is to explain the RBM in *plain English* - the mathematical description in Eqs 2-4 is not easily interpretable.

      (1a) Explain that the RBM is a two-layer structure, in which one layer is the "visible" elements of the input sequence, and the other is called "hidden units." Connections are only made between visible and hidden units, but all such connections are made.

      (1b) The strengths of these connections are called "weights", and are determined in a statistical way based on a large set of input sequences. Once parametrized, the RBM is capable of capturing correlations among many positions in an input sequence - a significant advantage over the DCA approach.

      We agree with this assessment, and have updated the section of the text where we introduce the RBM with a non-technical explanation of what this method is doing. It reads as:

      “The design of this RBM can be seen in Figure 4, where the model architecture is represented by purple dots and green triangles. The dots are the “visible” layer, which take in input sequences and encode them into the “hidden” layer, where each triangle represents a separate hidden unit. The lines connecting the visible and hidden layers show that each hidden unit can see all the visible units (the statistics are global), but they cannot see any of the other hidden units, meaning the hidden units are mutually independent. This global model with mutually independent hidden units (see also the marginal distribution form shown in Equation 3) has the following useful properties: higher-order couplings between... “
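      The bipartite structure can be illustrated with a toy sketch. Assuming one-hot encoded sequences and a weight tensor `W` of shape (hidden units, positions, states), every hidden unit receives input from all positions, and because there are no hidden-hidden couplings, each hidden unit can be sampled independently given the sequence. Binary (Bernoulli) hidden units are used here only for simplicity; this is an assumption and not necessarily the hidden-unit potential used in the manuscript.

```python
import numpy as np

def hidden_inputs(x, W):
    """Input to every hidden unit: I_mu(v) = sum_{i,a} x[i, a] * W[mu, i, a].

    x is a one-hot sequence of shape (L, q); W has shape (M, L, q).
    """
    return np.einsum("ia,mia->m", x, W)

def sample_hidden(x, W, b, rng):
    """Sample binary hidden units given a sequence.

    With no hidden-hidden connections, p(h | v) factorizes over units,
    so all units are drawn independently in one vectorized step.
    """
    p_on = 1.0 / (1.0 + np.exp(-(hidden_inputs(x, W) + b)))
    h = (rng.random(p_on.shape) < p_on).astype(int)
    return h, p_on
```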

      (1c) Although strictly true that the DCA model is a Boltzmann machine, it's not a typical Boltzmann machine, because all of the units are visible. Typically a Boltzmann machine would also include hidden units, in order to increase its capacity/power. 

      We have clarified the relationship between DCA and Boltzmann machines, and this section now reads as:

      This class of models is closely related to another model termed the Boltzmann machine. The Boltzmann machine formulation is closely related to the Potts model from physics, which was successfully applied in biology to elucidate important residues in protein structure and function (Morcos et al., 2011), and another example being the careful tuning of enzyme specificity in bacterial two-component regulatory systems (Cheng et al., 2014; Jiang et al., 2021). The Boltzmann machine-like formulation from Morcos et al. (2011), termed Direct Coupling Analysis (DCA), stores patterns...

      (1d) Throughout, the authors refer to the activation of the hidden units as weights, but this is not a typical usage of this terminology. Connections between units are weights and have two subscripts. Given an input sequence, the sum over these weights for a given hidden unit is its activation (Eq. 1). I suggest aligning the description with the typical usage in order to make the presentation easier to follow. Hereafter I will refer to these hidden unit activations as simply activations. 

      We agree with you, the hidden units are a collection of edge weights. We have modified the terminology in the text and in our figures to consistently refer to the collections of weights as hidden units and refer to the hidden unit outputs given a sequence input as activations.

      (1e) How many hidden units are there?

      The final model was trained with 300 hidden units.

      (2) It is redundant to say that lipids are both amphiphiles and hydrophobic...amphiphile already means hydrophobic plus hydrophilic. 

      This is true, we have edited the manuscript to reflect this.

      (3) What does this mean, and what's the point of this remark? "They [lipids] are relatively smaller than other complex biomolecules, such as proteins, thereby allowing a larger portion of their surface to interact with other macromolecules." 

      We have removed this sentence.

      Reviewer 2 (Author Recommendations):

      While the idea of filtering out a part of the sequence data obtained with BLAST makes sense per se, it would be nice if the authors could comment on the nature of the sequences corresponding to the left peak in Figure 3b. It is hypothesised in conclusion that these sequences could lack any catalytic function. Could the authors experimentally check that this is the case or provide further evidence for this hypothesis?

      Yes, in this revision we provide further evidence in a new supplementary figure, S2. We performed a domain analysis of the sequences we excluded: most of them lacked the flippase domain associated with MprF function and were instead combined with different domains. On this basis we excluded them as not relevant to the MprF from Streptococcus agalactiae we were interested in. Although some relevant sequences might have been excluded, our assessment is that we gained specificity by reducing the set of sequences.

      A key step in the RBM-based approach is the identification of "meaningful" hidden units, i.e. whose values are related to biological function. In Methods, the authors explain how they selected these units based on the L1 norms of the weights and the region of interaction with the lipid. While these criteria are reasonable, I wonder whether they are too stringent. In particular, one could think that regions in the proteins not in direct contact with the lipid could also be important for binding. It is known for instance that the length of loops can affect flexibility and help regulate activity in some catalytic enzymes. So my question is: if one relaxes the criterion about the coordinates of large weight values, what happens? Are other potentially interesting hidden units identified?

      We completely agree that other regions of the protein are likely involved in determining enzyme specificity, and that focusing solely on regions that interact with the lipid may miss important contributions to the catalytic function; we hypothesize that the flippase domain itself and its interaction with the catalytic domain are involved, especially considering the concerted mechanism by which they must operate. We are currently investigating these hypotheses, which will be the subject of future work. As an initial step, we present the current work with restricted information that led to concrete predictions. We focused on the lipid-binding pocket because it was one of the few pieces of information we had from the start, but, as the reviewer suggests, we plan to follow up by trying to identify other relevant hidden units and domains.

      From a purely machine-learning point of view, it would be good to see more about cross-validation of the model. More precisely, could the authors show the log-likelihood of test set data compared to the one of training sequence data?

      We agree this is an important piece of information, and we will update our Methods section accordingly. We performed a parameter sweep to select the parameters used in our final model, and in that testing, with a random 80/20 training/test split, we obtained a training log probability of -0.91 and a test log probability of -0.98. However, for our final model we used all available data without a split; including the additional data did not change the result dramatically, and the weight structure and composition were consistent with the results presented in the paper.
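      For readers who want to reproduce this kind of check, the sketch below shows the 80/20 bookkeeping. An exact RBM log-likelihood requires the partition function (usually approximated, for instance by annealed importance sampling), so to keep the example self-contained it swaps in an independent-site model whose log-likelihood is exact. The model, pseudocount and per-site averaging are assumptions for illustration and will not reproduce the values above.

```python
import numpy as np

def split_80_20(n, seed=0):
    """Random 80/20 split of n sequence indices into training and test sets."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(0.8 * n)
    return idx[:cut], idx[cut:]

def fit_independent_sites(X, q, pseudocount=1.0):
    """Per-position state frequencies with a pseudocount. X: (N, L) ints in [0, q)."""
    N, L = X.shape
    counts = np.stack([np.bincount(X[:, i], minlength=q) for i in range(L)])
    return (counts + pseudocount) / (N + q * pseudocount)

def mean_log_likelihood(X, freqs):
    """Average log-probability per site: mean over sequences and positions of log f[i, x_i]."""
    L = X.shape[1]
    return float(np.log(freqs[np.arange(L), X]).mean())
```

A test score clearly lower than the training score would indicate overfitting; comparable scores suggest the model generalizes to held-out sequences.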

      Reviewer 3 (Public Review):

      In many of the analyzed strains, the presence of the lipid species Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG is correlated to the presence of the MprF enzyme(s), but one should keep in mind that a multitude of other membrane proteins are present that in theory could be involved in the synthesis as well. Therefore, there is no direct evidence that the MprF enzymes are linked to the synthesis of these lipid species. Although, it is unlikely that other enzymes are involved, this weakens the connection between the observed lipids and the type of MprF. 

      While there are a number of proteins found on the membrane that could play a role, we have specifically used a background strain that has a transposon in mprF that makes the bacteria incapable of synthesizing Lys-lipids (Figure 7B) unless complemented back with a functional MprF (Figure 7D-E). This led us to conclude that MprF is responsible for Lys-lipid synthesis.

      Related to this, in a few cases MprF activity is tested, but the manuscript does not contain any information on protein expression levels. Heterologous expression of membrane proteins is in general challenging and due to various reasons, proteins end up not being expressed at all. As an example, the absence of activity for the E. faecalis MprF1 and E. faecium MprF2 could very well be explained by the entire absence of the protein.

      The genes were expressed from the same plasmid to control for expression. While we did not run a western blot to examine expression levels, the plasmid backbone was used as a control for protein expression. Previous research supports the conclusion that E. faecalis MprF1 and E. faecium MprF2 do not synthesize Lys-lipids and most likely play a different role in the cell membrane.

      The title is somewhat misleading. The sequence statistics and machine learning categorized the MprFs, but the identification of a novel lipid species was a coincidence while checking/confirming the categorization. 

      We believe the title is appropriate given that Enterococcus dispar was identified through computational methods that led to the discovery of Lys-Glc2-DAG. In other words, the categorization of potential organisms that produce MprF-related lipids was driven by the predictions of the computational method. We agree, however, that the discovery was unexpected, but it would not have happened without the organisms suggested by the methodology presented here.

      Please read the manuscript one more time to correct textual errors.  

      The example of the role of LPS in delivering siRNA to targeted cancer cells is a bit farfetched as LPS is very different from the lipids that are being discussed here. I would rather focus on the role of Lysyl-lipids in antibiotic resistance in the introduction.  

      We included LPS here to explain that natural lipids/components of the bacterial cell membrane could be used for drug delivery systems. While it is true LPS is quite different from Lys-lipid compounds, our goal was to create an emphasis on how the bacterial domain is a rich untapped source of lipids that could be used in biotechnology.  In this way we wanted our statement to be more broadly about bacterial lipids and the importance of their continued study for diverse applications like pharmaceuticals.

      The MS identification of Lys-Glc2-DAG is convincing, especially in combination with the fragmentation data, but the ion counts suggest low abundance. The observation would be strengthened if Lysyl-Glc2-DAG species with different acyl-chain configurations were also identified. If so, this should be mentioned or visualized in the manuscript.

      We agree and have added an updated Figure 8A to demonstrate the presence of different acyl-chain configurations in Enterococcus dispar.  

      Further analysis of the Enterococcus strains shows the presence of the three lipids Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG, although Lys-Glc-DAG is only detected in trace amounts. This raises questions about the specificity of MprF for the substrate Glc-DAG. If the ratio of Glc2-DAG to Glc-DAG abundance is similar to the ratio of Lys-Glc2-DAG to Lys-Glc-DAG abundance, this would strengthen the observation that the enzyme has equal affinity. However, if there is a rather large amount of Glc-DAG but a small amount of Lys-Glc-DAG, the production of Lys-Glc-DAG might be a side-reaction.

      The reviewer raises a relevant point; however, a clear resolution might be part of future work, as we do not use spike-in controls when performing lipid extractions. Because of this, it is not possible for us to compare lipid levels across different samples. We now include a note clarifying this in the discussion section.

      The plotting of the MprF sequence variants using the chosen RBM weights reveals a rather complex distribution over the quadrants (Figure 8). It is rather unclear in Figure 8 why only 1 sequence is plotted for Enterococcus faecalis and faecium, while 2 different MprFs are present (and tested) for these two organisms. This should be clarified.  

      We agree this can be a source of confusion. We have further clarified this in the text that only the functional alleles were plotted in Figure 8 and that all Enterococcal alleles are plotted in Figure S3 regardless of function.

    1. Author response:

      Reviewer 1:

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

      We agree with the reviewer- Fgf signaling is an established pro-gliogenic pathway (Duong et al 2019) and Foxg1 overexpression is known to promote neurogenesis in cultured neural stem cells (Branacaccio et al 2019). Our study links these two mechanisms, as the Reviewer has summarized: (a) we demonstrate that FOXG1 works via modulating Fgf signaling cell-autonomously within progenitors by regulating the levels of Fgfr3. (b) Loss of Foxg1 in postmitotic neurons results in the upregulation of Fgf ligand expression (possibly via indirect mechanisms) and this non-cell autonomously increases Fgf signaling in progenitors. Our study is entirely performed in vivo.

      Proposed revision: We will revise the manuscript to reflect that Fgf18 may be an indirect target of FOXG1 in postmitotic neurons.

      Reviewer 2:

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Proposed revisions:

      (a) We will revise the manuscript to explain why we chose postnatal day 14 to examine the effects of Foxg1 deletion at E15.

      ● We have examined the transcriptomic dysregulation after Foxg1 deletion at E17.5, which is a reasonable period to identify potential direct targets. Furthermore, FOXG1 occupies the Fgfr3 locus in ChIP-seq performed at E15.5. Together, these support the interpretation that Fgfr3 is a direct target of Foxg1.

      ● As the Reviewer notes, we have investigated the possibility of increased proliferation of astrocytes and death of neurons and found no evidence that suggests these phenomena occur in the 3 days after loss of Foxg1. Cortical neurons are postmitotic and differentiated by E18.5, the stage at which we examined CC3 staining and found no difference in cell death in control and mutants (Supplementary Figure S2C, C’). The majority of progenitors (PAX6+ve cells) that lose Foxg1 at E15.5 express the gliogenic transcription factor NFIA by E18.5 (Figure 2C, C’), but hardly any express intermediate (neurogenic) progenitor marker TBR2 (Supplementary Figure S2B, B’). It is therefore unlikely that neurons are born from Foxg1 mutant progenitors and then die at a later stage.

      ● The cellular consequences of loss of Foxg1 require additional time to detect e.g. it takes ~ 5 days for GFAP to be detected in astrocytes once they are born. The P14 timepoint permits the assessment of oligogenesis which begins after astrogliogenesis and therefore permits a comprehensive assessment of the lineage of E15.5 Foxg1 null progenitors.

      (b) Thank you for pointing out that the last sentence of the results section implied (incorrectly) that ALL neurogenesis is not possible in the absence of Foxg1. We will modify this sentence (and the discussion) to reflect that this applies to E14/15 progenitors and late-born cortical neurons.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2-expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2-positive neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of original manuscript), which means more than 12 could be found. The membrane-tethered reporters we used – UAS-FlyBow1.1, UAS-mCD8-RFP, UAS-MCFO, as well as UAS-DenMark:UASsyd-1GFP – gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g., His-YFP is very strong) and perdurance of the reporter (e.g., the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression also causes an increase in the number of TH neurons. It is unclear whether the increase in TH RNA reflects higher expression per cell or a greater number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2 FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis and cleaved neurotrophins promote cell survival is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a mild yet statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star in the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL is instead due to an increase in cell number. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the result would still be the same: an increase in dopamine production when DNT-2 levels rise.

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult-specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ cells in TH-negative cells too (although not widely throughout the brain). This is not surprising, as DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted, and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings by McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out what other neuron types DNT-2 maintains.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity, and a suggestive link to neuronal activity. Finding out the direct link to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this with future projects. Nevertheless, it is important to place our findings within this context, as it opens opportunities for discovery by the neuroscience community.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin DNT-2 and its receptors, Toll-6 and Kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In mammals, the neurotrophin BDNF has an important function in glutamatergic synapses, thus we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; and optogenetic activation of DNT-2 neurons results in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, glutamate released from DNT-2 neurons is unlikely to inhibit DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. This includes the voltage-gated calcium channels located at the synapse, whose phosphorylation increases their open probability. Thus, a rise in cAMP could facilitate neurotransmitter release, and a downregulation would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      Our data indeed demonstrate that DNT-2 and PAM neurons influence each other, not just potentially but in fact. We have provided data showing that: DNT-2 neurons and PAMs are connected through circuitry; the DNT-2 receptors Toll-6 and Kek-6 are expressed in DANs, including in PAMs; alterations in the levels of DNT-2 (both loss and gain of function) and loss of function of the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and Kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. These data firmly demonstrate that the two neuron types, DNT-2 neurons and PAMs, influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figure 5C, D do contain Y-axis labels; all our graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, the volume of individual boutons, and bouton number. These analyses were carried out using Imaris software. For dendritic branching patterns, the “Filament Autodetect” function was used. Here, dendrites were analysed by tracing each dendrite branch semi-automatically (i.e., with manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris “Spot” function. Here, a threshold is set to exclude small dots (e.g., of background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, so the analysis is objective and large sample sizes can be analysed efficiently. We have already provided a description of the use of Imaris in the methods section.
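      To make the thresholding logic concrete, here is a minimal Python sketch of threshold-based spot counting. It is not Imaris's Spot algorithm, which is not reproduced here; the function `count_spots`, its 6-connectivity rule, and the parameters `intensity_threshold` and `min_voxels` are illustrative assumptions. It shows why applying one fixed intensity threshold and one minimum object size to every sample and genotype makes counts reproducible: the same image always yields the same set of objects.

```python
from collections import deque


def count_spots(volume, intensity_threshold, min_voxels):
    """Hypothetical threshold-and-size-filter spot counter (illustration only).

    volume: nested lists indexed as volume[z][y][x] holding voxel intensities.
    Voxels with intensity >= intensity_threshold are foreground. Foreground
    voxels sharing a face (6-connectivity) form one object; objects with fewer
    than min_voxels voxels are discarded as background specks.
    Returns the size (in voxels) of each retained object, in scan order.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    sizes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or volume[z][y][x] < intensity_threshold:
                    continue
                # Breadth-first flood fill collects one connected object
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                size = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    size += 1
                    for dz, dy, dx in offsets:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and (pz, py, px) not in seen
                                and volume[pz][py][px] >= intensity_threshold):
                            seen.add((pz, py, px))
                            queue.append((pz, py, px))
                # Same size cut-off for every image, so no per-sample tuning
                if size >= min_voxels:
                    sizes.append(size)
    return sizes
```

For example, in a one-slice volume containing a two-voxel bright object and an isolated bright voxel, `count_spots(volume, 4, 2)` keeps only the two-voxel object. Because both cut-offs are fixed in advance, the result does not depend on the genotype being analysed.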

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, which we visualised with DNT-2>MCFO, were the same neurons in every individual brain we looked at. This is because in every brain we have looked at, the soma of the neurons were not located in exactly the same location. Furthermore, the arborisation patterns are also different and unique, for each individual brain. Thus, there is natural variability in the position of the soma and in the arborisation patterns. Such variability presumably results from the combination of developmental and activity-dependent plasticity.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Here the authors present their evidence linking the mitochondrial uniporter (MCU-1) and olfactory adaptation in C. elegans. They clearly demonstrate a behavioral defect of mcu-1 mutants in adaptation over 60 minutes and present evidence that this gene functions in the AWC primary sensory neurons at, or close to, the time of adaptation. 

      Strengths: 

      The paper is very well organized and their approach to unpacking the role of mcu-1 mutants in olfactory adaptation is very reasonable. The authors lean into diverse techniques including behavior, genetics, and pharmacological manipulation in order to flesh out their model for how MCU-1 functions in AWC neurons with respect to olfaction. 

      Weaknesses: 

      I would like to see the authors strengthen the link between mitochondrial calcium and olfactory adaptation. The authors present some GCaMP data in Figure 5 but it is unclear to me why this tool is not better utilized to explore the mechanism of MCU-1 activity. I think this is very important as the title of the paper states that "mitochondrial calcium modulates..." behavior in AWC and so it would be nice to see more evidence to support this direct connection. I would also like to see the authors place their findings into a model based on previous findings and perhaps examine whether mcu-1 is required for EGL-4 nuclear translocation, which would be straightforward to examine. 

      We agree that observing calcium levels inside the mitochondria would conclusively demonstrate that mitochondrial calcium directly impacts neuropeptide secretion and behavior. We will try to do this with a mitochondrially targeted calcium indicator. We will also better integrate our findings into existing models in the literature, such as EGL-4 nuclear localization in AWC in response to prolonged odor exposure. Thank you for your comments.

      Reviewer #2 (Public review): 

      Summary: 

      In their manuscript, "Mitochondrial calcium modulates odor-mediated behavioural plasticity in C. elegans", Lee et al. aim to link a mitochondrial calcium transporter to higher-order neuronal functions that mediate memory and aversive learning behaviours. The authors characterise the role of the mitochondrial calcium uniporter, and a specific subunit of this complex, MCU-1, within a single chemosensory neuron (AWCOFF) during aversive odor learning in the nematode. By genetically manipulating mcu-1 as well as using pharmacological activators and blockers of MCU activity, the study presents compelling evidence that the activity of this individual mitochondrial ion transporter in AWCOFF is sufficient to drive animal behaviour through aversive memory formation. The authors show that perturbations to mcu-1 and MCU activity prevent aversive learning to several chemical odors associated with food absence. The authors propose a model, experimentally validated at several steps, whereby an increase in MCU activity during odor conditioning stimulates mitochondrial calcium influx and an increase in mitochondrial reactive oxygen species (mtROS) production, triggering the release of the neuropeptide NLP-1 from AWC, all of which are required to mediate future avoidance behaviour of the chemical odor. 

      Strengths: 

      Overall, the authors provided robust evidence that mitochondrial function, mediated through MCU activity, contributes to behavioural plasticity. They also demonstrated that ectopic MCU activation or mtROS during odor exposure could accelerate learning. This is quite profound, as it highlights the importance of mitochondrial function in complex neuronal processes beyond their general roles in the development and maintenance of neurons through energy homeostasis and biosynthesis, amongst their other cell-non-specific roles. 

      Weaknesses: 

      While the manuscript is generally robust, there are some concerns that should be addressed to improve the strength of the proposed model: 

      (1) Throughout the manuscript, it is implied that MCU activation caused by odor conditioning changes mitochondrial calcium levels. However, there is no direct experimental evidence of this. For example, the authors write on p.10 "This shows that H2O2 production occurs downstream of MCU activation and calcium influx into the mitochondria", and on p. 11, the statement that prolonged exposure to odors causes calcium influx. Because this is a key element of the proposed model, experimental evidence would be required to support it. 

      We are planning to measure mitochondrial calcium levels directly by using a mitochondrially targeted calcium indicator. We agree that this is a key element of our model.

      (2) Some controls missing, e.g. a heat-shock-only control in WT and mcu-1 (non-transgenic) background in Figure 1h is required to ensure the heat-shock stress does not interfere with odor learning. 

      We will conduct the experiments again with necessary controls.

      (3) Lee et al propose that mcu-1 is required at the adult stage to accomplish odor learning because inducing mcu-1 expression at larval stages did not rescue the phenotype of mcu-1 mutants during adulthood. However, the requirement of MCU for odor learning was narrowed down to a 15' window at the end of odor conditioning (Figure 5c). Is it possible that MCU-1 protein levels decline after larval induction so that MCU-1 is no longer present during adulthood when odor conditioning is performed? 

      Yes, we also noted that the early induction of MCU-1 is not effective to restore learning, and hypothesized that MCU-1 protein may be subject to high turnover. It may be that MCU-1 induced during larval stages no longer exists by the time odor conditioning is performed, although we have not confirmed this. We had a brief sentence noting this in the discussion section, but we will discuss this a little further in the revision. Thank you.

      (4) There is a limited learning effect observable after 30 minutes, and a very pronounced effect in all animals after 90 minutes. The authors very carefully dissect the learning mechanism at 60 minutes of exposure and distinguish processes that are relevant at 60 minutes from those important at 30 minutes. Some explanation or speculation as to why the processes crucial at the 60-minute mark are redundant at 90 minutes of exposure would be important. 

      I think this is in line with Reviewer #1’s comments that we should discuss our findings more in relation to existing models in the literature. We will do this in our revision.

      (5) Given the presumably ubiquitous function of mcu-1/MCU in mitochondrial calcium homeostasis, it is remarkable that its perturbation impacts only a very specific neuronal process in AWC at a very specific time. The authors should elaborate on this surprising aspect of their discovery in the discussion. 

      We will discuss the implication further in our revised manuscript.

      (6) Associated with the above comment, it remains possible that mcu-1 is required in coelomocytes for their ability to absorb NLP-1::Venus (Figure 3B), and the AWC-specific role of mcu-1 for this phenotype should be determined. 

      To confirm that mcu-1 is not required for coelomocyte uptake, we can stimulate NLP-1::Venus secretion in mcu-1 worms by adding H2O2, then check whether Venus fluorescence appears in the coelomocytes. We will include this in our revised manuscript. Thank you for your comments.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports a role for the mitochondrial calcium uniporter gene (mcu-1) in regulating associative learning behavior in C. elegans. This regulation occurs by mcu-1-dependent secretion of the neuropeptide NLP-1 from the sensory neuron AWC. The authors report a post-developmental role for mcu-1 in AWC to promote learning. The authors further show that odor conditioning leads to increases in NLP-1 secretion from AWC, and that interfering with mcu-1 function reduces NLP-1 secretion. Finally, the authors show that NLP-1 secretion increases when ROS levels in AWC are genetically or pharmacologically elevated. The authors propose that mitochondrial calcium entry through MCU-1 in response to odor conditioning leads to the generation of ROS and the subsequent increase in neuropeptide secretion to promote conditioned behavior. 

      Strengths: 

      (1) The authors show convincingly that genetically or pharmacologically manipulating MCU function impacts chemotaxis in a conditioned learning paradigm. 

      (2) The demonstration that the secretion of a specific neuropeptide can be up-regulated by MCU, ROS and odor conditioning is an important and interesting advance that addresses mechanisms by which neuropeptide secretion can be regulated in vivo. 

      Weaknesses: 

      (1) The authors' conclusion that mcu-1 functions in the AWC-on neuron is not adequately supported by their rescue experiments. The promoter they use for rescue drives expression in a number of neurons in addition to AWC-on, which themselves are implicated in adaptation, leaving open the possibility that mcu-1 may function non-autonomously instead of autonomously in AWC to regulate this behavior. 

      We recognized this as well, and we now have a promoter construct more specific to AWCON (str-2). Using this more specific promoter, we will confirm that the role of mcu-1 is indeed AWCON-specific in our revised manuscript.

      (2) The authors conclude MCU promotes neuropeptide release from AWC by controlling calcium entry into mitochondria, but they did not directly examine the effects of altered MCU function on calcium dynamics either in mitochondria or in the soma, even though they conducted calcium imaging experiments in AWC of wild type animals. Examination of calcium entry in mitochondria would be a direct test of their model.

      We agree. As we stated above for Reviewers #1 and #2, we will include results from the mitochondrial calcium data in our revised manuscript.

      (3) The authors' conclusion that mitochondrial-derived ROS produced by MCU activation drives neuropeptide release does not appear to be experimentally supported. A major weakness of this paper is that experiments addressing whether mcu-1 activity indeed produces ROS are not included, leaving unanswered the question of whether MCU is the endogenous source of ROS that drives neuropeptide secretion.

      We can test this using a mitochondrially targeted redox indicator (roGFP), and we will be sure to include the data in the revised manuscript. Thank you for your comments.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback provided by a slow storage mechanism is sufficient to explain habituation.

      Strengths:

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work.

      Weaknesses:

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery.

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model.

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; and 5) intensity sensitivity. Here, we are following the same terminology employed in bioRxiv 2024.08.04.606534, the paper highlighted by the referee. Regarding the hallmark 6) subliminal accumulation, we also believe that our model can capture it as well, but more analyses are needed to substantiate this claim. We will include the discussion of these points in the revised version.

      Notably, in line with the discussion in bioRxiv 2024.08.04.606534, we also think that feature 10), long-term habituation, is ambiguous and its appearance might simply be related to the other features discussed above. In the revised version, we will detail our take on this aspect in relation to the presented model.

      All other hallmarks require the presence of multiple stimuli and, as a consequence, they cannot be observed within our model, but are interesting lines of research for future investigations. We believe that this addition will help clarify the validity of the model and the relevance of our result, consequently improving the quality of our manuscript.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses, the subsequent change in response amplitude seems to be negligible, although the authors however state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The referee is correct, but this is a consequence of the specific set of parameters we selected, which we chose solely for visualization purposes. In the next version, when different emerging behaviors characterizing habituation are discussed, we will also present a set of parameters for which habituation can be better appreciated, justifying our new choice.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as defined in bioRxiv 2024.08.04.606534 for example, we can say that the system is habituated after a few stimuli for the set of parameters selected in the first version of the manuscript. We will also discuss this aspect in the Supplemental Material of the revised version, as it will also be important to appreciate the hallmarks of habituation listed above.

      The same is true for the information content (Figure 2f) - already at the first pulse, I_U,H ≈ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      The point about information is more subtle. We can definitely choose a set of parameters for which the information gain is higher and we will show it in the Supplemental Material of the revised version. However, as the reviewer correctly points out, it is difficult to give an interpretation of the specific value of I_U,H for such a minimal model.

      We also remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales, as discussed in the text), we do not observe the transient gain of information associated with the first stimulus; as a result, the mutual information shows a discontinuous behavior resembling the dynamics of the readout.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. We will discuss analogies and differences in the revised version of the main text. The main difference is the fact that information-theoretic aspects of habituation are not discussed in the presented references, while the idea of this work is to elucidate exactly the interplay between information gain and habituation dynamics.

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.
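      For reference, the information gain described in this summary can be written explicitly. This is a sketch in our own notation, not necessarily that of the manuscript: I_{U,H} is the mutual information between the readout U and the input H (the quantity plotted in Figure 2f), evaluated after the first stimulus, I^{(1)}, and in the habituated time-periodic steady state, I^{hab}. The mutual information is written for discrete states; for a continuous input the sum over h becomes an integral, and the choice of logarithm base only sets the units.

$$
\Delta I_{U,H} = I_{U,H}^{\mathrm{hab}} - I_{U,H}^{(1)}, \qquad
I_{U,H} = \sum_{u,h} p(u,h)\,\ln\frac{p(u,h)}{p(u)\,p(h)}
$$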

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts, as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is an important and delicate aspect to discuss. We considered the mutual information with a prolonged stimulation when building the Pareto front, by maximizing this quantity while minimizing the dissipation. The observation that the Pareto front lies in the vicinity of the maximum of the information gain hints at the fact that reducing the information gain by increasing the mutual information at each stimulation will require more energy. However, we did not thoroughly explore this aspect by considering all sources of dissipation and the fact that habituation is, anyway, a dynamical phenomenon. In the revised version, we will clarify this point, extending our analyses.

      We would like to add that, from a naive perspective, while the first stimulation necessarily triggers a certain mutual information, repeated observations of the same stimulus must translate into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models of adaptation in living systems, e.g., adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid any confusion between the usual definition of (perfect) adaptation and habituation. At any rate, we will add this clarification in the revised version.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the referee for giving us the opportunity to deepen this aspect of the manuscript. We decided to minimize \delta Q_R since this dissipation is unavoidable. In fact, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R. Conversely, the dissipation associated with the storage is always zero in the limit of a fast memory. However, we know that such a limit is pathological and leads to no habituation. As a consequence, in the revised version we will discuss other choices for our optimization approach, along with their strengths and limitations.

      The dependence of the Pareto front on the stimulus strength is shown in the Supplemental Material, but not in relation to habituation and information gain. We will strengthen this part in the revised version of the manuscript, elaborating more on the connection between optimality, information gain, and dynamical behavior.

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels?

      We agree that the match between data and model is not surprising, since the data exhibit habituation. However, it is non-trivial that our minimal model, without any explicit biological detail, captures the features of a complex neural system just by looking at the PCs. The 1st PC only reflects the feature that captures most of the variance of the data; as such, it is difficult to have a priori expectations about what it should represent. Depending on the behavior of higher-order PCs, we may include them in the revised version if any interesting results arise.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination.

      We thank the referee for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed:

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the referee for this suggestion. The revised version will present a modified abstract in line with the reviewer’s proposal.

      (2) Several clarifications are needed on the treatment of energy dissipation.

      - When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the referee for this typo. Indeed, \sigma sets the energy scale of the feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., together with \kappa in Eq. (1). We will fix this issue in the revised version. Moreover, we will check the entire manuscript to be sure that all formulas are consistent.

      - I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on <H>, however, is not fully clear. If the environment were static and the memory block was absent, the term with <H> would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence. By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript). In this case, the receptor is a 2-state, 1-pathway system and, as such, it always satisfies an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript does not hold anymore and the receptor does not exhibit any dissipation. Our choice to model two different pathways has been biologically motivated. We will make this crucial aspect clearer in the revised manuscript.

      - Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      In the current version of the manuscript, we employed the scheme of a controlled birth and death process to model the coupled process of readout and storage production. Since we are not dealing with a detailed biochemical underlying network, we used this coarse-grained description to capture the main features of the dynamics. In this sense, the considered reactions produce and destroy a molecule from a certain pool even if they are controlled in different ways by the readout. However, we completely agree with the point of view of the referee and will analyze our results following their suggestion.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics?

      The initial stimulus is indeed stochastic with an average constant in time. Model response depends on the pre-stimulus level, since it also sets the stationary storage concentration before the first “strong” stimulation arrives. This dependence is not crucial for our result but deserves proper discussion, as the referee correctly pointed out. We will clarify this point in the revised version of this study.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity here. Actually, Δ⟨S⟩ is not strictly zero but equal to 0.15% at the final point; due to rounding, it appears as 0% in the plot, and we will fix this in the revised version. Note that a finite degree of habituation persisting while Δ⟨S⟩ is small signals a nonlinear dependence of Δ⟨U⟩ on Δ⟨S⟩, but no contradiction. We will clarify this aspect in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigates a dietary intervention that employs a smartphone app to promote meal regularity, which may be useful. Despite no observed changes in caloric intake, the authors report significant weight loss. While the concept is very interesting and deserves to be studied due to its potential clinical relevance, the study's rigor needs to be revised, notably for its reliance on self-reported food intake, a highly unreliable way to assess food intake. Additionally, the study theorizes that the intervention resets the circadian clock, but the study needs more reliable methods for assessing circadian rhythms, such as actigraphy.

      Thank you for the positive yet critical feedback on our manuscript. We are pleased with the assessment that our study is very interesting and deserves to be continued. We have addressed the points of criticism mentioned and discussed the limitations of the study in more detail in the revised version than before.

      Nevertheless, we would like to note that one condition for our study design was that the participants were able to carry out the study in their normal everyday environment. This means that it is not possible to fully objectively record food intake, especially not over a period of eight weeks. In our view, self-reporting of food intake is therefore unavoidable and also forms the basis of comparable studies on chrononutrition. We believe that recording data with a smartphone application at the moment of eating is a reliable means of recording food consumption and is better suited than questionnaires, for example, which have to be completed retrospectively. Objectivity could be improved by having participants submit photographs of the food consumed. However, even this only provides limited protection against underreporting, as photos of individual meals, snacks, or second servings could be omitted by the participants. Sporadic indirect calorimetric measurements can help to identify underreporting, but this cannot replace real-time self-reporting via smartphone application.

      Our data show that at the behavioral level, the rhythms of food intake are significantly less variable during the intervention. Our assumption that precise mealtimes influence the circadian rhythms of the digestive system is not new and has been confirmed many times in animal and human studies. It can therefore be assumed that comparable effects also apply to the participants in our study. Of course, a measurement of physiological rhythms is also desirable for a continuation of the study. However, we suspect that cellular rhythms in tissues of the digestive tract in particular are decisive for the changes in body weight. The characterization of these rhythms in humans is at best indirectly possible via blood factors. Reduced variability of the sleep-wake rhythm, which is measured by actigraphy, may result from our intervention, but in our view is not the decisive factor for the optimization of metabolic processes.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Reviewer #1 (Public Review):

      The authors Wilming and colleagues set out to determine the impact of the regularity of feeding per se on the efficiency of weight loss. The idea was to determine whether consuming 2-3 meals within individualized time frames, as opposed to exhibiting stochastic feeding patterns throughout the circadian period, causes weight loss.

      The methods are rigorous, and the research is conducted using a two-group, single-center, randomized-controlled, single-blinded study design. The participants were aged between 18 and 65 years old, and a smartphone application was used to determine preferred feeding times, which were then used as defined feeding times for the experimental group. This adds strength to the study since restricting feeding within preferred/personalized feeding windows will improve compliance and study completion. Following a 14-day exploration phase and a 6-week intervention period in a cohort of 100 participants (inclusive of both the controls and the experimental group that completed the study), the authors conclude that when meals are restricted to durations of 45 min or less (MTVS of 3 or less), this leads to efficient weight loss. Surprisingly, the study excludes the impact of self-reported meal composition on the efficiency of weight loss in the experimental group. In light of this, it is important to follow up on this observation and develop rigorous study designs that will comprehensively assess the impact of changes (sustained) in dietary composition on weight loss. The study also reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Perhaps the most important observation is that personalized interventions that cater to individual circadian needs will likely result in more significant weight loss than when interventions are mismatched with personal circadian structures.

      We would like to thank the reviewer for the positive assessment of our study.

      (1) One concern for the study is its two-group design; however, single-group cross-over designs are tedious to develop, and an adequate 'wash-out' period may be difficult to predict.

      A cross-over design would of course be highly desirable and, if feasible, would be able to provide more robust data than a two-group design. However, we have strong doubts about the feasibility of a cross-over design. Not only does the determination of the length of the washout period to avoid carry-over effects of metabolic changes pose a difficulty, but also the assumption that those participants who start with the TTE intervention will consciously or unconsciously pay attention to adherence to certain eating times in the next phase, when they are asked to eat at times like before the study.

      In a certain way, however, our study fulfills at least one arm of the cross-over design. During the follow-up period of our study, there were some participants who, by their own admission, started eating at more irregular times again, which is comparable to the mock treatment of the control subjects. And these participants gained weight again.

      (2) A second weakness is not considering the different biological variables and racial and ethnic diversity and how that might impact outcomes. In sum, the authors have achieved the aims of the study, which will likely help move the field forward.

      In the meantime, we have at least added analyses regarding the age and gender of the participants and found no correlations with weight loss. The sample size of this pilot study was too small for a reliable analysis of the influence of ethnic diversity. If the study is continued with a larger sample size, this type of analysis will certainly come into play.

      We are pleased with the assessment that we have achieved our goals and are helping to advance the field.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the effects of the timing of dietary occasions on weight loss and well-being with the aim of explaining if a consistent, timely alignment of dietary occasions throughout the days of the week could improve weight management and overall well-being. The authors attributed these outcomes to a timely alignment of dietary occasions with the body's own circadian rhythms. However, the only evidence the authors provided for this hypothesis is the assumption that the individual timing of dietary occasions of the study participants identified before the intervention reflects the body's own circadian rhythms. This concept is rooted in the understanding of dietary cues as a zeitgeber for the circadian system, potentially leading to more efficient energy use and weight management. Furthermore, the primary outcome, body weight loss, was self-reported by the study participants.

      Strengths:

      The innovative focus of the study on the timing of dietary occasions rather than daily energy intake or diet composition presents a fresh perspective in dietary intervention research. The feasibility of the diet plan, developed based on individual profiles of the timing of dietary occasions identified before the intervention, marks a significant step towards personalised nutrition.

      We thank the reviewer for the generally positive assessment of our study and for sharing the view that our personalized approach represents an innovative step in chrononutrition.

      Weaknesses:

      (1) Several methodological issues detract from the study's credibility, including unclear definitions not widely recognized in nutrition or dietetics (e.g., "caloric event"), lack of comprehensive data on body composition, and potential confounders not accounted for (e.g., age range, menstrual cycle, shift work, unmatched cohorts, inclusion of individuals with normal weight, overweight, and obesity).

      We have replaced the term "caloric event" with "calorie intake occasion" and otherwise revised our manuscript with regard to other terminology in order to avoid ambiguity.

      We agree with the reviewer that the determination of body composition is a very important parameter to be investigated. Such investigations will definitely be part of the future continuation of the study. In this pilot study, we aimed to clarify in principle whether our intervention approach shows effects. Since we believe that this is certainly the case, we would like to address the question of what exactly the physiological mechanisms are that explain the observed weight loss in the future.

      Part of these future studies will also include other parameters in the analyses. However, in response to the reviewer's suggestions, we have already completed analyses regarding age and gender of the participants, which show that both variables have no influence on weight loss.

      In our view, the menstrual cycle should not have a major influence on the effectiveness of a 6-week intervention.

      The inclusion of shift workers is not a problem from our point of view. If their work shifts allow them to follow their personal eating schedule, we see no violation of our hypothesis. If this is not the case, as our data in Fig. 1G show, we do not expect any weight loss. Nevertheless, the reviewer is of course right that shift work can generally be a confounding factor and have an influence on weight loss success. To our knowledge, none of the 100 participants evaluated were shift workers. In a continuation of the study, however, shift work should be an exclusion criterion. Yet, our intervention approach could be of great interest for shift workers in particular, as they may be at a particularly high risk of obesity due to irregular eating times. A separate study with shift workers alone could therefore be of particular interest.

      The fact that it turned out that the baseline BMI of the remaining 67 EG and 33 CG participants did not match is discussed in detail in the section "3.1 Limitations". Although this is a limitation, it does not raise much doubt about the effectiveness of the intervention, as a subgroup analysis shows that intervention subjects lose more weight than control subjects of the same BMI.

      The inclusion of a wide BMI range was intentional. Our hypothesis is that reduced temporal variability in eating times optimizes metabolism and therefore excess body weight is lost (which we would like to investigate specifically in future studies). We hypothesize that people living with a high BMI will experience greater optimization than people with a lower BMI. Our data in Figs. 1H and S2I suggest that this assumption is correct.

      (2) The primary outcome's reliance on self-reported body weight and subsequent measurement biases further undermines the reliability of the findings.

      Self-reported data is always more prone to errors than objectively measured data. With regard to the collection of body weight, we were severely restricted in terms of direct contact with the participants during the conduct of the study due to the Covid-19 pandemic. At least the measurement of the initial body weight (at T0), the body weight after the end of the exploration phase (at T1) and the final body weight (at T2) were measured in video calls in the (virtual) presence of the study staff. These are the measurement points that were decisive for our analyses. Intermediate self-reported measurement points were not considered for analyses. We have added in the Materials & Methods section that video calls were undertaken to minimize the risk of misreporting.

      (3) Additionally, the absence of registration in clinical trial registries, such as the EU Clinical Trials Register or clinicaltrials.gov, and the multiple testing of hypotheses which were not listed a priori in the research protocol published on the German Register of Clinical Trials impede the study's transparency and reproducibility.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trial Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: „The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place.

      Furthermore, in our view, we did not provide less information on planned analyses than is usual and all our analyses were covered by the information in the study registry. We have stated the hypothesis in the study register that „strict adherence to [personalized] mealtimes will lead to a strengthening of the circadian system in the digestive tract and thus to an optimization of the utilization of nutrients and ultimately to the adjustment of body weight to an individual ideal value.“

      In our view, numerous analyses are necessary to test this hypothesis. We investigated whether it is the adherence to eating times that is related to the observed weight loss (Fig. 1), or possibly other variables resulting from adherence to the meal schedule (Fig. 3). In addition, we analyzed whether the intervention optimized the utilization of nutrients, which we did based on the food composition and number of calories during the exploration and intervention phases (Fig. 2). We investigated whether the personalization of meal schedules plays a role (Fig. 3). And we attempted to analyze whether the adjustment of body weight to an individual ideal value occurs by correlating the influence of the original BMI with weight loss. Only the hypothesis that the circadian system in the digestive tract is strengthened has not yet been directly investigated, a fact that is listed as a limitation. However, it can be assumed that this has happened, as the Zeitgeber “food” has lost significant variability as a result of the intervention. The analyses on general well-being are covered in the study protocol by the listing of secondary endpoints.

      Beyond that, we did not analyze any hypotheses that were not formulated a priori.

      For these reasons, we see no restriction in transparency, reproducibility or requirements and regulations.

      Achievement of Objectives and Support for Conclusions:

      (4) The study's objectives were partially met; however, the interpretation of the effects of meal timing on weight loss is compromised by the weaknesses mentioned above. The evidence only partially supports some of the claims due to methodological flaws and unstructured data analysis.

      We hope that we have been able to dispel uncertainties regarding some interpretations through supplementary analyses and the addition of some methodological details.

      Impact and Utility:

      (5) Despite its innovative approach, significant methodological and analytical shortcomings limit the study's utility. If these issues were addressed, the research could have meaningful implications for dietary interventions and metabolic research. The concept of timing of dietary occasions in sync with circadian rhythms holds promise but requires further rigorous investigation.

      We are pleased with the assessment that our data to date are promising. We hope that the revised version will already clarify some of the doubts about the data available so far. Furthermore, we absolutely agree with the reviewer: the present study serves to verify whether our intervention approach is potentially effective for weight loss, which we believe is the case. In the next steps, we plan to include extensive metabolic studies and to address the limitations of the present study.

      Reviewer #3 (Public Review):

      The authors tested a dietary intervention focused on improving meal regularity in this interesting paper. The study, a two-group, single-center, randomized, controlled, single-blind trial, utilized a smartphone application to track participants' meal frequencies and instructed the experimental group to confine their eating to these times for six weeks. The authors concluded that improving meal regularity reduced excess body weight despite food intake not being altered and contributed to overall improvements in well-being.

      The concept is interesting, but the need for more rigor is of concern.

      We would like to thank the reviewer for the interest in our study.

      (1) A notable limitation is the reliance on self-reported food intake, with the primary outcome being self-reported body weight/BMI, indicating an average weight loss of 2.62 kg. Despite no observed change in caloric intake, the authors assert weight loss among participants.

      As already described above in our responses to Reviewer 2, body weight was assessed in video calls in the (virtual) presence of study staff, so that the risk of misreporting is minimized. We have added this information to the manuscript.

      When recording food intake, we had to weigh up the risk of misreporting against the risk of a lack of validity in a permanently monitored setting. It was important to us to investigate the effectiveness of the intervention in the participants' everyday environment and not in a laboratory setting in order to be able to convincingly demonstrate its applicability in everyday life. The restriction of self-reporting is therefore unavoidable in our view and must be accepted. It can possibly be reduced by photographing the food, but even this is not a complete protection against underreporting, as there is no guarantee that everything that is ingested is actually photographed.

      However, our analyses show that the reporting behavior of individual participants did not change significantly between the exploration and intervention phases. We do not assume that participants who underreported only did so during the exploration phase (and only ate more than reported in this study phase) and reported correctly in the intervention phase (and then indeed consumed fewer calories).  We discuss this point in the section "3.1 Limitations".

      (2) The trial's reliance on self-reported caloric intake is problematic, as participants tend to underreport intake; for example, in the NEJM paper (DOI: 10.1056/NEJM199212313272701), some participants underreported caloric intake by approximately 50%, rendering such data unreliable and hence misleading. More rigorous methods for assessing food intake are available and should have been utilized. Merely acknowledging the unreliability of self-reported caloric intake is insufficient, as it would still leave the reader with the impression that there is no change in food intake when we actually have no idea whether food intake was altered. A more robust approach to assessing food intake is imperative. Even if a decrease in caloric intake is observed through rigorous measurement, as I am convinced a more rigorous study testing this paradigm would reveal, this intervention may merely represent another short-term diet among countless others showing that one may lose weight by going on a diet, principally due to heightened dietary awareness.

      The risks of self-reporting, our considerations, and our analysis of participants' reporting behavior and caloric intake over the course of the study are discussed in detail both in our responses above and in the manuscript. 

      With regard to the reviewer's second argument, we have largely adapted the study protocol of the control group to that of the experimental group. Apart from the fact that the control subjects were not given guidelines on eating times and were instead only given a very rough time window of 18 hours for food intake, the content of the sessions and the measurement methods were the same in both groups. This means that the possibility of increased nutritional awareness was equally present in both groups, but only the participants in the experimental group lost a significant amount of body weight.

      In future continuations of the study, further follow-up after an even longer period than four weeks (e.g. after 6 months) can be included in the protocol in order to examine whether the effects can be sustained over a longer period.

      (3) Furthermore, the assessment of circadian rhythm using the MCTQ, a self-reported measure of chronotype, may not be as reliable as more objective methods like actigraphy.

      The MCTQ is a validated means of determining chronotype and its results are significantly associated with the results of actigraphic measurements. In our view, the MCTQ is sufficient to test our hypothesis that matching the chronobiological characteristics of participants is beneficial. Nevertheless, measurements using actigraphy could be of interest, for example to correlate the success of weight loss with parameters of the sleep-wake rhythm.

      (4) Given the potential limitations associated with self-reported data in both dietary intake and circadian rhythm assessment, the overall impact of this manuscript is low. Increasing rigor by incorporating more objective and reliable measurement techniques in future studies could strengthen the validity and impact of the findings.

      The body weight data was not self-reported, but the measurements were taken in the presence of study staff. Although optimization might be possible (see above), we do not currently see any other way of recording all calorie intake occasions in the natural environment of the participants over a period of several weeks (or possibly longer, as noted by the reviewer) other than self-report and, in our opinion, it would not be feasible. For the future continuation of the study, we are planning occasional indirect calorimetry measurements that can provide information about the actual amount of food consumed in different phases of the study. These can reveal errors in the self-report but will not be able to replace daily data collection by means of self-report.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      This interesting and timely study by Wilming and colleagues examines the effect of regularity vs. irregularity of feeding on body weight dynamics and BMI. A rigorous assessment of the same in humans needs to be improved, which this study provides. The study is well-designed, with a 14-day exploration phase followed by 6 weeks of intervention, and it is commendable to see the number of participants (100) who completed the study. Incorporation of a follow-up assessment 4 weeks after the conclusion of the study shows maintained weight loss in a subset of Experimental Group (EG) participants who continue with regular meals. There are several key observations, including particular meal times (lunch and dinner), which, when restricted to 45min or less in duration (MTVS of 3 or less), will lead to efficient weight loss, as well as correlations between baseline BMI and weight loss. The authors also exclude the impact of self-reported meal composition on the efficiency of weight loss in the EG group in the context of this study. The study reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Finally, the authors highlight an important point: to provide attention to personalized feeding and circadian windows and that personalized interventions that cater to individual circadian structures will result in more significant weight loss. This is an important concept that needs to be brought to light. There are only a few minor comments listed below:

      Minor comments:

      (1) The authors may provide explanations for the reduction in the MTVS in the EG and the increase in the same for the Control Group (CG). The increases in MTVS in CG are surprising (lines 105-106) because it is assumed that there is no difference in CG eating patterns prior to and during the study.

      As the reviewer correctly states, our assumption was that there should be no change in the MTVS before and during the study, but we could not rule this out: in their meetings with the study staff, participants were given no indication regarding the regularity of food intake within the fixed time window, i.e. they were not instructed to continue eating exactly as before, since such an instruction might have prompted them to adhere to a schedule as precisely as possible. As a result, there was a statistically significant worsening of the MTVS in the CG, which amounted to less than 0.6 MTVS, i.e. a time span of only approx. ± 7.5 min, and remained within MTVS 3. Since there were no correlations between the measured MTVS and the weight of the subjects in the CG, and a change of about half an MTVS value has only a rather minor effect on weight, we do not attribute great significance to the observed deterioration in the MTVS.

      (2) There would be greater clarity for the readers if the authors clearly defined the study design in detail at the outset of the study, e.g., in section 2.1.

      We have included a brief summary of the study design at the end of the introduction so that the reader is already familiar with it at the beginning of the manuscript without having to switch to the material and methods section.

      (3) The data in Fig S2H are important and inform readers that the regularity of lunch and dinner is more closely related to body weight changes than that of breakfast. These data should be incorporated in the Main Figure. In addition, the analyses of Table S7 data (indicating that an MTVS of no greater than 3, i.e. ±45 min of the meal-timing window, is associated with efficient weight loss) should be represented in a figure panel in the Main Figures.

      As suggested by the reviewer, we have moved Fig. S2H to the main Fig. 1. In addition, Table S7 is now no longer inserted as a supplementary table but as main Table 1 in the manuscript.

      (4) The authors state in lines 222-223 that "weight changes of participants were not related to one of these changes in eating characteristics (Fig. 3B-D, Tab. S6)", referring to the shortening of feeding windows as noted in the EG group. This is a rather simplistic statement, which should be amended to include that weight changes may not relate to changes in eating characteristics per se but likely relate to changes in metabolic programming, for instance, energy expenditure increases, which have been shown to associate with these changes in eating characteristics. This is important to note.

      We have changed the wording at this point so that it is clear that we are only referring here in the results section to the results of the mathematical analysis, which showed no correlation between the eating time window and weight loss in our sample. However, we have now explicitly mentioned the change in metabolic programming correctly noted by the reviewer in the discussion at the end of section 3.

      (5) Please provide more background and details on the attributes that define individual participant chronotypes in the manuscript before discussing datasets, e.g., mSP and mEP. This is in relation to the narrative in lines 228-230: "Indeed, our data show that the later the chronotype of participants (measured by the MCTQ mid-sleep phase, mSP [24]), the later their mid-eat phase (mEP) on weekends (Fig. 3E, Tab. S6), with the mSP and mEP being almost antiphasic on average (Fig. 3F, Tab. S10)." This will help readers unfamiliar with circadian biology/chronobiology research understand the contents of this manuscript, particularly Fig 3.

      We have explained the new chronobiology terms that appear in the chapter better in the revised version so that they are easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify Terminology: Define or avoid using ambiguous terms such as "caloric event" to prevent confusion, especially for readers less familiar with chronobiology. Consider providing clear explanations or opting for more widely understood terms.

      We have replaced "caloric event" with “calorie intake occasion” and explain various chronobiology terms better, so that hopefully readers from other disciplines can now follow the text more easily.

      (2) Detailed Methodological Descriptions: Improve the transparency of your methods, especially concerning the measurement of primary and secondary outcomes. Address the concerns raised about the reliability of self-reported weight and the potential biases in measurement methods.

      In the section "3.1 Limitations", we have examined the aspect of the reliability of self-reported data and our measures to reduce this uncertainty in more detail. We have also added further details on the measurement of outcomes in the materials and methods section.

      (3) Address Participant Selection Criteria: Reevaluate the inclusion criteria and consider discussing the implications on the study's findings of the broad age range, the inclusion of shift work, unmatched cohorts, and inclusion of individuals with normal weight, overweight, and obesity. Provide a subgroup analysis or discuss how BMI might have influenced the results. Even though this is an additional post-hoc analysis, it would directly address one of the major weaknesses of the study design.

      We have supplemented the analyses and now show in Fig. S2G that neither age nor gender had any influence on weight loss as a result of the intervention. To our knowledge, none of the 100 participants evaluated were shift workers. Even if shift workers were part of the study without our knowledge, we do not consider this to be a problem as long as their shifts allow them to keep to certain eating times. The fact that it turned out that the baseline BMI of the remaining 67 EG and 33 CG participants did not match is discussed in detail in the section "3.1 Limitations". Our previous analysis in Fig. S2I already showed that there is a negative correlation between baseline BMI and weight loss - an interesting result, as it shows that people with a high BMI particularly benefit from the intervention. In addition, we already showed in Fig. S2J in a subgroup analysis that in all strata the BMI of EG subjects decreased more than that of CG subjects, even if they had the same initial BMI. We do not consider the wide dispersion of the BMIs of the included participants to be a weakness of the study design. On the contrary, it allows us to make a statement about which target group the intervention is particularly suitable for.

      (4) Improve Statistical Analysis: If not already done, involve a biostatistician to review the statistical analyses, particularly concerning post-hoc tests, correlation analyses, and the handling of measurement biases. Ensure that deviations from the original study protocol are clearly documented and justified.

      All analyses have already been checked by a statistician, decided together with him and approved by him.

      (5) Data Interpretation and Speculation: Limit speculation and clearly distinguish between findings supported by your data from hypotheses and future directions. Ensure that discussions about the implications of meal timing on metabolism are supported by evidence with adequate references and clearly state where further research is needed.

      We have revised the discussion and, especially through the detailed discussions of the limitations, we have emphasized more clearly what has been achieved and what still needs to be proven in future studies.

      (6) Clinical Trial Registration: Address the lack of registration in the EU Clinical Trials Register and clinicaltrials.gov. Discuss its potential implications on the study's transparency and how it aligns with current requirements and regulations.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trial Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: „The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE).[…] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place before it began and see no limitation in terms of transparency or compliance with requirements and regulations.

      (7) Use of Sensitive and Current Terminology: Update the manuscript to reflect the latest recommendations regarding the language used to describe obesity and patients living with obesity. This ensures respect and accuracy in reporting and aligns with contemporary standards in the field.

      We updated the manuscript accordingly.

      (8) Strengthen the Introduction: Expand the literature review to include more recent and relevant studies that contextualise your work within the broader field of chrononutrition. This could help clarify how your study builds upon or diverges from existing research.

      We have included further studies in the introduction that aim to reduce body weight by restricting food intake to certain time periods. We have also more clearly contrasted the designs of these studies with the design of our study.

      (9) Clarify Discrepancies and Errors: Address any inconsistencies, such as the discrepancy in meal timing instructions (90 minutes reported in the conclusion vs. 60 minutes reported in the methods), and ensure all figures, tables, and statistical analyses are correctly referenced and described.

      The first point mentioned by the reviewer is not an inconsistency. To ensure the feasibility of the intervention, each participant was initially given a time window of +/- 30 minutes (60 min) from the specified eating time. Our later analyses show that even a time window of +/- 45 minutes (90 min) around the specified eating time is sufficient to lose weight efficiently (see results in Table 1).

      We have checked all references to figures, tables and statistical analyses and updated them if necessary.

      (10) Discuss Limitations and Bias: More thoroughly discuss the limitations of your study, including the potential impacts of biases and how they were mitigated. Additionally, consider the effects of including shift workers and how this choice impacts the applicability of your findings.

      Section “3.1 Limitations” has now been supplemented by a number of points and discussions. As described above, we do not consider the inclusion of shift workers to be a limitation as long as they are able to adhere to the specifications of the eating time plan. We cannot derive any indications to the contrary from our data.

      (11) Consider Publishing Separate Manuscripts: If the study encompasses a wide range of outcomes or post-hoc analyses, consider separating these into distinct publications to allow for a more focused and detailed exploration of each set of findings.

      We will take this advice into consideration for future publications on the continuation of the study. As this is a pilot study that is intended to clarify whether and to what extent the intervention is effective, we believe it makes sense to report all the data in a publication.

      (12) By addressing these recommendations, the authors can significantly improve their manuscript's clarity, reliability, and impact. This would not only support the dissemination of their findings but also would contribute valuable insights into the growing field of chrononutrition.

      We hope that we have satisfactorily answered, discussed and implemented the points mentioned by the reviewer in the manuscript, so that clarity, reliability, and impact have been increased and it can offer a valuable contribution to the named field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The report describes the control of the activity of the RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. Repressive binding of K3 to the kinase prevents phosphorylation of its recognised substrate, EIF2α (the α subunit of the Eukaryotic Initiation Factor 2). The interaction of K3 is probed by saturation mutagenesis within four regions of PKR chosen by modelling the molecules' interaction. The authors identify K3-resistant PKR variants, which reveal that the K3/EIF2α-binding surface of the kinase is malleable. This is reasonably interpreted as indicating the potential adaptability of this antiviral protein to combat viral virulence factors.

      Strengths:

      This is a well-conducted study that probes the versatility of the antiviral response to escape a viral inhibitor. The experimentation is very diligent, generating and screening a large number of variants to recognise the malleability of residues at the interface between PKR and K3.

      Weaknesses:

      (1) These are minor. The protein interaction between PKR and K3 has been previously well-explored through phylogenetic and functional analyses and molecular dynamics studies, as well as with more limited site-directed mutational studies using the same experimental assays.

      Accordingly, these findings largely reinforce what had been established rather than making major discoveries.

      First, thank you for your thoughtful feedback. We agree that our results are concordant with previous findings and recognize the importance of emphasizing what we find novel in our results. We have revised the introduction (lines 65-74 of the revised_manuscript.pdf) to emphasize three findings of interest: (1) the PKR kinase domain is largely pliable across its substrate-binding interface, a remarkable quality that is most fully revealed through a comprehensive screen, (2) we were able to differentiate variants that render PKR nonfunctional from those that are susceptible to Vaccinia K3, and (3) we observe a strong correlation between PKR variants that are resistant to K3 WT and K3-H47R.

      There are some presumptions:

      (2) It isn't established that the different PKR constructs are expressed equivalently so there is the contingency that this could account for some of the functional differences.

      This is an excellent point. We have revised the manuscript to raise this caveat in the discussion (lines 247-251). One indirect reason to suppose that expression differences among our PKR variants are not a dominant source of variation is that we did not observe much variation in kinase activity in the absence of K3.

      (3) Details about the conformation of PKR used to model the interaction aren't given, so it isn't clear how accurately the model captures the active kinase state. This is important for the interaction with K3/EIF2α.

      We have expanded on Supplemental Figure 12 and our description of the AlphaFold2 models in the Materials and Methods section (lines 573-590). We clarify that these models may not accurately capture the phosphoacceptor loop of eIF2α (residues Glu49-Lys60) and the PKR β4-5 linker (Asp338-Asn350) as these are highly flexible regions that are absent in the existing crystal structure complex (PDB 2A1A) and have low AlphaFold2 confidence scores (pLDDT < 50). We also noted, in the Materials and Methods section and in the caption of Figure 1, that the modeled eIF2α closely resembles the crystal structure of standalone yeast eIF2α, which places the Ser51 phosphoacceptor site far from the PKR active site. Thus, we expect there are additional undetermined PKR residues that contact eIF2α.

      (4) Not all regions identified to form the interface between PKR and K3 were assessed in the experimentation. It isn't clear why residues between positions 332-358 weren't examined, particularly as this would have made this report more complete than preceding studies of this protein interaction.

      Great questions. We designed and generated the PKR variant library based on the vaccinia K3 crystal structure (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A), in which PKR residues 338-350 are absent. After the genesis of the project, we generated the AlphaFold2-predicted complex of PKR and vaccinia K3, and have become very interested in the β4-β5 linker, a highly diverse region across PKR homologs which includes residues 332-358. However, this region remains unexamined in this manuscript.

      Reviewer #2 (Public Review):

      Chambers et al. (2024) present a systematic and unbiased approach to explore the evolutionary potential of the human antiviral protein kinase R (PKR) to evade inhibition by a poxviral antagonist while maintaining one of its essential functions.

      The authors generated a library of 426 single-nucleotide polymorphism (SNP)-accessible non-synonymous variants of PKR kinase domain and used a yeast-based heterologous virus-host system to assess PKR variants' ability to escape antagonism by the vaccinia virus pseudo-substrate inhibitor K3. The study identified determinant sites in the PKR kinase domain that harbor K3-resistant variants, as well as sites where variation leads to PKR loss of function. The authors found that multiple K3-resistant variants are readily available throughout the domain interface and are enriched at sites under positive selection. They further found some evidence of PKR resilience to viral antagonist diversification. These findings highlight the remarkable adaptability of PKR in response to viral antagonism by mimicry.

      Significance of the findings:

      The findings are important with implications for various fields, including evolutionary biology, virus-host interfaces, genetic conflicts, and antiviral immunity.

      Strength of the evidence:

      Convincing methodology using state-of-the-art mutational scanning approach in an elegant and simple setup to address important challenges in virus-host molecular conflicts and protein adaptations.

      Strengths:

      Systematic and Unbiased Approach:

      The study's comprehensive approach to generating and characterizing a large library of PKR variants provides valuable insights into the evolutionary landscape of the PKR kinase domain. By focusing on SNP-accessible variants, the authors ensure the relevance of their findings to naturally occurring mutations.

      Identification of Key Sites:

      The identification of specific sites in the PKR kinase domain that confer resistance or susceptibility to a poxvirus pseudosubstrate inhibition is a significant contribution.

      Evolutionary Implications:

      The authors performed meticulous comparative analyses throughout the study between the functional variants from their mutagenesis screen ("prospective") and the evolutionarily-relevant past adaptations ("retrospective").

      Experimental Design:

      The use of a yeast-based assay to simultaneously assess PKR capacity to induce cell growth arrest and susceptibility/resistance to various VACV K3 alleles is an efficient approach. The combination of this assay with high-throughput sequencing allows for the rapid characterization of a large number of PKR variants.

      Areas for Improvement:

      (5) Validation of the screen: The results would be strengthened by validating results from the screen on a handful of candidate PKR variants, either using a similar yeast heterologous assay, or - even more powerfully - in another experimental system assaying for similar function (cell translation arrest) or protein-protein interaction.

      Thank you for your thoughtful feedback. We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and found that the results generally support our original findings. We have revised the manuscript to include these validation experiments (lines 117-119 of the revised_manuscript.pdf, Supplemental Figure 4).

      (6) Evolutionary Data: Beyond residues under positive selection, the screen would allow the authors to also perform a comparative analysis with PKR residues under purifying selection. Because they are assessing one of the most conserved ancestral functions of PKR (i.e. cell translation arrest), it may also be of interest to discuss these highly conserved sites.

      This is a great point. We do find that there are regions of the PKR kinase domain that are not amenable to genetic perturbation, namely in the glycine rich loop and active site. We contrast the PKR functional scores at conserved residues under purifying selection with those under positive selection in Figure 2E (lines 141-143).

      (7) Mechanistic Insights: While the study identifies key sites and residues involved in vaccinia K3 resistance, it could benefit from further investigation into the underlying molecular mechanisms. The study's reliance on a single experimental approach, deep mutational scanning, may introduce biases and limit the scope of the findings. The authors may acknowledge these limitations in the Discussion.

      We agree that further investigation into the underlying molecular mechanisms is warranted and we have revised the manuscript to acknowledge this point in the discussion (lines 284-288).

      (8) Viral Diversity: The study focuses on the viral inhibitor K3 from vaccinia. Expanding the analysis to include other viral inhibitors, or exploring the effects of PKR variants on a range of viruses, would strengthen and expand the study's conclusions. Would the identified VACV K3-resistant variants also be effective against other viral inhibitors (from poxviruses or other viruses), or in the context of infection with different viruses? Without such evidence, the authors should ensure that the manuscript is specific about its conclusions.

      This is a fantastic question that we are interested in exploring in our future studies. In the manuscript we note a strong correlation between PKR variants that evade vaccinia wild-type K3 and the K3-H47R enhanced allele, but we are curious to know if this holds when tested against other K3 orthologs such as variola virus C3. That said, we have revised the manuscript to clarify this limitation to our findings and specify vaccinia K3 where appropriate.

      Reviewer #3 (Public Review):

      Summary:

      -  This study investigated how genetic variation in the human protein PKR can enable sensitivity or resistance to a viral inhibitor from the vaccinia virus called K3.

      -  The authors generated a collection of PKR mutants and characterized their activity in a high-throughput yeast assay to identify 1) which mutations alter PKR's intrinsic biochemical activity, 2) which mutations allow for PKR to escape from viral K3, and 3) which mutations allow for escape from a mutant version of K3 that was previously known to inhibit PKR more efficiently.

      -  As a result of this work, the authors generated a detailed map of residues at the PKR-K3 binding surface and the functional impacts of single mutation changes at these sites.

      Strengths:

      -  Experiments assessed each PKR variant against three different alleles of the K3 antagonist, allowing for a combinatorial view of how each PKR mutant performs in different settings.

      -  Nice development of a useful, high-throughput yeast assay to assess PKR activity, with highly detailed methods to facilitate open science and reproducibility.

      -  The authors generated a very clean, high-quality, and well-replicated dataset.

      Weaknesses:

      (9) The authors chose to focus solely on testing residues in or near the PKR-K3 predicted binding interface. As a result, there was only a moderately complex library of PKR mutants tested. The residues selected for investigation were logical, but this limited the potential for observing allosteric interactions or other less-expected results.

      First, we greatly appreciate all your feedback on the manuscript, as well as raising this particular point. We agree that this is a moderately complex library of PKR variants, from which we begin to uncover a highly pliable domain with a few specific sites that cannot be altered. We have revised the manuscript to raise this limitation (lines 284-288 of the revised_manuscript.pdf) and encourage additional exploration of the PKR kinase domain.

      (10) For residues of interest, some kind of independent validation assay would have been useful to demonstrate that this yeast fitness-based assay is a reliable and quantitative readout of PKR activity.

      We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and generally found that the results support our original findings. We have revised the manuscript to include this validation experiment (lines 117-119, Supplemental Figure 4).

      (11) As written, the current version of the manuscript could use more context to help a general reader understand 1) what was previously known about these PKR and K3 variants, 2) what was known about how other genes involved in arms races evolve, or 3) what predictions or goals the authors had at the beginning of their experiment. As a result, this paper mostly provides a detailed catalog of variants and their effects. This will be a useful reference for those carrying out detailed, biochemical studies of PKR or K3, but any broader lessons are limited.

      Thank you for bringing this to our attention. We have revised the introduction of the manuscript to provide more context regarding previous work demonstrating an evolutionary arms race between PKR and K3 and how single residue changes alter K3 resistance (lines 51-64).

      (12) I felt there was a missed opportunity to connect the study's findings to outside evolutionary genetic information, beyond asking if there was overlap with PKR sites that a single previous study had identified as positively selected. For example, are there any signals of balancing selection for PKR? How much allelic diversity is there within humans, and are people typically heterozygous for PKR variants? Relatedly, although PKR variants were tested in isolation here, would the authors expect their functional impacts to be recessive or dominant, and would this alter their interpretations? On the viral diversity side, how much variation is there among K3 sequences? Is there an elevated evolutionary rate, for example, in K3 at residues that contact PKR sites that can confer resistance? None of these additions are essential, but some kind of discussion or analysis like this would help to connect the yeast-based PKR phenotypic assay presented here back to the real-world context for these genes.

      We appreciate this suggestion to extend our findings to a broader evolutionary context. There is little allelic diversity of PKR in humans, with all nonsynonymous variation listed in gnomAD being rare. (PKR shows sequence diversity in comparisons across species, including across primates.) Thus, barring the possibility of variation being present in under-studied populations, there is unlikely to be balancing selection on PKR in humans. Our expectation is that beneficial mutations in PKR for evading a pseudosubstrate inhibitor would be dominant, as a small amount of eIF2α phosphorylation is capable of halting translation (Siekierka, PNAS, 1984). There is a recent report citing PKR missense variants associated with dystonia that can be dominantly or recessively inherited (Eemy et al. 2020 PMID 33236446). Elde et al. 2009 (PMID 19043403) notes that poxvirus K3 homologs are under positive selection but no specific residues have been cited to be under positive selection. The lack of allelic diversity in PKR in humans notwithstanding, PKR could experience future selection in the human population as evidenced by its rapid evolution in primates, so we fully agree that a connection to the real-world context is useful. We have noted these topics in the discussion section (lines 289-294).

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticisms but ask for some clarifications and make some comments about the perceived weaknesses.

      (13)  If the authors disagree with my summation that the findings largely replicate what was known, could they detail how the findings differ from what was known about this protein interaction and the major new insights stemming from the study? Currently, the abstract is a little philosophical rather than listing the explicit discoveries of the study.

      Thank you again for raising the need for us to clearly convey the novelty of our findings. We have revised the final paragraph in our introduction as described in comment #1.

      (14) As the experimental approach is well reported, it is unnecessary to confirm the proposed activity by, for instance, measures of Sui2 phosphorylation. However, previous reports have recognised that point mutants of PKR can be differentially expressed. The impact of this potential effect is unknown in the current experimentation, as there are no measures of the expression of the different mutant PKR constructs. The large number of constructs used makes this verification onerous. The potential impact could be ameliorated by redundantly replacing each residue (hoping different residues have different effects on expression). Still, this limitation of the study should be acknowledged in the text.

      We greatly appreciate this comment and agree that this should be made clear in the text, which we have added to the discussion of the manuscript (lines 247-251).

      (15) Preceding findings and the modeling in this report recognise an involvement in the kinase insert region (residues 332 to 358) in PKR's interaction with K3 but this region is excluded from the analysis. These residues have been largely disregarded in the preceding analysis (it is absent from the molecular structure of the kinase) so its inclusion here might have lent a more novel aspect or delivered a more complete investigation. Is there a justification for excluding this flexible loop?

      The PKR variant library was designed based on the crystal structure of K3 (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A). After the library was designed and made, we obtained complete predicted structures of PKR in complex with eIF2α and K3, which largely agree with the crystal structures but contain the additional flexible loops that were not captured in the crystal structures. Though the library studied here does not explore variation in the kinase insert region, we are very interested in doing so in our future studies.

      (16)  Could the explanation of the 'PKR functional score' be clarified? The description given within the legend of SF1 was helpful, so could this be replicated earlier in the main body of the text when introducing these experiments? e.g. As PKR activity is toxic to yeast, the number of cells in the pool expressing the functional PKR will decrease over time. Thus the associated barcode read count will also decrease, while the read count for the nonfunctional PKR will increase. This is termed the PKR function score, which will be relatively lower for cells transformed with less active PKR than those with more active PKR.

      Thank you for suggesting this clarification, we have revised the manuscript to clarify our definition of the PKR functional score (lines 106-109).
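      To make the score definition described in comment (16) concrete, the following is a minimal sketch of one way such a score could be computed from barcode read counts. It is hypothetical: the function name, the pseudocount, and the choice of a log2 ratio normalized to wild type are illustrative assumptions, not the authors' pipeline. It follows the orientation described by the reviewer, in which less active PKR scores lower than more active PKR.

```python
import math


def functional_scores(pre_counts, post_counts, wt_key="WT", pseudocount=0.5):
    """Hypothetical PKR functional score from barcode read counts.

    pre_counts / post_counts: dict mapping variant -> read count before and
    after growth under PKR-inducing conditions (assumed inputs).
    Active PKR is toxic to yeast, so active variants are depleted from the
    pool. The score is the negated log2 enrichment relative to wild type:
    WT scores 0, more active variants score > 0, less active variants < 0.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log2_enrichment(variant):
        # Frequency of the variant's barcodes in each pool; the pseudocount
        # avoids taking the log of zero for fully depleted variants.
        pre_freq = (pre_counts[variant] + pseudocount) / pre_total
        post_freq = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post_freq / pre_freq)

    wt = log2_enrichment(wt_key)
    return {v: -(log2_enrichment(v) - wt) for v in pre_counts}
```

      For example, with equal starting counts, a kinase-dead variant such as K296R whose barcodes become enriched after selection receives a negative score, while a variant depleted more strongly than wild type receives a positive one.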

      (17)  Another suggestion to clarify this term is to modify the figures. Currently, the intent of the first simulated graph in Fig 1E is clear but the inversion of the response (shown by the transposition of the colours) in the next graph (to the right) is less immediately obvious. Accordingly, the orientation of the 'PKR functional score' is uncertain. Could the authors add text to the rightmost graphic in Figure 1E by, for instance, indicating the PKR activity in the vertical column with text such as 'less active' (at the bottom), 'WT' (in the centre), and 'more activity' (at the top)? Also, the position of the inactive K296R mutant might be added to Figure 2A complementing the positioning of the active WT kinase in the first data graph of this kind.

      We appreciate your specific feedback to improve the figures of the manuscript, we have made adjustments to Figure 1E to clarify how we derive the PKR functional scores.

      (18) The authors don't use existing structures of PKR in their modelling. However, there is no information about the state of the PKR molecule used for modelling. Specific elements of the kinase domain affect its interaction with K3, so it would be informative to know the orientation of these elements in the model. Could the authors detail the state of pivotal kinase elements in their models? This could involve the alignment of the N- and C-lobes, the orientation of kinase spines (C- and R-spines), and the phosphorylation status of residues in the activation loop, or at least the position of this loop in relation to that adopted in the active dimeric kinase (e.g. PDB 2A1A, 3UIU or 6D3L). Alternatively, crystallographic structures of active and inactive PKR could be overlaid with the theoretical structure used for modelling (as supplementary information).

      We have revised the manuscript to describe the alignment of the predicted PKR-K3 complex with active and inactive PKR, and we have extended Supplemental Figure 12 with an overlay of the predicted structures with existing structures. We have also added a supplemental data file containing the RMSD values of PKR (from the predicted PKR-K3 complex) aligned to active (PDB 2A1A), inactive (PDB 3UIU), and unphosphorylated (PDB 6D3L) PKR (5_Structure-Alignment-RMSD-Values.xlsx). We have also provided the AlphaFold2 best model predictions for the PKR-eIF2α complex (6_AF2_PKR-KD_eIF2a.pdb) and PKR-K3 complex (7_AF2_PKR-KD_VACV-K3.pdb). Across the RMSD values, the AlphaFold2 model of PKR most closely resembles unphosphorylated PKR (PDB 6D3L), though we note the activation loop is absent from PDB 6D3L and 3UIU. We also aligned the Ser51 phosphoacceptor loop of the AlphaFold2 eIF2α model to PDB 1Q46 and found that the model reflects the pre-phosphorylation state. This loop is expected to interact with the PKR active site, which is not captured in our model, and we state this explicitly in the caption of Figure 1 (lines 665-668).
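      For readers less familiar with the structural comparison above, the RMSD reported for each alignment is conventionally computed over $N$ matched atoms after optimal superposition, as sketched below. Which atoms were matched (for example, Cα atoms only) is an assumption here and is not stated in this response.

```latex
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert \mathbf{x}_i - \left( R\,\mathbf{y}_i + \mathbf{t} \right) \right\rVert^{2}}
```

      Here $\mathbf{x}_i$ and $\mathbf{y}_i$ are the coordinates of the $i$-th matched atom in the reference structure (e.g. PDB 6D3L) and the predicted model, $R$ is a rotation matrix and $\mathbf{t}$ a translation vector; lower values indicate closer agreement.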

      (19) Could some specific residues in Figure 7 be labelled (numbered) to orient the findings? Also, unlike the residues coloured red/black/blue, the residues coloured white are not titled in the key of this figure. The white also isn't distinguished from the green (outside the regions targeted for mutagenesis).

      Excellent suggestion, we have revised this figure to include labels for the sites to orient the reader and clarify our categorization of PKR residues in the kinase domain.

      (20)  Regarding the discussion, the authors adopt the convention of describing K3 as a pseudosubstrate. Although I realize it is common to refer to K3 as a pseudosubstrate, it isn't phosphorylated and binds slightly differently to PKR so alternative descriptors, such as 'a competitive binder', would more accurately present the protein's function. Possibly for this reason, the authors declared an expectation that evolution pressures should shift K3 to precisely mimic EIF2α. However, closer molecular mimicry shouldn't be expected for two reasons. The first is a risk of disrupting other interactions, such as the EIF2 complex. Secondly, equivalent binding to PKR would demote K3 to merely a stoichiometric competitor of EIF2α. In this instance, effective inhibition would require very high levels of K3 to compete with equivalent binding by EIF2α. This would be demanding particularly upon induction of PKR during the interferon response. To be an effective inhibitor K3 has to bind more avidly than EIF2α and merely requires a sufficient overlap with the EIF2α interface on PKR to disrupt this alternative association. This interpretation predicts that K3 is under pressure to bind PKR by a different mechanism than EIF2α.

      We appreciate your thoughtful point about the usage of the term pseudosubstrate. Ultimately, we’ve decided to continue using the term due to its historical usage in the field. The question of the optimal extent of mimicry in K3 is a fascinating one, and we greatly appreciate your thoughts. We wholly agree that the possibility of K3 having superior PKR binding relative to eIF2α would be preferable to perfect mimicry. In our Ideas and Speculation section, we propose that benefits towards increasing PKR affinity may need to be balanced against potential loss of host range resulting from overfitting to a given host’s PKR. However, the possibility that reduced mimicry could be selected to avoid disruption of eIF2 function had not occurred to us; thank you for pointing it out!

      (21) The discussion of the 'positive selection' of sites is also interesting in this context. To what extent has the proposed positive selection been quantified? My understanding is that all of the EIF2α kinases are conserved and so demonstrate lower levels of residue change that might be expected by random mutagenesis i.e. variance is under negative selection. The relatively higher rate of variance in PKR orthologs compared to other EIF2α kinases could reflect some relaxation of these constraints, rather than positive selection. Greater tolerance of change may stem from PKR 's more sporadic function in the immune response (infrequent and intermittent presence of its activating stimuli) rather than the ceaseless control of homeostasis by the other EIF2α kinases. Also, induction of PKR during the immune response might compensate for mutations that reduce its activity. I believe that the entire clade of extant poxviruses is young relative to the divergence between their hosts. Accordingly, genetic variance in PKR predates these viruses. Although a change in PKR may become fixed if it affords an advantage during infection, such an advantage to the host would be countered by the much higher mutation rates of the virus. This would appear to diminish the opportunity for a specific mutation to dominate a host population and, thereby, to differentiate host species. Rather, pressure to elude control by a rapidly evolving viral factor would favour variation at sites where K3 binds. This speculation offers an alternative perspective to the current discussion that the variance in PKR orthologs stems from positive selection driven by viral infection.

      We appreciate this stimulating feedback for discussion. Three of the four eIF2α kinases (HRI, PERK, and GCN2) appear to be under purifying selection (Elde et al. 2009, PMID 19043403), which stands in contrast to PKR. Residues under positive selection have been found throughout PKR, including the dsRNA binding domains, linker region, and the kinase domain. Importantly, the selection analyses from Elde et al. and Rothenburg et al. concluded that positive selection at these sites is more likely than relaxed selection. We agree that poxviruses are young, though we would guess that viral pseudosubstrate inhibition of PKR is ancient. Many viral proteins have been reported to directly interact with PKR, including herpes virus US11, influenza A virus NS1A, hepatitis C virus NS5A, and human immunodeficiency virus Tat. The PKR kinase domain does contain residues under purifying selection that are conserved among all four eIF2α kinases, but it also contains residues under positive selection that interface with the natural substrate eIF2α. Our work suggests that PKR is genetically pliable across several sites in the kinase domain, and we are curious to know if this pliability would hold at the same sites across the other three eIF2α kinases.

      (22) The manuscript is very well written but has a small number of typos; e.g. an aberrant 'e' ln 7 of the introduction, capitalise the R in ranavirus on the last line of the fourth paragraph of the discussion, and eIF2α (EIF2α?) is occasionally written as eIFα in the materials&methods.

      Thank you for bringing these typos to our attention! We’ve deleted the aberrant ‘e’ in the introduction, capitalized ‘Ranavirus’ in the discussion (line 265), and corrected ‘eIFα’ to ‘eIF2α’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional minor edits or revisions:

      (23) Paragraph 3 of the Introduction gives the impression that most of the previous work on the PKR-virus arms race is speculative. However, it is one of the best-described and most convincing examples of virus-host arms races. Can the authors edit the paragraph accordingly?

      Thank you for bringing this to our attention. We have revised the third paragraph and strengthened the description of the evolutionary arms race between PKR and viral pseudosubstrate antagonists.

      (24) Introduction: PKR has "two" double-stranded RNA binding domains. Can the authors update the text accordingly?

      We have updated the manuscript to clarify PKR has two dsRNA binding domains (lines 44-45).

      (25) The authors test here for one of the key functions of PKR: cell growth/translation arrest. Because of PKR pleiotropy, the manuscript may be edited accordingly: For example, statements such as "We found few genetic variants render the PKR kinase domain nonfunctional" are too speculative as they may retain other (not tested here) functions.

      This is a great suggestion, we have revised the manuscript to specify our definition of nonfunction in the context of our experimental screen (lines 86-92 and 106-109) and acknowledge this limitation in our experimental screen (lines 304-307).

      (26) The authors should specify "vaccinia" K3 whenever appropriate.

      We appreciate this comment and have revised the manuscript to specify vaccinia K3 where appropriate (e.g. lines 62, 66, 70, 80, 108, and 226).

      (27) Ref for ACE2 diversification may include Frank et al 2022 PMID: 35892217.

      Thank you for pointing us to this paper, we have included it as a reference in the manuscript (line 277).

      (28) Positive selection of PKR as referred to by the authors corresponds to analyses performed in primates. As shown by several studies, the sites under positive selection may vary according to host orders. Can the authors specify this ("primate") in their manuscript? And/or shortly discuss this aspect.

      Thank you for raising this point. In the manuscript we performed our analysis using vertebrate sites under positive selection as identified in Rothenburg et al. 2009 PMID 19043413 (line 51 and figure legends). We performed the same analysis using sites under positive selection in primates (as identified by Elde et al. 2009 PMID 19043403) and again found a significant difference in PKR functional scores versus K3. We have revised the manuscript to clarify our use of vertebrate sites under positive selection (lines 80-81).

      (29) "We view deep mutational scanning experiments as a complementary approach to positive selection": The authors should edit this and acknowledge previous and similar work on other antiviral factors, in particular one of the first studies of this kind on MxA (Colon-Thillet et al 2019 PMID: 31574080), and TRIM5 (Tenthorey et al 2020 PMID: 32930662).

      Thank you for bringing these two papers to our attention; we acknowledge them in the revised manuscript (line 299).

      (30) We believe Figure S7 presents important results and should be placed in the main text.

      We appreciate this suggestion, and have moved the contents of the former supplementary Figure 7 to the main text, in Figure 6.

      (31) The title may specify "poxvirus".

      Thank you for the suggestion to specify the nature of our experiment, we have adjusted the title to: Systematic genetic characterization of the human PKR kinase domain highlights its functional malleability to escape a poxvirus substrate mimic (line 3).

      Reviewer #3 (Recommendations For The Authors):

      (32) No line numbers or page numbers are provided, which makes it difficult to comment.

      We sincerely apologize for this oversight and have included line numbers in our revised manuscript as well as the tracked changes document.

      (33) In the introduction, I recommend defining evolutionary arms races more clearly for a broad audience.

      Thank you for this suggestion. We have revised the manuscript in the first and third paragraphs to more clearly introduce readers to the concept of an evolutionary arms race.

      (34) The introduction could use a clearer statement of the question being considered and the gap in knowledge this paper is trying to address. Currently, the third paragraph includes many facts about PKR and the fourth paragraph jumps straight into the approach and results. Some elaboration here would convey the significance of the study more clearly. As is, the introduction reads a bit like "We wanted to do deep mutational scanning. PKR seemed like an ok protein to look at", rather than conveying a scientific question.

      This is a great suggestion to improve the introduction section. We have heavily revised the third and fourth paragraphs of the introduction to clarify the motivation, approach, and significance of our work.

      (35) Relatedly, did the authors have any hypotheses at the start of the experiment about what kinds of results they expected? e.g. What parts of PKR would be most likely to generate escape mutants? Would resistant mutants be rare or common? etc? This would help the reader to understand which results are expected vs. surprising.

      These are all great questions. We have revised the introduction of the manuscript to point out that previous studies have characterized a handful of PKR variants that evade vaccinia K3, and these variants were made at sites found to be under positive selection (lines 60-64).

      (36) A description of the different K3 variants and information about why they were chosen for study should also be added to the Introduction. It was not until Figure 5 that the reader was told that K3-H47R was the same as the 'enhanced' K3 allele you are testing.

      Thank you for bringing this to our attention, we have revised the introduction to clarify the experimental conditions (lines 65-67) and specify K3-H47R as the enhanced allele earlier in the manuscript (line 100).

      (37) Does every PKR include just a single point mutation? It would be nice to see data about the number and types of mutations in each PRK window added to Supplemental Figure 1.

      Thank you for the suggestion to improve this figure. Every PKR variant that we track has a single point mutation that generates a nonsynonymous mutation. In our PacBio sequencing of the PKR variant library we identified a few off-target variants or sequences with multiple variants, but we identified the barcodes linked to those constructs and discarded those variants in our analysis. We have revised Supplemental Figure 1 to include the number and types of mutations made at each PKR window.

      (38) In terms of the paper's logical flow, personally, I would expect to begin by testing which variants break PKR's function (Figure 3) and then proceeding to see which variants allow for K3 escape (Figure 2). Consider swapping the order of these sections.

      Thank you for this suggestion, and we can appreciate how the flow of the manuscript may be improved by swapping Figures 2 and 3. We have decided to maintain the current order of the figures because we use Figure 3 to emphasize the distinction of PKR sites that are nonfunctional versus susceptible to vaccinia K3.

      (39) Figure 3A seems like a less-informative version of Figure 4A, recommend combining these two. Same comment with Figure 5A and Figure 6A.

      We appreciate this specific feedback for the figures. Though there are similarities between figure panels (e.g. 3A and 4A) we use them to emphasize different points in each figure. For example, in Figure 3 we emphasize the general lack of variants that impair PKR kinase activity, and in Figure 4 we distinguish kinase-impaired variants from K3-susceptible variants. For this reason, and given space constraints, we have chosen to maintain the figures separately. We did decide to move the former Figure 6 to the supplement.

      (40) In general, it felt like there was a lot of repetition/re-graphing of the same data in Figures 3-6. I recommend condensing some of this, and/or moving some of the panels to supplemental figures.

      Thank you for your suggestion, we have revised the manuscript and have moved Figure 6 to Supplemental Figure 7.

      (41) In contrast, Supplemental Figure 7 is helpful for understanding the distribution of the data. Recommend moving to the main text.

      This is a great recommendation, and we have moved Supplemental Figure 7 into Figure 6.

      (42) How do the authors interpret an enrichment of positively selected sites in K3-resistant variants, but not in K3-H47R-resistant variants? This seems important. Please explain.

      Thank you for this suggestion to improve the manuscript; we agree that this observation warranted further exploration. We found a strong correlation in PKR functional scores between K3 WT and K3-H47R, and accordingly, sites under positive selection that are resistant to K3 WT are also resistant to K3-H47R. The lack of enrichment at positively selected sites appears to be caused by a collapsed dynamic range between PKR wild-type-like and nonfunctional variants in the K3-H47R screen. We have revised the manuscript to clarify this point (lines 202-204).

      (43) Discussion: The authors compare and contrast between PKR and ACE2, but it would be worth mentioning other examples of genes involved in antiviral arms races wherein flexible, unstructured loops are functionally important and are hotspots of positive selection (e.g. MxA, NLRP1, etc).

      We greatly appreciate this suggestion to improve the discussion. We note this contrast between the PKR kinase domain and the flexible linkers of MxA and NLRP1 in the revised manuscript (lines 273-274).

      (44) Speculation section: What is the host range of the vaccinia virus? Is it likely to be a generalist amongst many species' PKRs (and if so, how variable are those PKRs)? Would be worth mentioning for context if you want to discuss this topic.

      Thank you for raising this question. Vaccinia virus is the best studied of the poxviruses; it was used as the vaccine to eradicate smallpox and serves as a model poxvirus. Vaccinia virus has a broad host range, and though the name vaccinia derives from the Latin word “vacca” for cow, the virus's origin remains uncertain (Smith 2007 https://doi.org/10.1007/978-3-7643-7557-7_1). The natural host of vaccinia virus is unknown, though there is some evidence to suggest it may be native to rabbits. It does appear to be a generalist, acting as a general inhibitor of vertebrate PKRs.

      (45) Many papers in this field discuss interactions between PKR and K3L, rather than K3. I understand that this is a gene vs. protein nomenclature issue, but consider matching the K3L literature to make this paper easier to find.

      Thank you for bringing this to our attention. We have revised the manuscript to specify that vaccinia K3 is expressed from the K3L gene in both the abstract (line 26) and the introduction (line 56) to help make this paper easier to find when searching for “K3L” literature.

      (46) Which PKR sequence was used as the wild-type background?

      This is a great question. We used the predominant allele circulating in the human population, represented by GenBank M85294.1:31-1686. We cite this sequence in the Methods (line 421) and have added it to the results section as well (line 84).

      (47) Figure 1C: the black dashed line is difficult to see. Recommend changing the colors in 1A-1C.

      Thank you for this suggestion, we have changed the dashed lines from black to white to make them more distinguishable.

      (48) Figure 1D: Part of the point of this figure is to convey overlaps between sites under selection, K3 contact sites, and eIF2alpha contact sites, but at this scale, many of the triangles overlap. It is therefore impossible to tell if the same sites are contacted vs. nearby sites. Perhaps the zoomed-in panels showing each of the four windows in the subsequent figures are sufficient?

      Thank you for bringing this to our attention. We have scaled the triangles down to reduce their overlap in Figure 1D and list all sites of interest (predicted eIF2α and vaccinia contacts, conserved sites, and positive selection sites) in the Materials and Methods section “Predicted PKR complexes and substrate contacts”.

      (49) Figure 1E: under "1,293 Unique Combinations", there is a line between the PKR and K3 variants, which makes it look like they are expressed as a fusion protein. I believe these proteins were expressed from the same plasmid, but not as a fusion, so I recommend re-drawing. Then in the graph, the y-axis says "PKR abundance", but from the figure, it is not clear that this refers to relative abundance in a yeast pool. Perhaps "yeast growth" or similar would be clearer?

      Thank you for the specific feedback to improve Figure 1. We have made the suggested edits to clarify that PKR and vaccinia K3 are not fused but each is expressed from their own promoter. We have also changed the y-axis from “PKR Abundance” to “Yeast Growth”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Correct capitalization errors, ensuring the first letter of each sentence is capitalized.

      Thank you for your comment. We have corrected capitalization errors.

      (2) Ensure that all technical terms and abbreviations are introduced in full when first mentioned and consistently used throughout the text.

      Thank you for your comment. We have checked and corrected the issue.

      (3) Review the manuscript for grammatical errors and improve sentence structures to enhance readability.

      Thank you for your comment. We have checked and corrected the issue.

      (4) Ensure all figures referenced in the text, such as Fig. 3G, are appropriately discussed and integrated into the narrative.

      Thank you for your comment. We have discussed and integrated Fig. 3G into the narrative (Page 12, Lines 162-166).

      (5) Maintain consistent formatting, including first-line indentation and spacing before paragraphs, to improve the document's visual coherence.

      Thank you for your comment. We have checked and corrected the issue.

      (6) Provide additional explanations for the selection criteria of final model variables, particularly the rationale behind choosing the λ_1se criterion in the LASSO regression.

      Thank you for your comment. We have provided explanations for choosing the λ_1se criterion in the LASSO regression (Page 25, Lines 315-316; Page 27, Lines 363-364).
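      As background to comment (6), the λ_1se criterion is the conventional "one-standard-error rule" for cross-validated LASSO (it is, for example, one of the two values reported by glmnet's `cv.glmnet`, alongside λ_min). The sketch below illustrates the rule in Python; the function name and inputs are hypothetical, and the manuscript's exact implementation is not described in this response.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation results.

    lambdas: candidate penalty values (any order).
    cv_mean: mean cross-validated error for each lambda.
    cv_se:   standard error of that mean for each lambda.
    """
    # lambda_min: the penalty with the lowest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # lambda_1se: the largest penalty (sparsest model, fewest variables)
    # whose mean error is still within one standard error of the minimum.
    candidates = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(candidates)
```

      The rationale is parsimony: models whose error is statistically indistinguishable from the best one are treated as equivalent, and the most heavily penalized of them is kept, which typically retains fewer predictors and is less prone to overfitting.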

      (7) Conduct validation studies with cohorts from other high-altitude regions to assess the generalizability and robustness of the prediction models.

      Thank you for your comment. The lack of validation in cohorts from other high-altitude regions is a weakness of this study; in our follow-up study, we will conduct external validation with cohorts from additional high-altitude regions to assess the generalizability and robustness of our prediction models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Bockorny, Muthuswamy, and Huang et al. performed proteomics analysis of plasma extracellular vesicles (EVs) from pancreatic ductal adenocarcinoma (PDAC) patients and patients with benign pancreatic diseases (chronic pancreatitis and intraductal papillary mucinous neoplasm, IPMN) to develop a 7-EV protein signature that predicts PDAC. Moreover, the authors identified PSMB4, RUVBL2, and ANKAR as being associated with metastasis. These studies provide important insight into alterations of EVs during PDAC progression, and the data supporting the prediction of PDAC with EV protein signatures are solid. However, there are certain concerns regarding the rigor and novelty of the data analysis and interpretation, as well as the clinical implications, as detailed below.

      (1) Plasma EVs were characterized by transmission electron microscopy and nanoparticle tracking analysis to confirm their morphology and size. The authors should also include an analysis of putative EV markers (e.g., tetraspanins, syntenin, ALIX, etc.) to confirm that the analyzed particles are EVs.

      We thank the reviewer for this comment. In a previous study, our co-authors who developed the EVtrap method (PMID:32396726) used electron microscopy and NTA, as well as quantification of typical EV protein markers such as CD9, to confirm that particles isolated using EVtrap had the typical characteristics of extracellular vesicles. As such, these experiments were not replicated here. We added the following statement to the manuscript:

      “Previous analyses using electron microscopy and nanoparticle tracking also confirmed that the vast majority of particles isolated by EVtrap had diameters between 100-200 nm, consistent with exosomes (PMID:32396726). In addition, EVtrap isolates demonstrate a higher abundance of CD9, a common exosome marker, compared with isolates from other traditional EV isolation methods such as size exclusion chromatography and ultracentrifugation (PMID:32396726).”

      (2) The authors identified multiple over-expressed proteins in PDAC based on their fold change and p-value; however, due to the heterogeneity of PDAC, it is necessary to show a heatmap displaying their abundance in all samples. High fold change does not necessarily indicate consistently high abundance in all PDAC samples.

      We thank the reviewer for this suggestion. We have now included the heatmap in the new Supplementary Figure 3.

      (3) PSMB4, RUVBL2, and ANKAR were identified as being associated with metastasis. The authors state that they intended to distinguish early and late-stage cancer samples, but it is unclear why they chose to compare metastatic and non-metastatic samples, as the non-metastatic group also includes late-stage cancer samples. This sentence should be rephrased to more accurately reflect the sample types profiled.

      We thank the reviewer for pointing this out. We would like to clarify that the analyses shown in Figures 3B and 3C pertain to patients with metastatic vs non-metastatic disease, not early vs late stage. We edited the text to make this clear.

      (4) Non-metastatic and metastatic patients were separated based on global protein abundance. The samples within each group display significant heterogeneity, with some samples displaying similar patterns although they were classified into different groups (Figure 3A), and the samples within the same group, particularly the metastasis group, did not consistently exhibit similar patterns of protein abundance. The authors should clarify this point.

      We thank the reviewer for this comment. EV proteomic profiles are not expected to show an identical pattern across all samples within each group. The heatmap in Figure 3 is intended to show enriched patterns of expression, but we acknowledge that not all samples from the same group share the same proteome pattern.

      We added this statement in the discussion section:

      “As expected, the EV proteomic profiles of PDAC patients exhibited significant heterogeneity. While the above-mentioned markers exhibited strong associations with disease states at the population level, their abundances in individual patients varied significantly. These observations highlight the need to develop multi-protein panels for pancreatic cancer diagnosis and prognosis.”

      (5) The authors performed the survival analysis on a set of EV proteins but did not specify the origin of these markers or how many markers were examined. The authors should show their abundances across different groups, such as different stages and metastasis status.

      We thank the reviewer for the comments. The goal of this experiment was not to identify EV proteins that performed similarly well for diagnosis and prognostication. In Figures 3A, 3B and 3C, we identified EV proteins that performed better for the diagnosis of metastatic disease; in these experiments we performed a comparative analysis between patients with and without metastasis. In the experiment depicted in Figure 3D, the goal was to identify, among the markers found in Figure 3A, the EV markers that performed better in prognosticating outcomes as measured by overall survival. We would further clarify that, based on our observations and those of others, EV profiles from cancer patients are highly heterogeneous, and we do not anticipate that a single marker measured in isolation will have sufficient test performance for cancer diagnosis or prognostic assessment. Rather, we anticipate that one panel of markers may yield better performance for diagnosis, while a different combination of EV markers may perform better for prognostic assessment.

      (6) The classification model yielded a 100% accuracy, which may refer to AUC, in their discovery cohort, but it decreased to 89% in the independent cohort. This suggests that the authors have encountered overfitting issues with their model, where it performed well on the discovery cohort but did not generalize well to the independent cohort. The authors should clarify this point. The AUC score of the 7-EV signature is 0.89 and is not equivalent to prediction accuracy. In order to demonstrate prediction accuracy, the authors should show the confusion matrix of training and testing data as well as other evaluation metrics, such as accuracy, precision, and recall.

      We thank the reviewer for these insightful comments. As you noted, the 7-biomarker signature machine learning model attained 100% accuracy within the internal Discovery Cohort, raising concerns about potential overfitting that would become apparent in the external validation dataset. We acknowledge that the resulting difference in AUROC of 0.11 in the external validation cohort exceeds the typically reported range of ~0.06-0.09; nevertheless, the model achieved an AUROC of 0.89 in an independent patient cohort. Moreover, the use of an alternative technology to measure protein abundance in the validation dataset underscores the model’s reproducibility and validity. We have provided the model metrics for both the internal and external validation cohorts; please see the updated Supplementary Figure 7, as well as the new Supplementary Figures 6 and 8. We also amended the Discussion to acknowledge that the validation cohort had a limited sample size and that proteins were measured using a different method; these factors likely contributed to the lower accuracy of predictions in the validation cohort.
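      The reviewer's distinction between AUC and prediction accuracy can be illustrated with a short sketch. It computes threshold-dependent metrics (accuracy, precision, recall/sensitivity, specificity) from a confusion matrix, and a threshold-free AUROC using the rank (Mann-Whitney) formulation. The 0.5 threshold and all scores are hypothetical and are not values from the study.

```python
def confusion_counts(y_true, y_score, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels and scores at a fixed threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(tp, fp, tn, fn):
    """Metrics that depend on the chosen classification threshold."""
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,  # also called sensitivity
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


def auc_rank(y_true, y_score):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. No threshold is involved.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = PDAC, 0 = benign) and model scores.
y = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
print(threshold_metrics(*confusion_counts(y, scores)))
print(auc_rank(y, scores))
```

      On these toy data, accuracy at the 0.5 threshold is 0.75 while the AUROC is 0.875, which is why an AUC of 0.89 should not be reported as an 89% prediction accuracy.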

      (7) The authors should include more details of their model and the process of selection of signatures to enhance the reproducibility and transparency of their methods.

      We thank the reviewer for their valuable comments. To enhance clarity, we have added information on the method employed for biomarker signature identification to the Methods section on page 23. We note that Supplementary Table 7a provides the sensitivity, specificity, precision, and AUC for the 16 markers included in the external validation study. Additionally, Supplementary Table 7b presents the contingency table for the 7-biomarker signature, offering insight into model accuracy for both the internal Discovery and External Validation cohorts.

      Reviewer #2 (Public Review):

      The authors intended to identify a protein signature in extracellular vesicles of serum to distinguish pancreatic ductal adenocarcinoma from benign pancreatic diseases.

      A major strength of the work presented is the valuable profiling of a significant number of patient samples, with a rich cohort of patients with pancreatic cancer, benign pancreatic diseases, and healthy controls. However, despite the strong cohorts presented, the numbers of patient samples for benign pancreatic diseases as well as controls were very limited.

      Also, the method used to isolate vesicles, EVTrap, recognizes double bilayers, which means that it can detect cellular debris and apoptotic bodies, which are very common in the circulation of patients that are undergoing chemotherapy. It would be important to identify the patients that are therapy naïve and the ones that are not because of this possible bias.

      We thank the Reviewer for these comments. We want to point out that the experiments presented in Supplementary Figure 1 (Transmission electron microscopy images and Nanoparticle tracking analysis) confirm that the vesicles isolated with EVTrap are not cellular debris and apoptotic bodies. Rather, these structures are in the nano range expected for exosomes. This is further supported by the additional work from our co-author and collaborator describing the development of EVtrap and its performance in isolating exosomes when compared to other traditional methods such as ultracentrifugation and size exclusion chromatography (PMID:32396726).

      As per the Reviewer’s request, we have provided an additional heatmap indicating which patients are treatment naïve and which have received treatment (revised Figure 2C).

      Additionally, the transmission electron microscopy data reflect this heterogeneity of the samples, also with little identification of double bilayered vesicles. It would be important to identify some extracellular vesicles markers in those preparations to strengthen the quality of the samples analyzed.

      We appreciate the comment from the Reviewer and acknowledge the importance of identifying exosome markers in EVtrap isolates. These experiments have already been done and are reported in the original paper in which our co-authors described the development of this method. In that manuscript (PMID: 30080416), our collaborators demonstrated detection of CD9, a well-known exosome marker, by Western blot in isolates obtained using EVtrap or ultracentrifugation, a traditional exosome isolation technique. This work showed that EVtrap yielded a much higher recovery rate of exosomes with lower contamination from soluble proteins. We did not repeat these already published experiments, but we amended our manuscript to reference these results.

      What is more, previously published work with this same methodology identified around 2000 proteins per sample. It would be important to explain why this study shows a reduction of more than 50% in the number of proteins identified in the vesicles.

      We thank the Reviewer for pointing out this important detail. In the previous work in which our co-authors developed EVtrap, blood samples were processed using a different protocol with a shorter centrifugation (2,500g for 10 min) (PMID: 32396726). In the current work, we employed three centrifugation steps. As detailed in the Methods section of the manuscript, blood samples were first centrifuged at 1,300g for 15 min; the plasma was then carefully removed from the top, avoiding the cell pellet, and centrifuged again at 2,500g for 15 min; finally, the plasma was again removed from the top, avoiding the pellet, and centrifuged a third time at 2,500g for 15 min. This more extensive centrifugation process was intended to further increase the removal of platelets, apoptotic bodies, and other large particles and aggregates. Accordingly, we anticipate that the additional centrifugation steps decreased the contamination of our isolates but may also have decreased the amount of exosome proteins recovered, which would explain the lower number of exosome proteins identified in our study compared with the original study from our co-authors (PMID: 32396726).

      One of the proteins that consistently emerges in the analysis is KRT20. It would be important to proceed with the analysis by first filtering out possible proteomic contaminants, of which keratins are the most common.

      We thank the Reviewer for this comment. We believe that KRT20 is, in fact, cancer-related and not a contaminant. This is supported by the results presented in this manuscript, which show enrichment of KRT20 in PDAC cases and lower expression in benign samples. If this protein were a contaminant, it would be expected to appear uniformly in all samples, and there would be no apparent reason for differential expression between malignant and benign cases, as all samples were processed following the same procedures. In addition, increased expression of KRT20 in PDAC tissues has also been reported by others. For instance, in a study by Schmiz-Winnthal (PMID: 16364723), the authors showed that cytokeratin 20 (KRT20) was expressed in 76% of PDAC patients and that KRT20 expression was associated with poor survival after surgical resection. Based on these observations, we believe that the KRT20 identified in our study is indeed a tumor-associated EV protein rather than a contaminant.

      Finally, none of the 7-extracellular vesicle protein signatures has been validated by other techniques, such as western blot, in extracellular vesicles isolated by other, standard, methods, such as size exclusion chromatography.

      A distinct technique for protein analysis was done but not a different method of isolation of these vesicles. This would strengthen the results and the origin of the proteins.

      We appreciate the Reviewer’s comment. We would like to again emphasize that the goal of this manuscript was not to compare the performance of EVtrap with other traditional EV isolation approaches such as ultracentrifugation and size exclusion chromatography. The main goal of this study is to determine the proteomic profiles of EVs isolated from clinical samples and to provide this information to the research community for further studies. As the Reviewer points out, proteins in EVs are highly heterogeneous, which highlights the complexity of EV biology and the interpatient heterogeneity of pancreatic cancer. We do not anticipate that EV-based markers for pancreatic cancer diagnosis can be developed by a single team; rather, this will require a community of researchers. We hope the information presented in the current study will help other researchers identify additional candidates for validation in future work. Nonetheless, we edited the manuscript to discuss the limitation of not cross-validating protein detection using a different method.

      The conclusions that are reached do not fully meet the proposed aims of the identification of a protein signature in circulating extracellular vesicles that could improve early detection of the disease. The authors did not demonstrate the superiority of detection of these proteins in extracellular vesicles versus simply performing an ELISA, nor their superiority with respect to the current standard procedure for diagnosis.

      We would like to clarify to the Reviewer that the goal of this manuscript was not to prove the superiority of the EV signature biomarker in diagnosing pancreatic cancer compared with current standard of care (SOC) practice, i.e., CT scans, endoscopic ultrasound and CA19-9. Proving such superiority would require a large, randomized phase III trial with several hundred patients. This was not the aim of our discovery EV proteomics study, and we double-checked our manuscript to ensure that no such claim was made. Rather, we aimed to develop a new pipeline for the discovery of EV biomarkers, and we believe this approach was successful in discovering a new class of biomarkers based on proteins carried by extracellular vesicles that are predominantly expressed in patients with pancreatic cancer. Future studies should continue to advance this field with the goal of improving on the current standard of care diagnostic methods.

      The authors also suggest that profiling of circulating extracellular vesicles provides unique insights into systemic immune changes during pancreatic cancer development. How this is better than a regular hemogram is not clear.

      We would like to clarify that the overall goal of this study is to provide patient-relevant information for the research community to further investigate the biology of extracellular vesicles. By the statement 'unique insights into systemic immune changes', we referred to our discovery of EVs carrying proteins involved in immune responses. Previous studies have shown that EVs play important roles in cell-cell communication; the discoveries from our study therefore provide candidates for future studies on the cellular mechanisms underlying immune regulation during pancreatic cancer development.

      Finally, it would be important to determine how this signature compares with many others described in the literature that have the exact same aim. Why and how would this one be better?

      We would like to again clarify that comparing the diagnostic performance of the EV biomarkers discovered in the study against standard of care methods (CA19-9, ctDNA, CT scan) was beyond the scope of this discovery EV proteomics work. We reviewed the manuscript to ensure that no claims were made as far as superiority against point-of-care tests available in clinic.

      Reviewer #3 (Public Review):

      This work investigates the use of extracellular vesicles (EVs) in blood as a noninvasive 'liquid biopsy' to aid in the differentiation of patients with pancreatic cancer (PDAC) from those with benign pancreatic disease and healthy controls, an important clinical question where biopsies are frequently non-diagnostic. The use of extracellular vesicles as biomarkers of disease has been gaining interest in recent history, with a variety of published methods and techniques, looking at a variety of different compositions ('the molecular cargo') of EVs particularly in cancer diagnosis (Shah R, et al, N Engl J Med 2018; 379:958-966).

      This study adds to the growing body of evidence in using EVs for earlier detection of pancreatic cancer, identifying both new and known proteins of interest. Limitations in studying EVs, in general, include dealing with low concentrations in circulation and identifying the most relevant molecular cargo. This study provides validation of assaying EVs using the novel EVtrap method (Extracellular Vesicles Total Recovery And Purification), which the authors show to be more efficient than current standard techniques and potentially more scalable for larger clinical studies.

      The strength of this study is in its numbers: the authors worked with a cohort of 124 cases, 93 of which were PDAC samples, which is considered large for an EV study (Jia, E. et al. BMC Cancer 22, 573 (2022)). The benign disease group (n=20, between chronic pancreatitis and IPMNs) and healthy control group (n=11) were relatively small, but the authors were not only able to identify candidate biomarkers for diagnosis that clearly stood out in the PDAC cohort, but also to validate them in an independent cohort of 36 new subjects.

      Proteins they have identified as associated with pancreatic cancer over benign disease included PDCD6IP, SERPINA12, and RUVBL2. They were even able to identify a set of EV proteins associated with metastasis and poorer prognosis, which include the proteins PSMB4, RUVBL2 and ANKAR and CRP, RALB and CD55. Their 7-EV protein signature yielded an 89% prediction accuracy for the diagnosis of PDAC against a background of benign pancreatic diseases that is compelling and comparable to other studies in the literature (Jia, E. et al. BMC Cancer 22, 573 (2022)).

      The limitations of this study are its containment within a single institution - further studies are warranted to apply the authors' 7-EV protein PRAC panel to multiple other cases at other institutions in a larger cohort.

      We are very thankful to the Reviewer for the positive feedback. We are similarly optimistic that EV-based biomarkers will assist future researchers to develop better diagnostic assays for patients with pancreatic cancer, as well as other tumor types lacking accurate blood-based tests.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully solved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/speculations regarding major comments 2-4, which in my opinion would have been useful for the reader.

      Our explanations/speculations regarding major comments 2 and 3 were included in the Discussion. We apologize for the misunderstanding; we thought that we were supposed to explain our ideas only in the responses. We did not discuss comment 4, however, because we are not sure what the true effect is and did not want to engage in unfounded speculation in our manuscript. We thank this reviewer for his insightful comments and understanding.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The authors report the potential translational regulation of Raf kinase by re-initiation. It would be interesting to show that Raf is indeed regulated by uORF-mediated translation, and that this is dependent on an intact eIF3 complex. Analyzing the potential consequences of Raf1 regulation for cancer cell proliferation or apoptosis would be a plus.

      We agree that this is an interesting and likely possibility. In fact, another clue that translation of Raf1 is regulated by uORFs comes from Bohlen et al. 2023 (PMID: 36869665), who showed that RAF1 translation depends on PRRC2 proteins (which promote leaky scanning through these uORFs). We noted in the Discussion that our results from eIF3d/e/hKD and PRRC2A/B/CKD partly overlap. Whether eIF3 and PRRC2 cooperate to regulate translation of this important mRNA is a subject of our follow-up research.

      (2) The authors show that eIF3 d/e -but not 3h- has an effect on cell proliferation. First, this indicates that proliferation does not fully correlate with eIF3 integrity. Depletion of eIF3d does not affect the integrity of eIF3, yet the effects on proliferation are similar to those of eIF3e. What is the possibility that changes in proliferation reflect functions of eIF3d outside the eIF3 complex? What could be the real consequences of disturbing eIF3 integrity for the mammalian cell? Please, discuss.

      Yes, proliferation does not fully correlate with eIF3 integrity. Downregulation of eIF3 subunits that leads to disintegration of the eIF3 YLC core (a, b, c, g, i) has a more detrimental effect on growth and translation than downregulation of the peripheral subunits (e, k, l, f, h, m). Our previous studies (Wagner et al. 2016, PMID: 27924037 and Herrmannová et al. 2020, PMID: 31863585) indicate that the YLC core of eIF3 can partially support translation even without its peripheral subunits. In this respect, eIF3d (a peripheral subunit) is a striking exception, suggesting that it may have some specialized function(s). Whether this function resides outside of the eIF3 complex we do not know, but we do not think so, mainly because eIF3d is rapidly degraded in the absence of its interaction partner eIF3e. Therefore, it is unlikely that eIF3d exists alone outside of the eIF3 complex with moonlighting functions elsewhere. We think that eIF3d, as a head-interacting subunit close to the important head ribosomal protein RACK1 (a landing pad for regulatory proteins), is a target of signaling pathways, which may make it important for the translation of specific mRNAs. In support of this idea, eIF3d (in the context of the entire eIF3) together with DAP5 was shown to promote translation by an alternative cap-dependent (eIF4F-independent) mechanism (Lee et al. 2016, PMID: 27462815; de la Parra et al. 2018, PMID: 30076308). In addition, eIF3d function (also in the context of the entire eIF3) was shown to be regulated by stress-triggered phosphorylation (Lamper et al. 2020, PMID: 33184215).

      (3) Figure 6D: Surprisingly, reduced levels of ERK1/2 upon eIF3d/e-KD are compensated by increased phosphorylation of ERK1/2 and net activation of c-Jun. Please comment on the functional consequences of buffering mechanisms that the cell deploys in order to counteract compromised eIF3 function. Why would the cell activate precisely the MAPK pathway to compensate for a compromised eIF3 function?

      This we do not know. We can only speculate that when translation is compromised, cells try to counteract it in two ways: 1) they produce more ribosomes to increase translational rates and 2) activate MAPK signaling to send pro-growth signals, which can in the end further boost ribosome biogenesis.

      (4) Regarding DAP-sensitive transcripts, can the authors discuss in more detail the role of eIF3d in alternative cap-dependent translation versus re-initiation? Are these transcripts being translated by a canonical cap- and uORF-dependent mechanism or by an alternative cap-dependent mechanism?

      This is indeed not an easy question. On one hand, DAP5 was shown to facilitate translation re-initiation after uORF translation in a canonical cap-dependent manner, a mechanism essential for translation of the main coding sequence (CDS) in mRNAs with structured 5' leaders and multiple uORFs (Weber et al. 2022, PMID: 36473845; David et al., 2022, PMID: 35961752). On the other hand, DAP5 was proposed to promote alternative, eIF4F-independent but cap-dependent translation, as it can substitute for the eIF4F complex in cooperation with eIF3d (de la Parra et al., 2018, PMID: 30076308; Volta et al., 2021, PMID: 34848685). Overall, these observations paint too complex a picture for us to propose a clear scenario of what is going on between these two proteins on individual mRNAs. We speculate that both mechanisms take place and that the specific mechanism of translation initiation differs among differently arranged mRNAs.

      Minor comments:

      (5) Figure S2C: why is there a strong reduction of the stop codon peak for 3d and 3h KDs?

      We have checked the riboWaltz profiles of all replicates (in the Supplementary data we show only representative replicate I), and the stop codon peak differs considerably among the replicates. We think that this way of plotting was optimized for the calculation and visualization of P-sites and triplet periodicity and thus is not suitable for this type of comparison among samples. Therefore, we performed our own analysis in which the 5’ ends of reads are used instead of P-sites, and triplicates are averaged and normalized to the CDS (see below), so that all samples can be compared directly in one plot (same as Fig. S13A but for the stop codon). The stop codon peak does differ and is smallest for eIF3hKD. However, these changes are in the range of 20%, and we are not sure about their biological significance; we therefore refrain from drawing any conclusions. In general, a reduced stop codon peak may signal faster termination or increased stop codon readthrough, but the latter should be accompanied by increased ribosome density in the 3’UTR, which is not the case. By contrast, a defect in termination efficiency would manifest as an increased stop codon peak.

      Author response image 1.

       

      (6) Figures 5 and S8: Adding a vertical line at 'zero' in all cumulative plots will help the reader understand the author's interpretation of the data. 

      We have added a dashed grey vertical line at zero as requested. However, when interpreting these plots, the reader should focus on whether the colored curve is shifted with respect to the grey curve (background). A shift to the right indicates increased expression, while a shift to the left indicates decreased expression. The reported p-value indicates the statistical significance of the shift.
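      The reading of the cumulative plots described above can be sketched with an empirical cumulative distribution function (ECDF). The example compares the fraction of each gene set at or below a log2 fold change of zero (where the dashed line sits) and computes a two-sample Kolmogorov-Smirnov (KS) statistic as one way to quantify the shift. The response does not state which test produced the reported p-values, so the KS statistic and the toy values are assumptions for illustration.

```python
from bisect import bisect_right


def ecdf(values):
    """Return F(x) = fraction of values <= x (right-continuous empirical CDF)."""
    xs = sorted(values)
    n = len(xs)

    def F(x):
        return bisect_right(xs, x) / n

    return F


def fraction_at_or_below_zero(values):
    """Height of the cumulative curve where it crosses the dashed line at zero."""
    return ecdf(values)(0.0)


def ks_statistic(a, b):
    """Largest vertical gap between two ECDFs.

    Checking every observed point suffices, because both ECDFs
    only change value at observed points.
    """
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(x) - fb(x)) for x in sorted(set(a) | set(b)))


# Hypothetical log2 fold changes for a gene set and for the background.
group = [-0.2, 0.3, 0.5, 0.8]
background = [-0.6, -0.3, 0.1, 0.4]
print(fraction_at_or_below_zero(group), fraction_at_or_below_zero(background))
print(ks_statistic(group, background))
```

      Here a smaller fraction of the gene set than of the background lies at or below zero (0.25 vs 0.5), so the colored curve sits to the right of the grey one, corresponding to increased expression.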

      (7) The entire Figure 2 are controls that can go to Supplementary Material. The clustering of Figure S3B could be shown in the main Figure, as it is a very easy read-out of the consistent effects of the KDs of the different eIF3 subunits under analysis.

      We have moved the entire Figure 2 to Supplementary Material as suggested (the original panels can be found as Supplementary Figures 1B, 1C and 3A). Figure S3B is now the main Figure 2E. 

      (8) There are 3 replicates for Ribo-Seq and four for RNA-Seq. Were these not carried out in parallel, as it is usually done in Ribo-seq experiments? Why is there an extra replicate for RNASeq?

      Yes, the three replicates were carried out in parallel. We decided to add a fourth RNA-Seq replicate to increase the robustness of the data, because the RNA-Seq data are used to normalize the ribosome footprints (FP) when calculating translational efficiency (TE), which was the main metric analyzed in this article. We were able to add the fourth replicate because we originally prepared five biological replicates for all samples, but after performing the control experiments, we selected only the three best replicates for Ribo-Seq library preparation and sequencing.
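      As a rough illustration of why unequal replicate numbers are not a problem for this normalization, the sketch below averages counts per million (CPM) across each set of replicates and takes a per-gene log2 ratio of footprints to RNA. This simple ratio definition of TE, the CPM normalization, and the pseudocount are assumptions for illustration; the response does not specify the pipeline, and dedicated tools usually model TE statistically rather than as a plain ratio.

```python
import math


def cpm(counts):
    """Counts per million for one library given as {gene: read count}."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def mean_cpm(libraries):
    """Average CPM per gene across replicate libraries (assumes identical gene sets)."""
    per_lib = [cpm(lib) for lib in libraries]
    genes = per_lib[0].keys()
    return {g: sum(lib[g] for lib in per_lib) / len(per_lib) for g in genes}


def log2_te(fp_libraries, rna_libraries, pseudocount=0.5):
    """Hypothetical per-gene log2(FP / RNA); replicate counts may differ (e.g. 3 vs 4)."""
    fp = mean_cpm(fp_libraries)
    rna = mean_cpm(rna_libraries)
    return {
        g: math.log2((fp[g] + pseudocount) / (rna[g] + pseudocount))
        for g in fp
    }


# Toy data: three footprint replicates and four RNA-Seq replicates.
fp_reps = [{"A": 500000, "B": 500000}] * 3
rna_reps = [{"A": 250000, "B": 750000}] * 4
print(log2_te(fp_reps, rna_reps, pseudocount=0.0))
```

      Because each replicate is normalized to its own library size before averaging, an extra RNA-Seq replicate simply refines the denominator without biasing the ratio.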

      (9) Please add another sheet in Table S2 with the names of all genes that change only at the translation (RPF) level.

      As requested, we have added three extra sheets (one for each downregulation) to Supplementary Spreadsheet S2, listing differential FPs with Padjusted < 0.05. We also provide the complete unfiltered differential expression data (sheet named “all data”), so that readers can filter the data according to their interests.

      (10) Page 5, bottom: ' ...we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules...'. This is not true for eIF3d, as shown in Fig1B and mentioned in Results.

      This reviewer is correct. With this generalized statement, we were trying to summarize our previous results from Wagner et al., 2014, PMID: 24912683; Wagner et al., 2016, PMID: 27924037; and Herrmannova et al., 2020, PMID: 31863585. Downregulation of eIF3d is the only exception, as it does not affect the expression of any other eIF3 subunit. We have therefore rewritten this paragraph accordingly: “We recently reported a comprehensive in vivo analysis of the modular dynamics of the human eIF3 complex (Wagner et al., 2020; Wagner et al., 2014; Wagner et al., 2016). Using a systematic individual downregulation strategy, we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbation of the expression of one subunit results in the downregulation of entire modules, leading to the formation of partial eIF3 subcomplexes with limited functionality (Herrmannova et al., 2020). eIF3d is the only exception in this respect, as its downregulation does not influence the expression of any other eIF3 subunit.”

      (11) Page 10, bottom: ' The PCA plot and hierarchical clustering... These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d.' This is already obvious in the polysome profiles of Figure S2C.

      We agree that this result is not surprising given the polysome profile and growth phenotype analyses of eIF3hKD. Still, we think that the PCA plot and hierarchical clustering results represent valuable controls. Nonetheless, we have rephrased this section to note that this result agrees with the polysome profile analysis: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      (12) Page 12: ' As for the eIF3dKD "unique upregulated" DTEGs, we identified one interesting and unique KEGG pathway, the ABC transporters (Supplementary Figure 5A, in green).' This sentence is confusing, as there are more pathways that are significant in this group, so it is unclear why the authors consider it 'unique'.

      The eIF3dKD “unique upregulated” group comprises genes with increased TE only in eIF3dKD but not in eIF3eKD or eIF3hKD (500 genes, Fig 2G). All these 500 genes were examined for enrichment in the KEGG pathways, and the top 10 significant pathways were reported (Fig S6A). However, 8 out of these 10 pathways were also significantly enriched in other gene groups examined (e.g. eIF3d/eIF3e common). Therefore, the two remaining pathways (“ABC transporters” and “Other types of O-glycan biosynthesis”) are truly unique for eIF3dKD. We wanted to highlight the ABC transporters group in particular because we find it rather interesting (for the reasons mentioned in the article). We have corrected the sentence in question to avoid confusion: “Among the eIF3dKD “unique upregulated” DTEGs, we identified one interesting KEGG pathway, the ABC transporters, which did not show up in other gene groups (Supplementary Figure 6A, in green). A total of 12 different ABC transporters had elevated TE (9 of them are unique to eIF3dKD, while 3 were also found in eIF3eKD), 6 of which (ABCC1-5, ABCC10) belong to the C subfamily, known to confer multidrug resistance with alternative designation as multidrug resistance protein (MRP1-5, MRP7) (Sodani et al, 2012).

      Interestingly, all six of these ABCC transporters were upregulated solely at the translational level (Supplementary Spreadsheet S2).”    

      (13) Note typo ('Various') in Figure 4A.

      Corrected

      (14) The introduction could be shortened.

      This is a very subjective requirement. In fact, when this manuscript was reviewed in NAR, we were asked by two reviewers to expand it substantially. Because several research topics come together in this work (e.g., translational regulation, eIF3 structure and function, and MAPK/ERK signaling), we are convinced that each of them requires a comprehensive introduction for non-experts. Therefore, with all due respect to this reviewer, we ultimately did not shorten it.

      Reviewer #2 (Recommendations For The Authors):

      - In Figure 2, it would be useful to know why eIF3d is destabilized by eIF3e knockdown - is it protein degradation and why do the eIF3d/e knockdowns not more completely phenocopy each other when there is the same reduction to eIF3d as in the eIF3d knockdown sample?

      Yes, we do think that protein degradation underlies the eIF3d destabilization in eIF3eKD, but we have not yet demonstrated this directly. However, we have shown that eIF3d mRNA levels are not altered in eIF3eKD and that the Ribo-Seq data indicate no change in TE or FP for the eIF3d-encoding mRNA in eIF3eKD. Importantly (as we discuss in the article), eIF3d levels in eIF3dKD are lower than those in eIF3eKD (please see Supplementary Figure 1C). We believe that this is one of the main reasons for the differences between the eIF3d and eIF3e knockdowns.

      - The western blots in Figures 4 and 6 show modest changes to target protein levels and would be strengthened by quantification.

      We have added the quantifications as requested by this reviewer and Reviewer #3.

      - For Figure 4, this figure would be strengthened by experiments showing if the increase in ribosomal protein levels is correlated with actual changes to ribosome biogenesis.

      As suggested, we performed polysome profiling in the presence of EDTA to monitor changes in the 60S/40S ratio, which would indicate a potential imbalance in the biogenesis of the individual ribosomal subunits. This ratio was not affected (Figure 3G). In addition, we performed the same experiment with all samples normalized to the same number of cells (cells were carefully counted before lysis). In this way, we confirmed that eIF3dKD and eIF3eKD cells indeed contain a significantly increased number of ribosomes, in agreement with the western blot analysis (Figure 3H).

      - In Figure 6, there needs to be a nuclear loading control.

      This experiment was repeated with Lamin B1 used as a nuclear loading control – it is now shown as Fig. 5F.

      - For Figure 8, these findings would be strengthened using luciferase reporter assays where the various RNA determinants are experimentally tested. Similarly, 5′ TOP RNA reporters would have been appreciated in Figure 4.

      This is indeed a logical continuation of our work, which is currently in progress as the project of one of our PhD students. We apologize, but we consider this time- and resource-demanding analysis to be beyond the scope of this article.

      Reviewer #3 (Recommendations For The Authors):

      (1) Within the many effects observed, it is mentioned that eIF3d is known to be overexpressed while eIF3e is underexpressed in many cancers, but knockdown of either subunit decreases MDM2 levels, which would be expected to increase P53 activity and decrease tumor cell transformation. In contrast, they also report that 3e/3d knockdown dramatically increases levels of cJUN, presumably due to increased MAPK activity, and is expected to increase protumor gene expression. Additional discussion is needed to clarify the significance of the findings, which are a bit confusing.

      This is indeed true. However, considering the complexity of eIF3, the largest of all initiation factors, as well as the broad portfolio of its functions, it is perhaps not surprising that the observed effects are complex and may even seem contradictory with respect to cancer. To acknowledge this, we expanded the corresponding part of the Discussion as follows: “Here, we demonstrate that alterations in the eIF3 subunit stoichiometry and/or eIF3 subcomplexes have distinct effects on the translatome; for example, they affect factors that play a prominent (either positive or negative) role in cancer biology (e.g., MDM2 and cJUN), but the resulting impact is unclear so far. Considering the complex interactions between these factors as well as the complexity of the eIF3 complex per se, future studies are required to delineate the specific oncogenic and tumor suppressive pathways that play a predominant role in mediating the effects of perturbations in the eIF3 complex in the context of neoplasia.”

      (2) There are places in the text where the authors refer to changes in transcriptional control when RNA levels differ, but transcription versus RNA turnover wasn't tested, e.g. page 16 and Figure S10, qPCR does not confirm "transcriptional upregulation in all three knockdowns" and page 19 "despite apparent compensatory mechanisms that increase their transcription."

      This is indeed true; the sentences in question were corrected. The term “increased mRNA levels” is now used instead of “transcriptional upregulation” (increased mRNA stability is also possible).

      (3) Similarly, the authors suggest that steady-state LARP1 protein levels are unaffected based on ribosome footprint counts (page 21). It is incorrect to assume this, because ribosome footprints can be elevated due to stalling on RNA that isn't being translated and doesn't yield more protein, and because levels of translated RNA/synthesized proteins do not always reflect steady-state protein levels, especially in mutants that could affect lysosome levels and protein turnover. Also page 12, 1st paragraph suggests protein production is down when ribosome footprints are changed.

      Yes, we are well aware of this known limitation of Ribo-Seq analysis. Therefore, the steady-state protein levels of our key hits were verified by western blotting. In addition, we have removed the sentence about LARP1, because it was based on Ribo-Seq data only, without experimental evaluation of steady-state LARP1 protein levels.

      (4) The translation buffering effect is not clear in some Figures, e.g. S6, S8, 8A, and B. The authors show a scheme for translationally buffered RNAs being clustered in the upper right and lower left quadrants in S4H (translation up with transcript level down and v.v.), but in the FP versus RNA plots, the non-TOP RNAs and 4E-P-regulated RNAs don't show this behavior, and appear to show a similar distribution to the global changes. Some of the right panels in these figures show modest shifts, but it's not clear how these were determined to be significant. More information is needed to clarify, or a different presentation, such as displaying the RNA subsets in the left panels with heat map coloring to reveal whether RNAs show the buffered translation pattern defined in purple in Figure S4H, or by reporting a statistical parameter or number of RNAs that show behavior out of total for significance. Currently the conclusion that these RNAs are translationally buffered seems subjective since there are clearly many RNAs that don't show changes, or show translation-only or RNA-only changes.

      We would like to clarify that S4H does not indicate a necessity for changes in FPs in the buffered subsets. Although opposing changes in total mRNA and FPs are classified as buffering, often we also consider the scenario where there are changes to the total mRNA levels not accompanied by changes in ribosome association.

      In figure S6, the scatterplots indicate a high density of genes shifted towards negative fold changes on the x-axis (total mRNA). This is also reflected in the empirical cumulative distribution functions (ecdfs) for the log2 fold changes in total mRNA in the far right panels of A and B, and the lack of changes in log2 fold change for FPs (middle panels). Similarly, in figure S8, the scatterplots indicate a density of genes shifted towards positive fold changes on the x-axis for total mRNA. The ecdfs also demonstrate that there is a significant directional shift in log2 fold changes in the total mRNA that is not present to a similar degree in the FPs, consistent with translational offsetting. It is rightly pointed out that not all genes in these sets follow the same pattern of regulation. We have revised the title of Supplementary Figure S6 (now S7) to reflect this. However, we would like to emphasize that these figures are not intended to communicate that all genes within these sets of interest are regulated in the same manner, but rather that when considered as a whole, the predominant effect seen is that of translational offsetting (directional shifts in the log2 fold change distribution of total mRNA that are not accompanied by similar shifts in FP mRNA log2 fold changes).

      The significance of these differences was determined by comparing the ecdfs of the log2 fold changes for the genes belonging to a particular set (e.g. non-TOP mTOR-sensitive, p-eIF4E-sensitive) against all other expressed genes (background) using a Wilcoxon rank sum test. This allows identification of significant shifts in the distributions that have a clear directionality (an overall increase or decrease in fold changes of FPs or total mRNA compared to background). If log2 fold changes differ from background but without a clear directionality (equally likely to be increased or decreased), the test will not yield a significant result. This approach assesses the overall behavior of gene signatures within a given dataset in a completely threshold-independent manner, such that it does not rely on classifying genes into different regulatory categories (translation only, buffering, etc.) based on significance or fold-change cut-offs (as in S4H). We therefore believe that this unbiased approach is well suited for identifying cases in which many genes follow similar patterns of regulation within a given dataset.
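      The threshold-independent comparison described above can be illustrated with a minimal, standard-library Python sketch: an ecdf helper for plotting, and a two-sided Wilcoxon rank-sum (Mann-Whitney U) test of a gene set's log2 fold changes against the background, using the normal approximation with a tie correction. This is an illustration under stated assumptions (plain lists of log2 fold changes as input, no continuity correction; all function names are hypothetical), not the code used for our figures, and p-values from statistical packages may differ slightly.

```python
import math
from statistics import NormalDist


def ecdf(values):
    """Empirical cumulative distribution: sorted values paired with cumulative fractions."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(gene_set, background):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U, z, p), where U counts gene-set/background pairs in which the
    gene-set value is larger (ties count 1/2); z > 0 means the gene set is
    shifted to the right of the background (overall increase).
    """
    n1, n2 = len(gene_set), len(background)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    combined = list(gene_set) + list(background)
    ranks = average_ranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in combined:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p
```

      A negative z corresponds to a shift to the left (overall decrease). Fold changes that depart from background in both directions tend to cancel in the rank sum, which is why such gene sets are not flagged as significant.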

      (5) Page 10-"These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d" ...These results suggest that eIF3h has less impact on the translatome, not that it does so differently. If it were changing translation by a different mechanism, I would not expect it to cluster with control.

      This sentence was rewritten as follows: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      Other minor issues:

      (1) There are some typos: Figure 2 leves, Figure 4 variou,

      Corrected.

      (2) Figure 3, font for genes on volcano plot too small

      This may be so; however, the resolution of this image is high enough to allow any part of it to be enlarged at will. In our opinion, a larger font would take up too much space, which would reduce the informativeness of this graph.

      (3) Figure S5, highlighting isn't defined.

      The figure legend for S5A (now S6A) states: “Less significant terms ranking 11 and below are in grey. Terms specifically discussed in the main text are highlighted in green.” Perhaps it was overlooked by this reviewer.

      (4) At several points the authors refer to "the MAPK signaling pathway", suggesting there is a single MAPK that is affected, e.g in the title, page 3, and other places when it seems they mean "MAPK signaling pathways" since several MAPK pathways appear to be affected.

      We apologize for any terminological inaccuracies. There are indeed several MAPK pathways operating in cells. In our study, we focused mainly on the MAPK/ERK pathway. The confusion probably stems from the fact that the corresponding term in the KEGG pathway database is labeled "MAPK signaling pathway" and this term, although singular, includes all MAPK pathways. We have carefully reviewed the entire article and have corrected the term used accordingly to either: 1) MAPK pathways in general, 2) the MAPK/ERK pathway for this particular pathway, or 3) "MAPK signaling pathway", where the KEGG term is meant.

      (5) Some eIF3 subunit RNAs have TOP motifs. One might expect 3e and 3h levels to change as a function of 3d knockdown due to TOP motifs but this is not observed. Can the authors speculate why the eIF3 subunit levels don't change but other TOP RNAs show TE changes? Is this true for other translation factors, or just for eIF3, or just for these subunits? Could the Western blot be out of linear range for the antibody or is there feedback affecting eIF3 levels differently than the other TOP RNAs, or a protein turnover mechanism to maintain eIF3 levels?

      This is indeed a very interesting question. In addition to the mRNAs encoding ribosomal proteins, we examined all TOP mRNAs and added a sheet to Supplementary Spreadsheet S2 listing all TOP mRNAs reported in (Philippe et al., 2020, PMID: 32094190). Based on our Ribo-Seq data, we could expect increased protein levels of eIF3a and eIF3f in eIF3dKD and eIF3eKD, but this is not the case, as judged from the extensive western blot analysis performed in (Wagner et al., 2016, PMID: 27924037). Indeed, we cannot rule out the involvement of a compensatory mechanism that monitors and maintains the steady-state levels of eIF3 subunits, increasing or decreasing them if necessary, which could depend on TOP motif-mediated regulation. However, we think that in our KDs, all non-targeted subunits that lose their direct binding partner in eIF3 due to siRNA treatment are rapidly degraded. For example, the co-downregulation of subunits d, k and l in eIF3eKD is very likely caused by protein degradation resulting from the loss of their direct binding partner, eIF3e. Since we showed that the yeast eIF3 complex assembles co-translationally (Wagner et al., 2020, PMID: 32589964), and there is no reason to think that mammalian eIF3 differs in this regard, our working hypothesis is that free subunits that are not promptly incorporated into the eIF3 complex are rapidly degraded, and that the presence or absence of a TOP motif in the 5’ UTR of their mRNAs has no effect. As for the other TOP mRNAs, the translation factors eEF1B2, eEF1D, eEF1G and eEF2 have significantly increased FPs in both eIF3dKD and eIF3eKD, but we did not check their protein levels by western blotting and therefore cannot conclude anything specific.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths: 

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING, and CellChat, to define the potential roles of identified cell populations in the context of fracture healing. These additional computation analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses: 

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0, 6,213 nuclei on day 0-7 combined). It is unclear if the number is sufficient to discern subtle biological processes such as stem cell differentiation. 

      We analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, revealing the diversity of cell populations in the uninjured periosteum and after injury, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analyses using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and of rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPCs/fibrogenic cells, which are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform sc/snRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single-cell datasets containing fewer cells supported major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018; 300 in Ambrosi et al. 2021; around 175 in Remark et al. 2023).

      (2) The authors' designation of Sca1+CD34+ cells as SSPCs is not sufficiently supported by experimental evidence. It will be essential to demonstrate stem/progenitor properties of Sca1+CD34+ cells using independent biological approaches such as CFU-F assays. In addition, the putative lineage trajectory of SSPCs toward IIFCs, osteoblasts, and chondrocytes remains highly speculative without concrete supporting data. 

      We performed additional analyses to further support that Sca1+ SSPCs display stem/progenitor properties. We performed CFU assays with Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells (Figure 2F-G) and showed that Prx1-GFP+ SCA1+ cells display a significantly increased CFU potential compared to Prx1-GFP+ SCA1- cells. In addition, we isolated Prx1-GFP+ Sca1+ and Prx1-GFP+ Sca1- periosteal cells and transplanted them at the fracture site of wild-type mice (Figure 2H). Only Sca1+ cells contributed to callus formation, reinforcing that Sca1+ cells are the SSPC population mediating bone repair. 

      The differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses all point to Sca1+ cells as the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ pSSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fibrogenic fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is coherent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used transplantation assay and showed that Prx1-GFP+ IIFCs isolated from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a wide combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrated the distribution of the cell populations by time point (Fig. 5D). The time points parallel the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      (3) The designation of POSTN+ clusters as injury-induced fibrogenic cells (IIFCs) is not fully supported by the presented data. The authors' snRNAseq datasets (Figure 1d) demonstrate that there are many POSTN+ cells prior to injury, indicating that POSTN+ cells are not specifically induced in response to injury. It has been widely recognized that POSTN is expressed in the periosteum without fracture. This raises a possibility that the main responder of fracture healing is POSTN+ cells, not SSPCs as they postulate. The authors cannot exclude the possibility that Sca1+CD34+ cells are mere bystanders and do not participate in fracture healing. 

      IIFCs are a population of cells that express high levels of ECM-related genes, including Postn, Aspn and collagens. We did not claim that Postn expression is specific to IIFCs. While Postn is detected in the uninjured periosteum, snRNAseq analyses and RNAscope experiments showed that its expression is limited to a small number of cells in the cambium layer of the periosteum (Fig 4B, Figure 4 – Supplementary figure 1B). These Postn-expressing cells in the uninjured periosteum are not SSPCs, as they do not co-express or co-localize with the Pi16+ and Sca1+ cells detected in the fibrous layer (Fig 4, Figure 4 – Supplementary figure 1A, Figure 6 – Supplementary figure 1). These Postn-expressing cells are undergoing osteogenic differentiation, as shown by the correlation between Runx2 and Postn expression (Fig. 4 – Supplementary Figure 1C). After fracture, we observed a strong increase in ECM-related gene expression, specifically in the IIFC population. We now show the strong increase in Postn expression after injury (Fig. 4 – Supplementary Figure 1D-E, Figure 6 – Supplementary figure 1E). 

      As mentioned in our response above, we now show that SCA1+ cells form cartilage and bone after fracture, while SCA1- cells (including the POSTN+ population) from the uninjured periosteum did not contribute. These data reveal that Sca1+ CD34+ cells are the main SSPC population mediating bone healing and that POSTN+ IIFCs are a transient stage of SSPC differentiation. We added the following text to the result section: “Pi16-expressing SSPCs are located within the fibrous layer, while we observed few POSTN+ cells in the cambium layer (Fig. 4 – Supplementary Fig. 1A). Postn expression is weak in uninjured periosteum and is limited to differentiating cells. Postn expression is strongly increased in response to fracture, specifically in IIFCs (Fig. 4 – Supplementary Fig. 1B-E). “

      (4) Detailed spatial organization of Sca1+CD34+ cells and POSTN+ cells in the uninjured periosteum with respect to the cambium layer and the fibrous layer is not demonstrated. 

      We performed RNAscope experiments to locate Pi16-expressing and Postn-expressing cells in the uninjured periosteum. We observed that Pi16-expressing cells are in the external fibrous layer of the periosteum, while Postn-expressing cells are located along the cortex in the cambium layer. The data have been added to Fig 4B and Fig. 4 – Supplementary Figure 1 and are described in the Results section: “Pi16-expressing SSPCs were located within the fibrous layer, while Postn-expressing cells were found in the cambium layer and corresponded to Runx2-expressing osteogenic cells (Fig. 4 – Supplementary Fig. 1A-C).”

      (5) Interpretation of transplantation experiments in Figure 5 is not straightforward, as the authors did not demonstrate the purity of Prx1Cre-GFP+SCA1+ cells and Prx1Cre-GFP+CD146- cells to pSSPCs and IIFCs, respectively. It is possible that these populations contain much broader cell types beyond SSPCs or IIFCs.  

      We agree with the reviewer that our methodology for cell transplantation required more justification and validation. We decided to use a transgenic mouse line to be able to trace the cells in vivo after grafting. Prx1 marks limb mesenchyme during development, and the Prx1Cre mouse model allows labeling of all SSPCs contributing to callus formation. Therefore, we used Prx1Cre; R26mTmG mice as donors for SSPC and IIFC isolation (Duchamp de Lageneste et al. 2018; Logan et al. 2002). Prx1 does not mark immune and endothelial cells but can label pericytes and fibroblastic populations (Duchamp de Lageneste et al. 2018; Logan et al. 2002; Julien et al. 2021). In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells (Fig 3 – Supplementary figure 2, Fig 6 – Supplementary figure 1). We sorted GFP+ Sca1+ cells from the uninjured periosteum of Prx1Cre; R26mTmG mice to isolate only SSPCs, excluding endothelial cells and pericytes. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detected IIFCs but no SSPCs, chondrocytes or osteoblasts at this stage of repair. To eliminate Prx1-derived pericytes, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6 – Supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further verified the purity of the isolated SSPCs and IIFCs by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, confirming the absence of contamination by other cell populations (Figure 6 – Supplementary figure 1E). 
      We made the following changes in the text: “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- cells from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post-fracture, which correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig. 1).”

      Reviewer #2 (Public Review):

      Summary: 

      The authors performed cell type mapping for both WT and fracture conditions, which identified cell populations unique to the fracture conditions. To determine these, the most undifferentiated cells were first targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPCs differentiating into fibroblasts. The fibroblast cell type increased significantly under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths: 

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. 

      Weaknesses: 

      This study endeavored to elucidate the role of IIFC through snRNAseq analysis and in vivo observation. However, such validation alone is insufficient to confirm that IIFC is an osteochondrogenic progenitor, and additional data presentation is required.  

      As mentioned in the response to Reviewer 1, the differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses altogether showed that Sca1+ cells are the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered alongside SSPCs, and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ SSPCs to form IIFCs by grafting them in the fracture callus and assessing their fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered alongside osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that only a small subset of IIFCs undergo apoptosis, indicating that these cells further differentiate (Fig 7 – Supp 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrate the distribution of the cell populations by time point (Fig. 5D). The time points parallel the pseudotime, supporting the conclusion that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses strongly support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      We made the following changes in the text:

      - Lines 81-87: “We performed in vitro CFU assays with sorted GFP+SCA1+ and GFP+SCA1- cells isolated from the periosteum of Prx1Cre; R26mTmG mice, as Prx1 labels all SSPCs contributing to callus formation¹. Prx1-GFP+ SCA1+ cells showed increased CFU potential, confirming their stem/progenitor property (Fig 2F-G). Then, we grafted Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells at the fracture site of wild-type mice. Only SCA1+ cells formed cartilage and bone after fracture, indicating that SCA1+ cells correspond to periosteal SSPCs with osteochondrogenic potential (Fig 2H).”

      - Lines 120-122: “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2).”

      - Line 170-172: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      - Line 277-278: “Following this unique fibrogenic step, IIFCs do not undergo cell death but undergo either osteogenesis or chondrogenesis”

      - Line 281-283: “During bone repair, this initial fibrogenic process is an integral part of the SSPC differentiation process, and a transitional step prior to osteogenesis and chondrogenesis.”

      Reviewer #3 (Public Review): 

      In this manuscript, the authors explored the transcriptional heterogeneity of the periosteum with single nuclei RNA sequencing. Without prior enrichment of specific populations, this dataset serves as an unbiased representation of the cellular components potentially relevant to bone regeneration. By describing single-cell cluster profiles, the authors characterized over 10 different populations in combined steady state and post-fracture periosteum, including stem cells (SSPC), fibroblasts, osteoblasts, chondrocytes, immune cells, and so on. Specifically, a developmental trajectory was computationally inferred using the continuum of gene expression to connect SSPCs, injury-induced fibrogenic cells (IIFC), chondrocytes, and osteoblasts, showcasing the bipotentiality of periosteal SSPCs during injury repair. Additional computational pipelines were used to describe the possible gene regulatory network and the expected pathways involved in bone regeneration. Overall, the authors provided valuable insights into the cell state transitions during bone repair and proposed sets of genes with possible involvement in the injury response. 

      While the highlights of the manuscript are the unbiased characterization of periosteal composition, and the trajectory of SSPC response in bone fracture response, many of the conclusions can be more strongly supported with additional clarifications or extensions of the analysis.  

      (1) As described in the method section, both the steady-state data and full dataset underwent integration before dimensional reduction and clustering. It would be appreciated if the authors could compare the post-integration landscapes of uninjured cells between steady state and full dataset analysis. Specifically, fibroblasts were shown in Figure 1C and 1E, and such annotations did not exist in Figure 2B. Will it be possible that the original 'fibroblasts' were part of the IIFC population? 

      As suggested, we have now identified the fibroblast populations from the uninjured periosteum in the integration of datasets from all time points (Figure 5B and Fig. 5 – Supplementary Figure 2). We identified 4 fibroblast populations in the uninjured periosteum: Luzp2+, Cldn1+, Hsd11b1+ and Csmd1+ fibroblasts. Luzp2+ and Cldn1+ fibroblasts cluster distinctly from the other populations in the integrated dataset. Hsd11b1+ fibroblasts blend with SSPCs and IIFCs in the integrated dataset, probably due to their low cell number. Finally, Csmd1+ fibroblasts cluster at the interface between SSPCs and IIFCs, likely because they correspond to differentiating cells both in the uninjured periosteum and in response to fracture. We modified the clustering resolution in our subset dataset in order to represent Luzp2+ and Cldn1+ fibroblasts as an isolated cluster (Figure 5B, cluster 10). In addition, both pseudotime (Fig. 5B) and gene regulatory network analyses (Fig. 7D) show that the fibroblast populations are distinct from the activation trajectory of SSPCs. We added the following sentence to the text: “Fibroblasts from uninjured periosteum (Hsd11b1+, Cldn1+ and Luzp2+ cells corresponding to cluster 10 of Fig. 5B) clustered separately from the other populations, suggesting the absence of their contribution to bone healing.”

      (2) According to Figure 2, immune cells account for a significant proportion of the dataset, specifically during days 3 & 5 post-fracture. It would be interesting to see the potential roles that immune cells play during bone repair. For example, what are the biological annotations of the immune clusters (B, T, NK, myeloid cells)? Are any inflammatory genes or related signals upregulated in these immune cells? Do they interact with SSPCs or IIFCs during the transition?   

      In this manuscript, we report the overall dataset and focus our analyses on the response of SSPCs to injury and their differentiation trajectories. We did not include detailed analyses of the immune cell populations, which are beyond the scope of this manuscript and are the subject of another study (Hachemi et al., bioRxiv, 2024).

      (3) The conclusion regarding Notch and Wnt signaling in the IIFC transition was not sufficiently supported by the analysis presented in the manuscript, which was based on computational inferences. It would be helpful to add references supporting these claims or to provide experimental validation examining selected members of these pathways.

      The roles of Wnt and Notch in bone repair have been widely studied, and both signaling pathways are known regulators of SSPC differentiation (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017; Matsushita et al. 2020; Steven Minear et al. 2010; Steve Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010). It was previously shown that Notch inactivation at early stages of repair leads to bone non-union, while Notch inactivation in chondrocytes and osteoblasts does not significantly affect healing, confirming its role in SSPC differentiation before osteochondral commitment (Wang et al. 2016). Wnt was shown to be a critical driver of osteogenesis (Matsushita et al. 2020; Steve Minear et al. 2010; Steven Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010), as Wnt inhibition alters bone formation and Wnt overactivation increases bone formation (Pinzone et al. 2009; Balemans and Van Hul 2007). The role of Wnt is specific to osteogenic engagement, as Wnt inhibition promotes chondrogenesis (Hsieh et al. 2023; C.-L. Wu et al. 2021; Ruscitto et al. 2023). A recent study by Lee et al. confirmed the successive activation and crosstalk of the Notch and Wnt pathways during osteogenic differentiation of SSPCs in bone healing (Lee et al. 2021). They showed a peak of Notch activation at day 3 post-injury, followed by a progressive decrease that parallels an increase in Wnt signaling inducing osteogenic differentiation. These studies are consistent with the sequential activation of Notch and Wnt observed in our snRNAseq analyses. Our analyses now reveal how this sequential activation of Notch and Wnt relates to the fibrogenic and osteogenic phases of SSPC differentiation, respectively. We clarified this in the discussion and added the references above to support our claims. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The manuscript is well-written overall. However, the authors often oversimplify outcomes and overstate the results. Some of the statements (delineated below) need to be recalibrated to be in line with the presented data. 

      In addition to the conclusions listed below, we also toned down the following statements to avoid overstating our results:

      Line 24: suggesting a crucial paracrine role of this transient IIFC population

      Line 227: suggesting their central role in mediating cell interactions after fracture

      Line 243: IIFCs produce paracrine factors that can regulate SSPCs

      - Line 77 (86): The authors should add "might" before "correspond to". 

      We provided new sets of data, including CFU experiments and transplantation assays, to reinforce our conclusion. We replaced “correspond to” with “encompass”.

      - Line 102: SSPCs are obviously not "absent" in day 3 snRNAseq (Figure 2d). The percentage dropped by (only) 75%, according to Figure 2e, which is far from disappearance. Overall, immunohistochemical staining is often dichotomous with snRNAseq designations. The authors should describe the results more carefully. 

      We agree that this statement may not reflect the data shown: we observe a strong decrease in the percentage of cells in SSPC clusters, but still detect a few cells in these clusters. However, when we looked at the presence of Sca1+ Pi16+ cells at different time points, we confirmed the absence of cells expressing SSPC signature genes (Sca1, Pi16, Cd34) at day 3 post-injury. Due to the clustering resolution of the combined integration, some cells in the SSPC clusters might not be Sca1+ Pi16+. We now show these results in Fig. 4 – Supplementary Figure 2. We changed the text accordingly (line 120): “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2)”.

      - Line 134: The authors need to clearly state that GFP+IIFCs were isolated based on Prx1CreGFP+CD146-. The authors did not clearly demonstrate the relationship between POSTN+ cells and CD146- cells, which poses concerns about the interpretation of transplantation experiments. 

      As mentioned above in our response to Reviewer #1 (public review), we have clarified and provided additional information on our strategy to isolate SSPCs and IIFCs. We used Prx1Cre; R26mTmG mice to mark all SSPCs and their derivatives with the GFP reporter in order to trace these populations after cell grafting. In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells. We sorted GFP+Sca1+ cells to exclude endothelial cells. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detect IIFCs but no SSPCs, chondrocytes or osteoblasts at this time point. However, we also detected pericytes that can be Prx1-derived. To eliminate potential pericyte contamination, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-Supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrated the purity of the SSPC and IIFC isolations by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, which confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). We made the following changes in the text (line 153): “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- cells from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, which correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1).”

      - Line 211: It is obvious from Figure 8F that ligand expression was not "specific" to the IIFC phase.

      The data only shows a slight enrichment of ligand score. 

      We corrected the text by “ligand expression was increased during the IIFC phase”.

      (2) Some of the computational predictions are incongruent with the known lineage trajectory. For example, in vivo lineage tracing experiments, including but not limited to, PLoS Genet. 2014. 10:e1004820, demonstrate that some of the chondrocytes within fracture callus can differentiate into osteoblasts. This is incompatible with the authors' conclusion that osteoblasts and chondrocytes represent two different terminal stages of cell differentiation in fracture healing. How do the authors reconcile this apparent inconsistency? 

      In this manuscript, we generated datasets corresponding to the initial stages of bone repair, up to day 7 post-injury. Therefore, our analyses encompass the stages of SSPC activation and engagement into osteogenesis and chondrogenesis. The results show that a portion of the osteoblasts in the fracture callus differentiate directly from IIFCs via intramembranous ossification. The reviewer is correct that osteoblasts have also been shown to derive from transdifferentiation of chondrocytes, which occurs at later stages of repair during the active phase of endochondral ossification (Julien et al. 2020; Aghajanian and Mohan 2018; Zhou et al. 2014; Hu et al. 2017). This process of chondrocyte-to-osteoblast transdifferentiation is not represented in our integrated dataset and may require adding later time points. However, when we analyzed the day 5 and day 7 datasets independently of days 0 and 3, we were able to identify a cluster of hypertrophic chondrocytes (expressing Col10a1) connecting the clusters of chondrocytes and osteoblasts. This suggests that hypertrophic chondrocytes in this cluster are undergoing transdifferentiation into osteoblasts, as shown in Author response image 1. Additional time points will be needed in a future study to perform in-depth analyses of chondrocyte transdifferentiation. 

      Author response image 1.

      Periosteum-derived chondrocytes undergo cartilage to bone transformation. A. UMAP projection of the subset of SSPCs, IIFCs, osteoblasts and chondrocytes in the integration of days 5 and 7 post-fracture datasets. B. Feature plots of Acan, Col10a1 and Ibsp expression.  C. UMAP projection separated by time points. D. Percentage of cells in the hypertrophic/differentiating chondrocyte cluster.

      (3) The authors did not cite some of the studies that described the roles of Notch signaling in fracture healing, for example, J Bone Miner Res. 2014. 29:1283-94. The authors should test the specificity of Notch signaling activities to IIFCs (POSTN+ cells) in vivo. 

      The role of Notch in the activation of SSPCs during bone repair has been investigated in several studies (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017). Notch dynamics were previously described, with a peak at day 3 post-injury followed by a reduction when cells engage in osteogenesis and chondrogenesis (Lee et al. 2021; Dishowitz et al. 2012; Matthews et al. 2014). Notch plays a role in the early steps of SSPC activation prior to osteochondral differentiation, as Notch inactivation in chondrocytes and osteoblasts does not affect bone repair (Wang et al. 2016). We added the references listed above to emphasize the correlation between our results and previous reports on the role of Notch and made changes in the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions 

      (1) This research utilized snRNA seq for the basic hypothesis formation; however, the number of nuclei acquired was quite limited. Therefore, please explain the rationale for employing snRNA seq instead of scRNA seq, which includes cytoplasm, and additionally provide the markers used for cell type mapping in the scRNA analysis.  

      As mentioned in our response to Reviewer #1 above, we analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analyses using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPCs/fibrogenic cells, which are well represented in our dataset. The robustness of our study is further supported by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform scRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing fewer cells have supported major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      Several studies have shown that snRNAseq provides data quality equivalent to scRNAseq in terms of cell type identification, number of detected genes and downstream analyses (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021). While snRNAseq does not allow the detection of cytoplasmic RNA, this technique has several advantages: 

      (1) Better representation of the cell types. scRNAseq requires an enzymatic digestion step. This usually leads to an overrepresentation of cell types loosely attached to the ECM (immune cells, endothelial cells) and an underrepresentation of cell types strongly attached to the ECM, such as chondrocytes and osteoblasts. In addition, large or multinucleated cells like hypertrophic chondrocytes and osteoclasts are too big to be sorted and encapsulated using 10X technology. Here, we optimized a protocol to mechanically isolate nuclei from dissected tissues, which allows us to capture the diversity of cell types in the periosteum and fracture callus.

      (2) Higher recovery of nuclei. We performed both cell and nuclei isolation from the periosteum in our study and observed that nuclei extraction is the most efficient way to isolate cells from the periosteum and the fracture callus.

      (3) Reduced isolation time and cell stress. Previous studies showed that enzymatic digestion causes cell stress and induces stem cell activation (Machado et al. 2021; van den Brink et al. 2017). Therefore, we decided to perform snRNAseq to analyze the transcriptome of the intact periosteum without digestion-induced bias.

      We added this sentence in the result section: “Single nuclei transcriptomics was shown to provide results equivalent to single cell transcriptomics, but with better cell type representation and reduced digestion-induced stress response (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021)”.

      The list of genes used for cell type mapping is presented in Figure 3 – Supplementary figure 1. We added a detailed dot plot as Figure 3 – Supplementary figure 2.

      (2) During the fracture healing process of long bones, the influx of fibroblasts is a relatively common occurrence, and the fibrous callus that forms during bone repair and regeneration is reported to disappear over time. Therefore, inferring that IIFC differentiates into osteo- and chondrogenic cells based solely on their simultaneous appearance in the same time and space is challenging. More detailed validation is necessary, beyond what is supported by bioinformatics analysis. 

      The first step of bone repair is the formation of a fibrous callus, before cartilage and bone formation. There are no data in the literature demonstrating that an influx of fibroblasts occurs at the fracture site. Several studies now show that cells involved in callus formation are recruited locally (i.e. from the bone marrow, the periosteum and the skeletal muscle surrounding the fracture site) (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Julien et al. 2022; Matthews et al. 2021). The contribution of locally activated SSPCs to the fibrous callus is less well understood. Lineage tracing shows that GFP+ cell populations traced in Prx1Cre-GFP mice include SSPCs, IIFCs, chondrocytes and osteoblasts.

      The timing of the cell trajectories observed in our dataset correlates with the timing of callus formation previously described in the literature: day 3 post-fracture mostly contains IIFCs, while chondrocytes and osteoblasts appear from day 5 post-fracture. We conclude that IIFCs differentiate into osteochondrogenic cells based on multiple lines of evidence besides their simultaneous appearance in time and space:

      - In silico trajectory analyses identify a trajectory from SSPCs to osteochondrogenic cells via IIFCs. We added an analysis to show that our pseudotime trajectory parallels the timepoints of the dataset, confirming that the differentiation trajectory follows the timing of cell differentiation (Figure 5D).

      - We show that IIFCs start to express chondrogenic and osteogenic genes prior to engaging in chondrogenesis and osteogenesis. In addition, we detected activation of osteo- and chondrogenic-specific transcription factors in IIFCs. This shows a differentiation continuum between SSPCs, IIFCs, and osteochondrogenic cells (Figures 6-8).

      - Using transplantation assays, we showed that IIFCs form cartilage and bone, reinforcing the osteochondrogenic potential of this population (Figure 6C).

      - IIFCs largely do not undergo apoptosis. We assessed the expression of apoptosis-related genes by IIFCs and detected it in only a low percentage of cells. This was confirmed by cleaved caspase 3 immunostaining, showing that a very low percentage of cells in the early fibrotic tissue undergo apoptosis. 

      Therefore, the idea that the initial fibrous callus is replaced by a new influx of SSPCs or committed progenitors is not supported by recent literature and is not observed in our dataset containing all cell types from the periosteum and fracture site. Overall, our bioinformatic analyses combined with our in vivo validation strongly support that IIFCs are differentiating into chondrocytes and osteoblasts during bone repair. Additional in vivo functional studies will aim to further validate the trajectory and investigate the critical factors regulating this process.

      (3) The influx of most osteogenic progenitors to the bone fracture site typically appears after postfracture day 7. It's essential to ascertain whether the osteogenic cells observed at the time of this study differentiated from IIFC or migrated from surrounding mesenchymal stem cells. 

      As mentioned above, there is no clear evidence in the literature indicating an influx of osteoprogenitors. Cells involved in callus formation are recruited locally and predominantly from the periosteum (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Matthews et al. 2021; Julien et al. 2022). Our datasets therefore include all cell populations that form the callus. Other sources of SSPCs include the surrounding muscle, which contributes mostly to cartilage, and the bone marrow, which contributes a low percentage of the callus osteoblasts in the medullary cavity (Julien et al. 2021; Jeffery et al. 2022). We provide evidence that IIFCs give rise to osteogenic cells using our bioinformatic analyses and in vivo transplantation assay (listed in the response above). As indicated in our response to Reviewer #1, the steps leading to osteogenic differentiation observed in our dataset reflect the first step of callus ossification and correspond to the process of intramembranous ossification (up to day 7 post-injury). Endochondral ossification also contributes osteoblasts, including through the transdifferentiation of chondrocytes into osteoblasts (Julien et al. 2020; Zhou et al. 2014; Hu et al. 2017). While this process mostly occurs around day 14 post-fracture, we begin to detect this transition in our integrated day 5-day 7 dataset, as shown in Author response image 1. 

      (4) It's crucial to determine whether the IIFCs appearing at the fracture site contribute to the formation of the callus matrix or undergo apoptosis during the fracture healing process.

      In the early steps of bone repair, the callus is mostly composed of extracellular matrix (ECM). IIFCs express high levels of ECM genes, including Postn, Aspn and collagens (Col3a1, Col5a1, Col8a1, Col12a1) (Figure 3 – Supplementary Figures 1-2 and Fig. 7 – Supplementary Figure 1B). IIFCs express the highest levels of matrix-related genes compared to the other cell types in the fracture environment (e.g. immune cells, endothelial cells, Schwann cells, pericytes), as now shown in Fig. 7 – Supplementary Figure 1A. Therefore, IIFCs are the main contributors to the callus matrix.

      We investigated whether IIFCs undergo apoptosis. We observed that only a low percentage of IIFCs express apoptosis-related genes and are positive for cleaved caspase 3 immunostaining at days 3, 5 and 7 of bone repair. This indicates that IIFCs largely do not undergo apoptosis and reinforces our model in which IIFCs further differentiate into osteoblasts and chondrocytes. We added these data in Fig. 7 – Supplementary Figure 2 and added the following sentence to the results section: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).” 

      (5) Results from the snRNA seq highlight the paracrine role of IIFC, and verification is needed to ensure that the effect this has on surrounding osteogenic lineages is not misinterpreted.  

      To assess cell-cell interactions, we used tools such as Connectome and CellChat to infer and quantify intercellular communication networks between cell types. Previous studies have demonstrated the robustness of these tools when combined with in vivo validation (Sinha et al. 2022; Alečković et al. 2022; Li et al. 2023). Here we used these tools to illustrate the paracrine profile of IIFCs, but in vivo validation using gene inactivation would be required to assess the requirement for individual paracrine factors. In another study, we performed extensive analyses of the crosstalk between immune cells and SSPCs using our dataset, combined with in vivo validation, showing the robustness of both the tool and the dataset (Hachemi et al. 2024). We adjusted our conclusions to reflect our analyses: “suggesting a crucial paracrine role of this transient IIFC population during fracture healing”, “suggesting their central role in mediating cell interactions after fracture”, “suggesting that SSPCs can receive signals from IIFC”. 

      References

      Aghajanian, Patrick, and Subburaman Mohan. 2018. “The Art of Building Bone: Emerging Role of Chondrocyte-to-Osteoblast Transdifferentiation in Endochondral Ossification”. Bone Research 6 (1): 19. https://doi.org/10.1038/s41413-018-0021-z.

      Alečković, Maša, Simona Cristea, Carlos R. Gil Del Alcazar, Pengze Yan, Lina Ding, Ethan D. Krop, Nicholas W. Harper, et al. 2022. “Breast Cancer Prevention by Short-Term Inhibition of TGFβ Signaling”. Nature Communications 13 (1): 7558. https://doi.org/10.1038/s41467-022-35043-5.

      Ambrosi, Thomas H., Owen Marecic, Adrian McArdle, Rahul Sinha, Gunsagar S. Gulati, Xinming Tong, Yuting Wang, et al. 2021. “Aged Skeletal Stem Cells Generate an Inflammatory Degenerative Niche”. Nature 597 (7875): 256‑62. https://doi.org/10.1038/s41586-021-03795-7.

      Baccin, Chiara, Jude Al-Sabah, Lars Velten, Patrick M. Helbling, Florian Grünschläger, Pablo Hernández-Malmierca, César Nombela-Arrieta, Lars M. Steinmetz, Andreas Trumpp, and Simon Haas. 2020. “Combined Single-Cell and Spatial Transcriptomics Reveal the Molecular, Cellular and Spatial Bone Marrow Niche Organization”. Nature Cell Biology 22 (1): 38‑48. https://doi.org/10.1038/s41556-019-0439-6.

      Balemans, Wendy, and Wim Van Hul. 2007. “The Genetics of Low-Density Lipoprotein Receptor-Related Protein 5 in Bone: A Story of Extremes”. Endocrinology 148 (6): 2622‑29. https://doi.org/10.1210/en.2006-1352.

      Brink, Susanne C van den, Fanny Sage, Ábel Vértesy, Bastiaan Spanjaard, Josi Peterson-Maduro, Chloé S Baron, Catherine Robin, and Alexander van Oudenaarden. 2017. “Single-Cell Sequencing Reveals Dissociation-Induced Gene Expression in Tissue Subpopulations”. Nature Methods 14 (10): 935‑36. https://doi.org/10.1038/nmeth.4437.

      Cao, Junjie, Yalin Wei, Jing Lian, Lunyun Yang, Xiaoyan Zhang, Jiaying Xie, Qiang Liu, Jinyong Luo, Baicheng He, and Min Tang. 2017. “Notch Signaling Pathway Promotes Osteogenic Differentiation of Mesenchymal Stem Cells by Enhancing BMP9/Smad Signaling”. International Journal of Molecular Medicine 40 (2): 378‑88. https://doi.org/10.3892/ijmm.2017.3037.

      Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Single-Cell Transcriptional Landscape of Mammalian Organogenesis”. Nature 566 (7745): 496‑502. https://doi.org/10.1038/s41586-019-0969-x.

      Colnot, Céline. 2009. “Skeletal Cell Fate Decisions Within Periosteum and Bone Marrow During Bone Regeneration”. Journal of Bone and Mineral Research 24 (2): 274‑82. https://doi.org/10.1359/jbmr.081003.

      Debnath, Shawon, Alisha R. Yallowitz, Jason McCormick, Sarfaraz Lalani, Tuo Zhang, Ren Xu, Na Li, et al. 2018. “Discovery of a Periosteal Stem Cell Mediating Intramembranous Bone Formation”. Nature 562 (7725): 133‑39. https://doi.org/10.1038/s41586-018-0554-8.

      Ding, Jiarui, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, et al. 2020. “Systematic Comparison of Single-Cell and Single-Nucleus RNA-Sequencing Methods”. Nature Biotechnology 38 (6): 737‑46. https://doi.org/10.1038/s41587-020-0465-8.

      Dishowitz, Michael I., Shawn P. Terkhorn, Sandra A. Bostic, and Kurt D. Hankenson. 2012. “Notch Signaling Components Are Upregulated during Both Endochondral and Intramembranous Bone Regeneration”. Journal of Orthopaedic Research 30 (2): 296‑303. https://doi.org/10.1002/jor.21518.

      Duchamp de Lageneste, Oriane, Anaïs Julien, Rana Abou-Khalil, Giulia Frangi, Caroline Carvalho, Nicolas Cagnard, Corinne Cordier, Simon J. Conway, and Céline Colnot. 2018. “Periosteum Contains Skeletal Stem Cells with High Bone Regenerative Potential Controlled by Periostin”. Nature Communications 9 (1): 773. https://doi.org/10.1038/s41467-018-03124-z.

      Hsieh, Chen-Chan, B. Linju Yen, Chia-Chi Chang, Pei-Ju Hsu, Yu-Wei Lee, Men-Luh Yen, Shaw-Fang Yet, and Linyi Chen. 2023. “Wnt Antagonism without TGFβ Induces Rapid MSC Chondrogenesis via Increasing AJ Interactions and Restricting Lineage Commitment”. iScience 26 (1): 105713. https://doi.org/10.1016/j.isci.2022.105713.

      Hu, Diane P., Federico Ferro, Frank Yang, Aaron J. Taylor, Wenhan Chang, Theodore Miclau, Ralph S. Marcucio, and Chelsea S. Bahney. 2017. “Cartilage to Bone Transformation during Fracture Healing Is Coordinated by the Invading Vasculature and Induction of the Core Pluripotency Genes”. Development 144 (2): 221‑34. https://doi.org/10.1242/dev.130807.

      Jeffery, Elise C., Terry L.A. Mann, Jade A. Pool, Zhiyu Zhao, and Sean J. Morrison. 2022. “Bone Marrow and Periosteal Skeletal Stem/Progenitor Cells Make Distinct Contributions to Bone Maintenance and Repair”. Cell Stem Cell 29 (11): 1547-1561.e6. https://doi.org/10.1016/j.stem.2022.10.002.

      Julien, Anais, Anuya Kanagalingam, Ester Martínez-Sarrà, Jérome Megret, Marine Luka, Mickaël Ménager, Frédéric Relaix, and Céline Colnot. 2021. “Direct Contribution of Skeletal Muscle Mesenchymal Progenitors to Bone Repair”. Nature Communications 12 (1): 2860. https://doi.org/10.1038/s41467-021-22842-5.

      Julien, Anais, Simon Perrin, Oriane Duchamp de Lageneste, Caroline Carvalho, Morad Bensidhoum, Laurence Legeai-Mallet, and Céline Colnot. 2020. “FGFR3 in Periosteal Cells Drives Cartilage-to-Bone Transformation in Bone Repair”. Stem Cell Reports 15 (4): 955‑67. https://doi.org/10.1016/j.stemcr.2020.08.005.

      Julien, Anais, Simon Perrin, Ester Martínez-Sarrà, Anuya Kanagalingam, Caroline Carvalho, Marine Luka, Mickaël Ménager, and Céline Colnot. 2022. “Skeletal Stem/Progenitor Cells in Periosteum and Skeletal Muscle Share a Common Molecular Response to Bone Injury”. Journal of Bone and Mineral Research, June, jbmr.4616. https://doi.org/10.1002/jbmr.4616.

      Kang, Sona, Christina N. Bennett, Isabelle Gerin, Lauren A. Rapp, Kurt D. Hankenson, and Ormond A. MacDougald. 2007. “Wnt Signaling Stimulates Osteoblastogenesis of Mesenchymal Precursors by Suppressing CCAAT/Enhancer-Binding Protein α and Peroxisome Proliferator-Activated Receptor γ”. Journal of Biological Chemistry 282 (19): 14515‑24. https://doi.org/10.1074/jbc.M700030200.

      Komatsu, David E., Michelle N. Mary, Robert Jason Schroeder, Alex G. Robling, Charles H. Turner, and Stuart J. Warden. 2010. “Modulation of Wnt Signaling Influences Fracture Repair”. Journal of Orthopaedic Research 28 (7): 928‑36. https://doi.org/10.1002/jor.21078.

      Hachemi, Yasmine, Simon Perrin, Maria Ethel, Anais Julien, Julia Vettese, Blandine Geisler, Christian Göritz, and Céline Colnot. 2024. “Multimodal Analyses of Immune Cells during Bone Repair Identify Macrophages as a Therapeutic Target in Musculoskeletal Trauma”. https://doi.org/10.1101/2024.04.29.591608.

      Kraus, Jessica M., Dion Giovannone, Renata Rydzik, Jeremy L. Balsbaugh, Isaac L. Moss, Jennifer L. Schwedler, Julien Y. Bertrand, et al. 2022. “Notch Signaling Enhances Bone Regeneration in the Zebrafish Mandible”. Development 149 (5): dev199995. https://doi.org/10.1242/dev.199995.

      Lee, S., L. H. Remark, A. M. Josephson, K. Leclerc, E. Muiños Lopez, D. J. Kirby, Devan Mehta, et al. 2021. “Notch-Wnt Signal Crosstalk Regulates Proliferation and Differentiation of Osteoprogenitor Cells during Intramembranous Bone Healing”. Npj Regenerative Medicine 6 (1): 29. https://doi.org/10.1038/s41536-021-00139-x.

      Li, Jiaoduan, Dongyan Cao, Lixin Jiang, Yiwen Zheng, Siyuan Shao, Ai Zhuang, and Dongxi Xiang. 2023. “ITGB2-ICAM1 Axis Promotes Liver Metastasis in BAP1-Mutated Uveal Melanoma with Retained Hypoxia and ECM Signatures”. Cellular Oncology (Dordrecht), December. https://doi.org/10.1007/s13402-023-00908-4.

      Logan, Malcolm, James F. Martin, Andras Nagy, Corrinne Lobe, Eric N. Olson, and Clifford J. Tabin. 2002. “Expression of Cre Recombinase in the Developing Mouse Limb Bud Driven by a Prxl Enhancer”. Genesis 33 (2): 77‑80. https://doi.org/10.1002/gene.10092.

      Machado, Léo, Perla Geara, Jordi Camps, Matthieu Dos Santos, Fatima Teixeira-Clerc, Jens Van Herck, Hugo Varet, et al. 2021. “Tissue Damage Induces a Conserved Stress Response That Initiates Quiescent Muscle Stem Cell Activation”. Cell Stem Cell 28 (6): 1125-1135.e7. https://doi.org/10.1016/j.stem.2021.01.017.

      Matsushita, Yuki, Mizuki Nagata, Kenneth M. Kozloff, Joshua D. Welch, Koji Mizuhashi, Nicha Tokavanich, Shawn A. Hallett, et al. 2020. “A Wnt-Mediated Transformation of the Bone Marrow Stromal Cell Identity Orchestrates Skeletal Regeneration”. Nature Communications 11 (1): 332. https://doi.org/10.1038/s41467-019-14029-w.

      Matthews, Brya G, Danka Grcevic, Liping Wang, Yusuke Hagiwara, Hrvoje Roguljic, Pujan Joshi, Dong-Guk Shin, Douglas J Adams, and Ivo Kalajzic. 2014. “Analysis of αSMA-Labeled Progenitor Cell Commitment Identifies Notch Signaling as an Important Pathway in Fracture Healing”. Journal of Bone and Mineral Research 29 (5): 1283‑94. https://doi.org/10.1002/jbmr.2140.

      Matthews, Brya G, Sanja Novak, Francesca V Sbrana, Jessica L Funnell, Ye Cao, Emma J Buckels, Danka Grcevic, and Ivo Kalajzic. 2021. “Heterogeneity of Murine Periosteum Progenitors Involved in Fracture Healing”. eLife 10 (February): e58534. https://doi.org/10.7554/eLife.58534.

      Minear, Steve, Philipp Leucht, Samara Miller, and Jill A Helms. 2010. “rBMP Represses Wnt Signaling and Influences Skeletal Progenitor Cell Fate Specification during Bone Repair”. Journal of Bone and Mineral Research 25 (6): 1196‑1207. https://doi.org/10.1002/jbmr.29.

      Minear, Steven, Philipp Leucht, Jie Jiang, Bo Liu, Arial Zeng, Christophe Fuerer, Roel Nusse, and Jill A. Helms. 2010. “Wnt Proteins Promote Bone Regeneration”. Science Translational Medicine 2 (29). https://doi.org/10.1126/scitranslmed.3000231.

      Novak, Sanja, Emilie Roeder, Benjamin P. Sinder, Douglas J. Adams, Chris W. Siebel, Danka Grcevic, Kurt D. Hankenson, Brya G. Matthews, and Ivo Kalajzic. 2020. “Modulation of Notch1 Signaling Regulates Bone Fracture Healing”. Journal of Orthopaedic Research 38 (11): 2350‑61. https://doi.org/10.1002/jor.24650.

      Pinzone, Joseph J., Brett M. Hall, Nanda K. Thudi, Martin Vonau, Ya-Wei Qiang, Thomas J. Rosol, and John D. Shaughnessy. 2009. “The Role of Dickkopf-1 in Bone Development, Homeostasis, and Disease”. Blood 113 (3): 517‑25. https://doi.org/10.1182/blood-2008-03-145169.

      Remark, Lindsey H., Kevin Leclerc, Malissa Ramsukh, Ziyan Lin, Sooyeon Lee, Backialakshmi Dharmalingam, Lauren Gillinov, et al. 2023. “Loss of Notch Signaling in Skeletal Stem Cells Enhances Bone Formation with Aging”. Bone Research 11 (1): 50. https://doi.org/10.1038/s41413-023-00283-8.

      Ruscitto, Angela, Peng Chen, Ikue Tosa, Ziyi Wang, Gan Zhou, Ingrid Safina, Ran Wei, et al. 2023. “Lgr5-Expressing Secretory Cells Form a Wnt Inhibitory Niche in Cartilage Critical for Chondrocyte Identity”. Cell Stem Cell 30 (9): 1179-1198.e7. https://doi.org/10.1016/j.stem.2023.08.004.

      Selewa, Alan, Ryan Dohn, Heather Eckart, Stephanie Lozano, Bingqing Xie, Eric Gauchat, Reem Elorbany, et al. 2020. “Systematic Comparison of High-Throughput Single-Cell and Single-Nucleus Transcriptomes during Cardiomyocyte Differentiation”. Scientific Reports 10 (1): 1535. https://doi.org/10.1038/s41598-020-58327-6.

      Sinha, Sarthak, Holly D. Sparks, Elodie Labit, Hayley N. Robbins, Kevin Gowing, Arzina Jaffer, Eren Kutluberk, et al. 2022. “Fibroblast Inflammatory Priming Determines Regenerative versus Fibrotic Skin Repair in Reindeer”. Cell 185 (25): 4717-4736.e25. https://doi.org/10.1016/j.cell.2022.11.004.

      Wang, Cuicui, Jason A. Inzana, Anthony J. Mirando, Yinshi Ren, Zhaoyang Liu, Jie Shen, Regis J. O’Keefe, Hani A. Awad, and Matthew J. Hilton. 2016. “NOTCH Signaling in Skeletal Progenitors Is Critical for Fracture Repair”. The Journal of Clinical Investigation 126 (4): 1471‑81. https://doi.org/10.1172/JCI80672.

      Wen, Fei, Xiaojie Tang, Lin Xu, and Haixia Qu. 2022. “Comparison of Single-Nucleus and Single-Cell Transcriptomes in Hepatocellular Carcinoma Tissue”. Molecular Medicine Reports 26 (5): 339. https://doi.org/10.3892/mmr.2022.12855.

      Wu, Chia-Lung, Amanda Dicks, Nancy Steward, Ruhang Tang, Dakota B. Katz, Yun-Rak Choi, and Farshid Guilak. 2021. “Single Cell Transcriptomic Analysis of Human Pluripotent Stem Cell Chondrogenesis”. Nature Communications 12 (1): 362. https://doi.org/10.1038/s41467-020-20598-y.

      Wu, Haojia, Yuhei Kirita, Erinn L. Donnelly, and Benjamin D. Humphreys. 2019. “Advantages of Single-Nucleus over Single-Cell RNA Sequencing of Adult Kidney: Rare Cell Types and Novel Cell States Revealed in Fibrosis”. Journal of the American Society of Nephrology 30 (1): 23‑32. https://doi.org/10.1681/ASN.2018090912.

      Zhong, Leilei, Lutian Yao, Robert J. Tower, Yulong Wei, Zhen Miao, Jihwan Park, Rojesh Shrestha, et al. 2020. “Single Cell Transcriptomics Identifies a Unique Adipose Lineage Cell Population That Regulates Bone Marrow Environment”. eLife 9 (April): e54695. https://doi.org/10.7554/eLife.54695.

      Zhou, Xin, Klaus von der Mark, Stephen Henry, William Norton, Henry Adams, and Benoit de Crombrugghe. 2014. “Chondrocytes Transdifferentiate into Osteoblasts in Endochondral Bone during Development, Postnatal Growth and Fracture Healing in Mice”. Edited by Matthew L. Warman. PLoS Genetics 10 (12): e1004820. https://doi.org/10.1371/journal.pgen.1004820.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system consists of large occurrence datasets for dragonflies and damselflies, split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temperature variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for identifying winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong, and "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeders. These species rarely occur at lotic sites, let alone reproduce successfully there. Other species are arguably "both", like Rhionaeschna multicolor, which is mostly lentic. I am not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and we will review our primary sources to clarify the trait classifications. We will also reclassify the species according to the expertise of this reviewer and perform our analysis again. 

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. No rationale is given, so it is unclear whether this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly affected by sampling weaknesses, and they also create a potentially large problem of pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models (SDMs), Joint Species Distribution Models, or various kinds of Occupancy Models. However, none of these approaches leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested (for example, papers by Lee-Yaw demonstrate that SDMs rarely predict well, and occupancy models often perform less well than SDMs and do not capture how distributions change over time; Briscoe et al. 2021, Global Change Biology). Employing such techniques would therefore make all conclusions speculative rather than directly observable. 

      Rather than employing extrapolative models, we relied on transparent techniques, used successfully in the core macroecology literature, that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models showing that the observed range and phenology changes, respectively, are contrary to expectations arising from sampling differences. A quadrat size of 100 km represents a reasonable middle ground in terms of the effects of sampling, and we will add a reference to the Methods section to clarify this.
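      A minimal sketch of how occurrence records might be aggregated into 100 km quadrats, assuming coordinates have already been projected to kilometres in an equal-area projection (an assumption; the projection is not specified in this response). Recording presence per quadrat, rather than counting records, is one simple way to limit the weight of densely sampled cells. Function names are hypothetical.

```python
import math
from collections import defaultdict

CELL_KM = 100.0  # quadrat side length in km

def quadrat_id(x_km, y_km, cell_km=CELL_KM):
    # Index of the quadrat containing a point given in projected (equal-area) km coordinates.
    return (math.floor(x_km / cell_km), math.floor(y_km / cell_km))

def occupied_quadrats(records, cell_km=CELL_KM):
    # records: iterable of (species, x_km, y_km).
    # Returns species -> set of occupied quadrats (presence only, repeated records collapse).
    occ = defaultdict(set)
    for species, x, y in records:
        occ[species].add(quadrat_id(x, y, cell_km))
    return dict(occ)

# Hypothetical records: the first two sp1 points fall in the same quadrat.
example = occupied_quadrats([
    ("sp1", 10.0, 20.0),
    ("sp1", 90.0, 99.0),
    ("sp1", 150.0, 20.0),
    ("sp2", -5.0, 5.0),
])
```

      Changing `CELL_KM` makes the trade-off discussed above explicit: smaller cells give finer resolution but more cells whose occupancy depends on whether anyone happened to sample them.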

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out; it is an editing error that should have been resolved prior to submission. We will replace the terms specialist and generalist with specific trait-based predictions.

      (4) Key references were overlooked or dismissed, such as chapters 24 and 27 of the new edition of the Dragonflies & Damselflies model organisms book.

      We thank Reviewer 1 for making us aware of this excellent reference. We will review this text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers.

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers": some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, panel B is missing from Figure 1, which needs to be addressed.
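      The reviewer's proposed diagnostic can be written as a simple classification of changes in the northern and southern range limits: a true poleward shift moves both limits north, whereas increased sampling would tend to push both limits outward. This sketch assumes a Northern Hemisphere species with limits given as latitudes in degrees and a hypothetical tolerance `tol` below which changes are treated as zero; it ignores uncertainty in the limit estimates.

```python
def classify_range_change(north_before, north_after, south_before, south_after, tol=0.0):
    """Classify a change in range limits (latitudes in degrees, Northern Hemisphere)."""
    d_north = north_after - north_before  # > 0 means the northern limit moved north
    d_south = south_after - south_before  # > 0 means the southern limit moved north
    if d_north > tol and d_south > tol:
        return "poleward shift"   # both limits moved north
    if d_north > tol and d_south < -tol:
        return "expansion"        # both limits moved outward, as a sampling artifact might produce
    if d_north < -tol and d_south > tol:
        return "contraction"      # both limits moved inward
    return "other"                # stable, equatorward, or changes within tolerance

# Hypothetical limits (degrees latitude) before and after.
shifted = classify_range_change(50.0, 52.0, 30.0, 31.0)
expanded = classify_range_change(50.0, 52.0, 30.0, 29.0)
```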

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general, failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of conservation concern (e.g., Duchenne et al. 2021, Ecology Letters). We will revise the discussion to acknowledge potential differences in outcomes.

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 
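      One generic way to ask whether an observed shift in a northern range limit exceeds what a random reallocation of records between periods would produce is a permutation test on the period labels. This is an illustration under stated assumptions, not the null models used in the study: the northern limit is taken as the mean of the k northernmost occurrence latitudes (a hypothetical choice), the data are synthetic, and, unlike a sampling-based null model, this version does not explicitly simulate changes in sampling effort.

```python
import random

def northern_limit(lats, k=5):
    # Hypothetical definition: mean of the k northernmost latitudes
    # (less sensitive to a single outlying record than max()).
    top = sorted(lats, reverse=True)[:k]
    return sum(top) / len(top)

def permutation_test(lats_before, lats_after, k=5, n_perm=999, seed=1):
    """One-sided test: is the northward shift larger than under shuffled period labels?"""
    rng = random.Random(seed)
    observed = northern_limit(lats_after, k) - northern_limit(lats_before, k)
    pooled = list(lats_before) + list(lats_after)
    n_before = len(lats_before)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign records to the two periods
        shift = northern_limit(pooled[n_before:], k) - northern_limit(pooled[:n_before], k)
        if shift >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value

# Synthetic example: records in the later period sit 2 degrees further north.
before = [40.0 + 0.1 * i for i in range(50)]
after = [42.0 + 0.1 * i for i in range(50)]
shift, p = permutation_test(before, after)
```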

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was challenging to construct, as it must illustrate how phenology and range shifts can occur simultaneously or separately to enable a hypothetical odonate to track its thermal niche over time. An earlier version of the figure had a second panel, and we failed to remove the reference to that panel when we simplified the figure. 

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to convey the integrative character of the work and help readers extract the main results. While the article is generally well written, the captions and results in particular contain many inconsistencies and lack important detail. Given the multitude of relationships tested (e.g., the influence of traits), the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We will revise the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them. We will carefully review all figures and captions, and we will make changes to improve the clarity of the text and the presentation of results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the single-cell growth rate continuously decreases, in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation, in turn, is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus translation. The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA plays a previously underestimated and potentially rate-limiting role in growth. 

      Thank you for the excellent summary of our work.

      Strengths: 

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes, and the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during DNA depletion, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription, and subsequently for protein production, soon after the amount of DNA in the cell is reduced. The article also presents a theoretical model of transcription and translation that includes a Michaelis-Menten-type dependence of transcription on DNA availability and is fit to the data. While the model fits well with the continuous reduction of relative growth rate in rich medium (M9gluCAA), the behavior in minimal media without casamino acids is less clear (see comments below). 
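      The Michaelis-Menten-type dependence of transcription on DNA mentioned above can be illustrated with a toy calculation: if the total transcription rate scales as D/(K_m + D) with DNA concentration D, diluting DNA has little effect when D greatly exceeds K_m but a substantial effect when D is near K_m. Parameters are in arbitrary normalized units and are hypothetical; this is not the authors' fitted model.

```python
def transcription_rate(dna, k_max, k_m):
    """Michaelis-Menten-type rate: saturates when DNA concentration greatly exceeds k_m."""
    return k_max * dna / (k_m + dna)

def relative_rate(dilution, dna_ref, k_m):
    """Transcription rate after diluting DNA by the given fraction, relative to the reference.

    k_max cancels in the ratio, so it is set to 1.0 here.
    """
    diluted = dna_ref * (1.0 - dilution)
    return transcription_rate(diluted, 1.0, k_m) / transcription_rate(dna_ref, 1.0, k_m)

# Hypothetical normalized units: reference DNA concentration = 1.0
near_km = relative_rate(0.4, 1.0, 1.0)     # DNA close to k_m: rate drops to 0.75
saturated = relative_rate(0.4, 1.0, 0.01)  # DNA in large excess: rate barely changes
```

      In this toy picture, an immediate drop in transcription upon dilution is what one would expect if wild-type DNA levels sit near, rather than far above, K_m.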

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth. However, I still have a few points about the interpretation of the data and the measured fractions of active ribosomes (see below). 

      Apart from correlations in DNA-depleted cells, the article also investigates the role of candidate stress response pathways in reduced transcription, demonstrating that neither the SOS nor the stringent response is responsible for the reduced rate of growth. Likewise, according to mass-spec data, the anti-sigma factor Rsd, recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, also does not seem to be involved. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis that the DNA substrate itself limits the total rate of transcription is appealing. 

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration. 

      Weaknesses: 

      There are a range of points that should be clarified or addressed, either by additional experiments/analyses or by explanations or clear disclaimers. 

      First, the continuous reduction of growth rate upon arrest of DNA replication initiation observed in rich growth medium (M9gluCAA) is not equally observed in poor media. Instead, the relative growth rate is immediately/quickly reduced by about 10-20% and then maintained for long times, as if the arrest of replication initiation had an immediate effect but would then not lead to saturation of the DNA substrate. In particular, the long plateau of a constant relative growth rate in M9ala is difficult to reconcile with the model fit in Fig 4S2. Is it possible that DNA is not limiting in poor media (at least not for the cell sizes studied here) while replication arrest still elicits a reduction of growth rate in a different way? Might this have something to do with the naturally much higher oscillations of DNA concentration in minimal medium?

      The reviewer is correct that there are interesting differences between nutrient-rich and -poor conditions. They were originally noted in the discussion, but we understand how our original presentation made it confusing. We reorganized the text and figures to better explain our results and interpretations. In the revised manuscript, the data related to the poor media are now presented separately (new Figure 6) from the data related to the rich medium (Figures 1-3). The total RNAP activity (abundance x active fraction) is significantly reduced in poor media (Figure 6A-B), similarly to rich medium (Figure 3H). Thus, DNA is limiting for transcription across conditions. However, the total ribosome activity in poor media (Figure 6C-D), and thus the growth rate (Figure 6E-F), was less affected than in rich medium (Figures 2H and 1C). Our interpretation of these results is that while DNA is limiting for transcription in all tested nutrient conditions (as shown by the total active RNAP data), post-transcriptional buffering activities compensate for the reduction in transcription in poor media, thereby maintaining a better scaling of growth rates under DNA limitation. 
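      Total RNAP activity, defined in the response as abundance multiplied by active fraction, can fall even when abundance does not. The sketch below computes the relative change in total activity; the numbers are hypothetical and chosen only to show the arithmetic.

```python
def total_activity(abundance, active_fraction):
    """Total activity = abundance x fraction of molecules in the active state."""
    return abundance * active_fraction

def relative_change(abundance_ref, fraction_ref, abundance, fraction):
    """Fractional change in total activity relative to a reference condition."""
    return total_activity(abundance, fraction) / total_activity(abundance_ref, fraction_ref) - 1.0

# Hypothetical numbers: abundance up 10%, active fraction down from 0.4 to 0.3.
change = relative_change(1.0, 0.4, 1.1, 0.3)  # about -0.175
```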

      The authors argue that DNA becomes limiting in the range of physiological cell sizes, in particular for M9glCAA (Fig. 1BC). It would be helpful to know by how much (fold-change) the DNA concentration is reduced below wild-type (or multi-N) levels at t=0 in Fig 1B and how DNA concentration decays with time or cell area, to get a sense by how many-fold DNA is essentially 'overexpressed/overprovided' in wild-type cells. 

      We now provide crude estimates in the Discussion section. The revised text reads: “Crude estimations suggest that ≤ 40% DNA dilution is sufficient to negatively affect transcription (total RNAP activity) in M9glyCAAT, whereas the same effect was observed after less than 10% dilution in nutrient-poor media (M9gly or M9ala) (see Materials and Methods).” We obtained these numbers based on calculations and estimates described in the Materials and Methods section and Appendix 1 (Appendix 1 – Table 1).
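      A crude way to relate cell growth to DNA dilution, assuming the DNA content per cell stays fixed once replication is blocked and that cell volume is proportional to projected cell area (constant cell width), is dilution = 1 - A0/A. This is a hypothetical sketch for orientation only; the authors' estimates follow the calculations described in their Materials and Methods and Appendix 1.

```python
def dilution(area_start, area_now):
    """Fractional drop in DNA concentration, assuming fixed DNA content and volume ~ area."""
    return 1.0 - area_start / area_now

def area_for_dilution(area_start, target_dilution):
    """Cell area at which a given fractional DNA dilution is reached (same assumptions)."""
    return area_start / (1.0 - target_dilution)

# Hypothetical areas in arbitrary units.
half = dilution(4.0, 8.0)            # doubling the area halves DNA concentration
forty = area_for_dilution(4.0, 0.4)  # 40% dilution is reached at about 1.67x the starting area
```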

      Fig. 2: The distribution of diffusion coefficients of RpsB is fit to Gaussians on the log scale. Is this based on a model or on previous work or simply an empirical fit to the data? An exact analytical model for the distribution of diffusion constants can be found in the tool anaDDA by Vink, ..., Hohlbein Biophys J 2020. Alternatively, distributions of displacements are expressed analytically in other tools (e.g., in SpotOn). 

      We use an empirical fit of a Gaussian mixture model (GMM) with three states to the data and extract the fraction of molecules in each state. This avoids making too many assumptions about the underlying processes, e.g., a Markovian system with Brownian diffusion. The model in anaDDA (Vink et al.) is currently limited to two transitioning states with a maximum of 8 steps per track for a computationally efficient solution (longer tracks are truncated). Using a short subset of each trajectory is less accurate than using the entire trajectory, and because of this, we consider full tracks with at least 9 displacements. Meanwhile, Spot-On supports a three-state model, but it is still based on a semi-analytical model with a pre-calculated library of parameters created by fitting simulated data. Neither of these models considers the effect of cell confinement, which plays a major role in single-molecule diffusion in small cells such as bacteria. For these reasons, we opted to use an empirical fit to the data. We note that the fractions of active ribosomes in WT cells, which we extracted from these diffusion measurements, are consistent with the range of estimates obtained by others using similar or different approaches (Forchhammer and Lindahl 1971; Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014). 
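      The empirical three-state fit described above can be sketched as an expectation-maximization (EM) fit of a one-dimensional Gaussian mixture to log10-transformed diffusion coefficients, returning the fraction of molecules in each state. This pure-Python version runs on synthetic data and is not the authors' analysis code; in practice a library implementation such as scikit-learn's `GaussianMixture` would typically be used. Initial means and all data values are hypothetical.

```python
import math
import random

def _normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fit_gmm_1d(x, means, sigma=0.3, n_iter=100, min_sigma=1e-3):
    """EM fit of a 1D Gaussian mixture; returns (weights, means, sigmas)."""
    k = len(means)
    mu, sd, w = list(means), [sigma] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for xi in x:
            p = [w[j] * _normal_pdf(xi, mu[j], sd[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: update mixture weights, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue
            w[j] = nj / len(x)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var = sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj
            sd[j] = max(math.sqrt(var), min_sigma)
    return w, mu, sd

# Synthetic log10(D) values for three hypothetical diffusive states (300/200/100 tracks).
rng = random.Random(0)
log_d = ([rng.gauss(-1.5, 0.15) for _ in range(300)]
         + [rng.gauss(-0.5, 0.15) for _ in range(200)]
         + [rng.gauss(0.5, 0.15) for _ in range(100)])
weights, mus, sds = fit_gmm_1d(log_d, means=[-1.4, -0.6, 0.4])
```

      The fitted weights are the state fractions; which component corresponds to which activity state is an interpretive step outside this sketch.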

      The estimated fraction of active ribosomes in wild-type cells shows a very strong reduction with decreasing growth rate (down from 75% to 30%), twice as strong as measured in bulk experiments (Dai et al Nat Microbiology 2016; decrease from 90% to 60% for the same growth rate range) and probably incompatible with measurements of growth rate, ribosome concentrations, and almost constant translation elongation rate in this regime of growth rates. Might the different diffusive fractions of RpsB not represent active/inactive ribosomes? See also the problem of quantification above. The authors should explain and compare their results to previous work. 

      We agree that our measured range is somewhat larger than the estimated range from Dai et al, 2016. However, they use different media, strains, and growth conditions. We also note that Dai et al did not make actual measurements of the active ribosome fraction. Instead, they calculate the “active ribosome equivalent” based on a model that includes growth rate, protein synthesis rate, RNA/protein abundance, and the total number of amino acids in all proteins in the cell. Importantly, our measurements show the same overall trend (a ~30% decrease) as Dai et al, 2016. Furthermore, our results are within the range of previous experimental estimates from ribosome profiling (Forchhammer and Lindahl 1971) or single-ribosome tracking (Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014). We clarified this point in the revised manuscript. 

      To measure the reduction of mRNA transcripts in the cell, the authors rely on the fluorescent dye SYTO RNAselect. They argue that 70% of the dye signal represents mRNA. The argument is based on the previously observed reduction of the total signal by 70% upon treatment with rifampicin, an RNA polymerase inhibitor (Bakshi et al 2014). The idea here is presumably that mRNA should undergo rapid degradation upon rif treatment while rRNA or tRNA are stable. However, work from Hamouche et al. RNA (2021) 27:946 demonstrates that rifampicin treatment also leads to a rapid degradation of rRNA. Furthermore, the timescale of fluorescent-signal decay in the paper by Bakshi et al. (half-life of about 10 min) is not compatible with the previously reported rapid decay of mRNA (2-4 min) but is rather compatible with the slower, though still somewhat rapid, decay of rRNA reported by Hamouche et al. A bulk method to measure total mRNA, as in the cited Balakrishnan et al. (Science 2022), would thus be preferable for quantifying mRNA. Alternatively, the authors could also test whether the mass contribution of total RNA remains constant, which would suggest that rRNA decay does not contribute to signal loss. However, since rRNA dominates total RNA, this measurement requires high accuracy. The authors might thus tone down their conclusions on mRNA concentration changes while still highlighting the compelling data on RNAp diffusion. 

      Thank you for bringing the Hamouche et al 2021 paper to our attention. To address this potential issue, we have performed fluorescence in situ hybridization (FISH) microscopy using a 16S rRNA probe (EUB338) to quantify rRNA concentration in 1N cells. We found that the rRNA signal only slightly decreases with cell size (i.e., genome dilution) compared to the RNASelect signal (e.g., a ~5% decrease for rRNA signal vs. 50% for RNASelect for a cell size range of 4 to 10 µm²). We have revised the text and added a figure to include the new rRNA FISH data (Figure 4). In addition, as a control, we validated our rRNA FISH method by comparing the intracellular concentration of 16S rRNA in poor vs. rich media (new Figure 4 – Figure supplement 3).

      The proteomics experiments are a great addition to the single-cell studies, and the correlation between distance from ori and protein abundance is compelling. However, I missed a different test, which the authors may have already done but not included in the manuscript: If DNA is indeed limiting the initiation of transcription, genes that are already highly transcribed in non-perturbed conditions might saturate fastest upon replication inhibition, while rarely transcribed genes should have no problem accommodating additional RNA polymerases. One might thus want to test whether the (unperturbed) transcription initiation rate is a predictor of changes in protein composition. This is just a suggestion the authors may also ignore, but since it is an easy analysis, I chose to mention it here. 

      We did not find any correlation when we examined the potential relation between RNA slopes and mRNA abundance (from our first CRISPRi oriC time point) or the transcription initiation rate (from Balakrishnan et al., 2022, PMID: 36480614) across genes. These new plots are presented in Figure 7 – Figure supplement 2B. In contrast, we found a small but significant correlation between RNA slopes and mRNA decay rates (from Balakrishnan et al., 2022, PMID: 36480614), specifically for genes with short mRNA lifetimes (new Figure 7F). This effect is consistent with our model prediction (Figure 5 – Figure supplement 2). 

      Related to the proteomics, in l. 380 the authors write that the reduced expression close to the ori might reflect a gene-dosage compensatory mechanism. I don't understand this argument. Can the authors add a sentence to explain their hypothesis? 

      We apologize for the confusion. While performing additional analyses for the revisions, we realized that while the proteins encoded by genes close to oriC tend to display subscaling behavior, this is not true at the mRNA level (new Figure 7 – Figure supplement 3B). In light of this result, we no longer have a hypothesis for the observed negative correlation at the protein level (originally Figure 5D, now Figure 7 – Figure supplement 3A). The text was revised accordingly.  

      In Fig. 1E the authors show evidence that growth rate increases with cell length/area. While this is not a main point of the paper, it might be cited by others in the future. There are two possible artifacts that could influence this experiment: a) segmentation: an overestimation of the physical length of the cell based on phase-contrast images (e.g., 200 nm would cause a 10% error in the relative rate of 2 µm cells, but not of longer cells). b) time-dependent changes of growth rate, e.g., due to change from liquid to solid or other perturbations. To test for the latter, one could measure growth rate as a function of time, restricting the analysis to short or long cells, or measure growth rate for short/long cells at selected time points. For the former, I recommend comparison of phase-contrast segmentation with FM4-64-stained cell boundaries.

      As the reviewer notes, the small increase in relative growth rate was just a minor observation that does not affect our story, whether it is biologically meaningful or the result of a technical artefact. But we agree with the reviewer that others might cite it in future works and that it should thus be interpreted with caution.

      An artefact associated with time-dependent changes (e.g. changing from liquid cultures to more solid agarose pads) is unlikely for two reasons. 1. We show that varying the time that cells spend on agarose pads relative to liquid cultures does not affect the cell size-dependent growth rate results (Figure 1 – supplement 5A). 2. We show that the growth rate is stable from the beginning of the time-lapse with no transient effects upon cell placement on agarose pads for imaging (Figure 1 – supplement 1). These results were described in the Methods section where they could easily be missed. We revised the text to discuss these controls more prominently in the Results section.

      As for cell segmentation, we have run simulations and agree with the reviewer that a small overestimation of cell area (which is possible with any cell segmentation method, including ours) could lead to a small increase in relative growth with increasing cell area (new Figure 1 – Figure supplement 3). Since the finding is not important to our story, we simply revised the text and added the simulation results to alert readers to the possibility that the observation may be due to a small cell segmentation bias.
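      The reviewer's 200 nm example can be checked with a short calculation. This is a minimal sketch and not the authors' simulation. It assumes exponential growth of the true cell length and a constant additive overestimate of the segmented length. Under these assumptions, the apparent relative rate is the true rate scaled by L/(L + offset). Short cells therefore appear to grow more slowly than long cells, even when all cells grow at the same relative rate. The function name is hypothetical.

```python
def apparent_relative_rate(true_rate, true_length, offset):
    """Relative growth rate inferred when the segmented length is
    overestimated by a constant offset (same units as true_length).

    For exponential growth dL/dt = true_rate * L, a constant offset does not
    change dL/dt, so the measured relative rate is
    (dL/dt) / (L + offset) = true_rate * L / (L + offset).
    """
    return true_rate * true_length / (true_length + offset)


# Ratio of apparent to true relative rate for a 0.2 um (200 nm) overestimate.
for length_um in (2.0, 4.0, 10.0):
    ratio = apparent_relative_rate(1.0, length_um, 0.2)
    print(f"{length_um:>4} um: apparent/true = {ratio:.3f}")
```

      Under these assumptions, a 200 nm offset lowers the apparent rate of 2 µm cells by about 9%, which is close to the reviewer's 10% estimate. For 10 µm cells, the reduction is only about 2%.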

      Reviewer #2 (Public Review): 

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels. 

      Thank you for the support and accurate summary!

      Reviewer #3 (Public Review): 

      Summary: 

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth.

      Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor. 

      Strengths: 

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be. 

      Thank you!

      Weaknesses: 

      One limitation of the study is that genome concentration is largely treated as a single commodity. While this facilitates their modeling approach, one would expect that the growth phenotypes observed arise due to copy number limitation in a relatively small number of rate-limiting genes. The authors do report shifts in the composition of both the proteome and the transcriptome in response to replication inhibition, but while they report a positional effect of distance from the replication origin (reflecting loss of high-copy, origin-proximal genes), other factors shaping compositional shifts and their functional effects on growth are not extensively explored. This is particularly true for ribosomal RNA itself, which the authors assume to grow proportionately with protein. More generally, understanding which genes exert the greatest copy number-dependent influence on growth may aid both efforts to enhance (biotechnology) and inhibit (infection) bacterial growth. 

      We agree but feel that identifying the specific limiting genes is beyond the scope of the study. That said, we carried out additional experiments and analyses to address the reviewer’s comment and identify potential contributing factors and limiting gene candidates. First, we examined the intracellular concentration of 16S ribosomal RNA (rRNA) by rRNA FISH microscopy and found that it decays much more slowly than the bulk of mRNAs as measured using RNASelect staining (new Figure 4 and Figure 4 – Figure supplements 1 and 3). We found that the rRNA signal is far more stable in 1N cells than the RNASelect signal, the former decreasing by only ~5% versus ~50% for the latter in response to the same range of genome dilution (Figure 4C). Second, we carried out new correlation analyses between our proteomic/transcriptomic datasets and published genome-wide datasets that report various variables under unperturbed conditions (e.g., mRNA abundance, mRNA degradation rates, fitness cost, transcription initiation rates, essentiality for viability); see new Figure 7E-G and Figure 7 – Figure supplement 2. In the process, we found that genes essential for viability tend, on average, to display superscaling behavior (Figure 7G). This suggests that cells have evolved mechanisms that prioritize expression of essential genes over nonessential ones during DNA-limited growth. Furthermore, this analysis identified a small number of essential genes that display strong negative RNA slopes (Figure 7C, Datasets 1 and 2), indicating that the concentration of their mRNA decreases rapidly relative to the rest of the transcriptome upon genome dilution. These essential genes with strong subscaling behavior are candidates for being growth-limiting. 

      The text and figures were revised to include these new results.

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth. 

      Thank you!

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Below are my comments. 

      (1) I noticed that a paper by Li et al. on biorxiv has found similar results as this work ("Scaling between DNA and cell size governs bacterial growth homeostasis and resource allocation," https://doi.org/10.1101/2021.11.12.468234), including the linear growth of E. coli when the DNA concentration is low. This relevant reference was not cited or discussed in the current manuscript. 

      We agree that authors should cite and discuss relevant peer-reviewed literature. But broadly speaking, we feel that extending this responsibility to all preprints (and by extension any online material) that have not been reviewed is a bit dangerous. It would effectively legitimize unreviewed claims and risk their propagation in future publications. We think that while imperfect, the peer-reviewing process still plays an important role. 

      Regarding the specific 2021 preprint that the reviewer pointed out, we think that the presented growth rate data are quite noisy and that the experiments lack a critical control (multi-N cells), making interpretation difficult. Their report that plasmid-borne expression is enhanced when DNA is severely diluted is certainly interesting and makes sense in light of our measurements that the activities, but not the concentrations, of RNA polymerases and ribosomes are reduced in 1N cells. However, we do not know why this preprint has remained unpublished since 2021. There could be many possible reasons for this. Therefore, we feel that it is safer to limit our discussion to peer-reviewed literature.

      (2) I think the kinetic Model B in the Appendix has been studied in previous works, such as Klumpp & Hwa, PNAS 2008, https://doi.org/10.1073/pnas.0804953105

      Indeed, Klumpp & Hwa 2008 modeled the kinetics of RNA polymerase and promoter association prior to our study. But there is a difference between their model and ours. Their model is based on Michaelis-Menten-type (MM) functions in which the RNAP is analogous to the “substrate” and the promoter to the “enzyme” in the MM equation. In contrast, our model uses functions based on the law of mass action (instead of MM-type functions). We have revised the text, included the Klumpp & Hwa 2008 reference, and revised the Materials & Methods section to clarify these points. 
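      The distinction can be illustrated with generic expressions. These are schematic textbook forms written with hypothetical symbols, with R for RNA polymerase and P for promoters. They are not the equations of Model B or of Klumpp & Hwa 2008. In the mass-action form, the free concentrations would follow from conservation of total RNAP and total promoters.

```latex
% Michaelis-Menten-type form: RNAP (R) acts as the "substrate" and the
% promoter (P) as the "enzyme"; the rate saturates at high [R].
v_{\mathrm{MM}} = \frac{k_{\mathrm{cat}}\,[P]_{\mathrm{tot}}\,[R]}{K_M + [R]}

% Law-of-mass-action form: the association rate is proportional to the
% product of the free RNAP and free promoter concentrations.
v_{\mathrm{MA}} = k_{\mathrm{on}}\,[R]_{\mathrm{free}}\,[P]_{\mathrm{free}}
```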

      (3) On lines 284-285, if I understand correctly, the fractions of active RNAPs and active ribosomes are relative to the total protein number. It would be helpful if the authors could mention this explicitly to avoid confusion. 

      The fractions of active RNAPs and active ribosomes are expressed as the percentage of the total RNAPs and ribosomes. We have revised the text to be more explicit. Thank you.

      (4) On line 835, I am not sure what the bulk transcription/translation rate means. I guess it is the maximum transcription/translation rate if all RNAPs/ribosomes are working according to Eq. (1,2). It would be helpful if the authors could explain the meaning of r_1 and r_2 more explicitly. 

      We apologize for the lack of clarity. We have added the following equations:

      (5) Regarding the changes in protein concentrations due to genome dilution, a recent theoretical paper showed that it may come from the heterogeneity in promoter strengths (Wang & Lin, Nature Communications 2021). 

      In the Wang and Lin model, the heterogeneity in promoter strength predicts that the “mRNA production rate equivalent”, which is the mRNA abundance multiplied by the mRNA decay rate, will correlate with the RNA slopes. However, we found these two variables to be uncorrelated (see Author response image 1 below; Spearman correlation coefficient ρ = 0.02, p-value = 0.24, i.e., not significant (NS)).

      Author response image 1.

      The mRNA production rate equivalent (mRNA abundance at the first time point after CRISPRi oriC induction multiplied by the mRNA degradation rate measured by Balakrishnan et al., 2022, PMID: 36480614, expressed in transcript counts per minute) does not correlate (Spearman correlation p-value = 0.24) with the RNA slope in 1N-rich cells. Data from 2570 genes are shown (grey markers; Gaussian kernel density estimate, KDE), together with their binned statistics (mean +/- SEM, ~280 genes per bin, orange markers). 

      In addition, we found no significant correlation between RNA slopes and mRNA abundance or transcription initiation rate. These plots are now included in Figure 7E and Figure 7 – Figure supplement 2B. Thus, the promoter strength does not appear to be a predictor of the RNA (and protein) scaling behavior under DNA limitation. 
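      The correlation test described above can be sketched in pure Python. The "mRNA production rate equivalent" is computed per gene as abundance times decay rate. Spearman's ρ is computed as the Pearson correlation of the two rank vectors, with tied values assigned their average rank. The function names are hypothetical. The sketch does not compute the p-values that the authors report; those would normally come from a statistics library such as SciPy's `scipy.stats.spearmanr`.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def production_rate_equivalent(abundance, decay_rate):
    """Per-gene mRNA abundance multiplied by its decay rate."""
    return [a * d for a, d in zip(abundance, decay_rate)]
```

      Because Spearman's ρ depends only on ranks, it is insensitive to the units chosen for the production rate equivalent and to monotonic transformations such as taking logarithms.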

      Reviewer #3 (Recommendations For The Authors): 

      One general area that could be developed further is analysis of changes in the proteome/transcriptome composition, given that there may be specific clues here as to the phenotypic effects of genome concentration limitation. Specifically: 

      • In Figure 5D, the authors demonstrate an effect of origin distance on sensitivity to replication inhibition, presumably as a copy number effect. However, the authors note that the effect was only slight and postulated a compensatory mechanism. Due to the stability of proteins, one should expect relatively small effects - even if synthesis of a protein stopped completely, its concentration would only decrease twofold with a doubling of cell area (slope = -1, if I'm interpreting things correctly). It would be helpful to display the same information shown in Figure 5D at the mRNA level, since I would anticipate that higher mRNA turnover rates mean that effects on transcription rate should be felt more rapidly. 

      We thank the reviewer for this suggestion. To our surprise, we found that there is no correlation between gene location relative to the origin and RNA slope across genes. This suggests that the observed correlation between gene location and protein slopes does not occur at the mRNA level. Given that we do not have an explanation for the underlying mechanism, we decided to present these data (the original data in Figure 5D and the new data for the RNA slope) in a supplementary figure (Figure 7 – Figure supplement 3).
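      The reviewer's bound for stable proteins can be made concrete. As the reviewer does, this sketch assumes that slopes are measured on log-log axes of concentration versus cell area. A stable protein whose synthesis stops entirely keeps a constant copy number, so its concentration is simply diluted and the slope is exactly -1. The cell areas and copy number below are hypothetical, as is the helper `log_log_slope`.

```python
import math


def log_log_slope(xs, ys):
    """Ordinary least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# A stable protein whose synthesis has stopped: its copy number stays
# fixed, so its concentration falls in inverse proportion to cell area.
areas = [4.0, 5.0, 6.0, 8.0, 10.0]  # um^2, hypothetical
copies = 1000.0
concentrations = [copies / a for a in areas]
print(round(log_log_slope(areas, concentrations), 6))  # -1.0
```

      Under pure dilution, any stable protein that is still being synthesized should therefore have a slope above -1. This is why the reviewer expects only small effects at the protein level.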

      • Related to this, did the authors see any other general trends? For example, do highly expressed genes hit saturation faster, making them more sensitive to limited genome concentration? 

      We found that the RNA slopes do not correlate with mRNA abundance or transcription initiation rates. However, they do correlate with mRNA decay. That is, short-lived mRNAs tend to have negative RNA slopes. The new analyses have been added as Figure 7E-F and Figure 7 – Figure supplement 2B. The text has been revised to incorporate this information. 

      • Presumably loss of growth is primarily driven by a subset of genes whose copy number becomes limiting. Previously, it has been reported that there is a wide variety among "essential" genes in their expression-fitness relationship - i.e. how much of a reduction in expression you need before growth is reduced (e.g. PMID 33080209). It would be interesting to explore the shifts in proteome/transcriptome composition to see whether any genes particularly affected by restricted genome concentration are also especially sensitive to reduced expression - overlap in these datasets may reveal which genes drive the loss of growth. 

      This is a very interesting idea – thank you! We did not find a correlation between the protein/RNA slope and the relative gene fitness as previously calculated (PMID 33080209), as shown below.

      Author response image 2.

      The relative fitness of each gene (data by Hawkins et al., 2020, PMID: 33080209, median fitness from the highest sgRNA activity bin) plotted versus the gene-specific RNA and protein slopes that we measured in 1N-rich cells after CRISPRi oriC induction. More than 260 essential genes are shown (262 RNA slopes and 270 protein slopes, grey markers), together with their binned statistics (mean +/- SEM, 43-45 essential genes per bin, orange markers). Spearman correlations (ρ) with p-values above 10⁻³ are considered not significant (NS). In our analyses, we only considered correlations significant if they had a Spearman correlation p-value below 10⁻¹⁰.

      However, while doing this suggested analysis, we noticed that the essential genes that were included in the aforementioned study have RNA slopes above zero on average. This led us to compare the RNA slope distributions of essential genes relative to all genes (now included in Figure 7G). We found that they tend to display superscaling behavior (positive RNA slopes), suggesting the existence of regulatory mechanisms that prioritize the expression of essential genes over less important ones when genome concentration becomes limiting for growth. The text has been revised to include this new information.
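      The binned statistics in the figure captions above (mean +/- SEM over a fixed number of genes per bin) can be sketched as follows, together with the stated significance cutoff. The sketch assumes equal-count bins along the x-axis variable, which the captions suggest but do not state explicitly. `binned_mean_sem` and `is_significant` are hypothetical names.

```python
import math


def binned_mean_sem(x, y, genes_per_bin):
    """Sort genes by x, split them into consecutive bins of `genes_per_bin`
    genes, and return (mean x, mean y, SEM of y) for each bin.
    A final bin with fewer than two genes is skipped (no SEM defined)."""
    pairs = sorted(zip(x, y))
    out = []
    for start in range(0, len(pairs), genes_per_bin):
        chunk = pairs[start:start + genes_per_bin]
        if len(chunk) < 2:
            continue
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        n = len(ys)
        mean_y = sum(ys) / n
        # Sample standard deviation (n - 1), then SEM = sd / sqrt(n).
        sd_y = math.sqrt(sum((v - mean_y) ** 2 for v in ys) / (n - 1))
        out.append((sum(xs) / n, mean_y, sd_y / math.sqrt(n)))
    return out


def is_significant(p_value, threshold=1e-10):
    """Significance cutoff as stated in the response (p below 10^-10)."""
    return p_value < threshold
```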

      Other suggestions: 

      • In Figure 3 the authors report that total RNAP concentration increases with increasing cytoplasmic volume. This is in itself an interesting finding as it may imply a compensatory mechanism - can the authors offer an explanation for this? 

      We do not have a straightforward explanation. But we agree that it is very interesting and should be investigated in future studies given that this superscaling behavior is common among essential genes. 

      • The explanation of the modeling within the main text could be improved. Specifically, equations 1 and 2, as well as a discussion of models A and B (lines 290-301), do not explicitly relate DNA concentration to downstream effects. The authors provide the key information in Appendix 1, but for a general reader, it would be helpful to provide some intuition within the main text about how genome concentration influences transcription rate (i.e. via 𝛼RNAP).  

      We apologize for the lack of clarity. We have added information that hopefully improves clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with EC50s in the 10-20 micromolar range, oleic acid produces a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3), but it does not act on mAC5. Oleic acid also enhanced the Gαs-stimulated activities of isoforms 2, 7, and 9 and slightly attenuated that of mAC1, whereas isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed with AC catalytic domain constructs lacking the TM domains. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Arachidonic acid in the 20-40 micromolar range significantly attenuated Gαs-stimulated mAC1 and mAC4 cyclase activity, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While mAC5 activity was not affected by unsaturated FAs, neutral anandamide attenuated Gαs-stimulation of mAC5 and mAC6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors showed that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, the authors showed that in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold, with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands, which act via AC membrane linkers to modify and control Gs-stimulated AC activity, either enhancing or inhibiting ACs in a family- and ligand-specific manner. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way, from the hypothesis that ACs could be receptors, through a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, resulting in a paper with a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. Is it because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCRs, which also have transmembrane domains. Maybe 1-2 additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, they have used only one dose of isoproterenol (2.5 µM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2 adrenergic receptors expressed in HEK cells. As far as I remember from the old pharmacological literature, this concentration is indeed submaximal in receptor binding assays, but for AC activity and cAMP generation (which happen after signal amplification with a so-called receptor reserve), lower Iso amounts would be submaximal. When we measure cAMP, these are rather 10 to 100 nM, and no more than 1 µM, the concentration at which concentration-response curves usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.
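      A concentration-response curve of the kind proposed would usually be summarized by a Hill-type relationship. The sketch below assumes a Hill coefficient of 1 and a purely hypothetical EC50 of 10 nM. These values are chosen only to illustrate the reviewer's point: if the true EC50 lies in the low nanomolar range, 2.5 µM isoproterenol would be close to saturating rather than submaximal. The function name is hypothetical.

```python
def fractional_response(agonist, ec50, hill=1.0):
    """Fraction of the maximal response for a Hill-type
    concentration-response curve (agonist and ec50 in the same units)."""
    return agonist ** hill / (agonist ** hill + ec50 ** hill)


# Hypothetical EC50 of 10 nM for isoproterenol-stimulated cAMP formation.
ec50_nM = 10.0
for iso_nM in (10.0, 100.0, 2500.0):
    print(f"{iso_nM:>7.1f} nM: {fractional_response(iso_nM, ec50_nM):.3f}")
```

      With these assumed values, the fractional response is 0.5 at 10 nM, about 0.91 at 100 nM, and about 0.996 at 2.5 µM. The EC50 measured in the permanently transfected cells, rather than the value assumed here, would decide whether 2.5 µM is submaximal.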

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, their results do nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It is not clear from the methods section how free FAs were applied to membrane preparations or HEK293 cells. Were FAs solubilized in organic solvents, or introduced as micelles?

      The requested information has been added to the M&M section.

      Could the authors comment on what is known about the concentration of oleic acid and other non-saturated fatty acids in plasma membranes relative to those required to produce allosteric effects on cyclase activity?

      This info is now included in the last paragraph of the discussion.

      It would be worthwhile to test the effect of FAs on basal (not Gαs-stimulated) activity of mACs.

      This has been carried out with mAC isoforms 2, 3, 7, and 9, in which oleic acid enhances Gsα-stimulated activity. Due to the low levels of basal activity, interpretable data were not obtained.

      Do triglycerides esterified with oleic acid stimulate mAC3 and other sensitive isoforms?

      Experiments were done with triolein and 2-oleoyl-glycerol (the answer is no). The data are presented in Fig. 3 and in appendix Figs. 8, 9, and 14; the structural formulas in appendix 2 Fig. 4 were updated.

      Does the quantity plotted on the vertical axis of Figure 1, right panel represent "Fractional Stimulation by Oleic acid" rather than simply "Fold Stimulation"? Clearly, as shown in the two left-most panels, Gαs stimulates both mAC and mAC5. Rather it seems that the ratio (oleic acid stimulation) / (Gαs stimulation) remains constant. This observation supports the statement in the discussion that "We suppose that in mAC3 the equilibrium of two differing ground states favors a Gαs-unresponsive state and the effector oleic acid concentration-dependently shifts this equilibrium to a Gαs-responsive state". It could also be said that the effect of oleic acid is additive, and in constant proportion to that of Gαs.

      This comment evidently relates to Fig. 2:

      The ratio would be (Gsα + oleic acid stimulation) / (Gsα-stimulation), i.e., fractional stimulation by addition of oleic acid is identical to fold stimulation.

      We have amended the legend to Fig. 2C for clarification.

      The last sentence is incorrect because oleic acid alone does not stimulate mAC activity.

      It is stated on page 3, 2nd to last line that "The action of oleic acid on mAC3 was instantaneous...". Since the earliest time point is taken at 5 minutes, the claim that the action of the lipid is instantaneous cannot be made. Information about kinetics would be useful to have, since it is possible that the lipid must be released from a micelle and be incorporated into the AC membrane fraction before it is active.

      The first time point is at 3 min.

      We deleted the word “instantaneous” and, for clarification, added the correlation coefficients for both conditions to the legend of appendix 2, Fig. 1.

      The data spread in Figure 4, and in other figures showing similar data, is considerable, to the extent that the computed EC50 values may not be of high precision. The authors should cite the correlation coefficient for the overall fit and the uncertainty of the EC50 value (in addition to the t-test significance of individual data points).

      This will not add valuable information. Pearson's correlation coefficients apply only to linear relationships.

      (cf. N.N. Kachouie, W. Deebani (2020) Association Factor for Identifying Linear and Nonlinear Correlations in Noisy Conditions. Entropy 22:440)

      The "switch" between relatively low potency and high efficacy in membrane preps to high potency and low efficacy in cells is remarkable. Could this have a methodological basis or is it reflective of the mechanism by which FAs access mACs in membrane preps vs. cell membranes, or perhaps some biochemical transformation of the lipid in cells?

      Honestly, we do not know.

      The authors should note that there is some precedent for this work:

      J Nakamura, N Okamura, S Usuki, S Bannai. Inhibition of adenylyl cyclase activity in brain membrane fractions by arachidonic acid and related unsaturated fatty acids. Arch Biochem Biophys. 2001 May 1;389(1):68-76. doi: 10.1006/abbi.2001.2315.

      The effects of FA deficiencies on AC and related activities have been noted:

      Alam SQ, Mannino SJ, Alam BS, McDonough K. Effect of essential fatty acid deficiency on forskolin binding sites, adenylate cyclase, and cyclic AMP-dependent protein kinase activity, the levels of G proteins and ventricular function in rat heart. J Mol Cell Cardiol. 1995 Aug;27(8):1593-604. doi: 10.1016/s0022-2828(95)90491-3. PMID: 8523422

      The latter publications are supportive of, and provide context to, the authors' findings.

      Both references are mentioned and cited.

      Minor points:

      The significance of the coloring scheme in Figure 5C bar graph should be stated in the legend.

      Done.

      In the introduction, it is stated that "The protein displayed two similar catalytic domains (C1 and C2) and two dissimilar hexahelical membrane anchors (TM1 and TM2)". In both cases, the respective domains can be said to be similar in overall fold, but - certainly in the case of the catalytic domains - different in amino acid sequence in functionally important regions of the domain.

      Done: Changed wording.

      In the statement in the introduction that "The domain architecture, TM1-C1-TM2-C2, clearly indicated a pseudoheterodimeric protein composed of two concatenated bacterial precursor proteins", the authors refer to the fact that mammalian enzymes are pseudoheterodimers, whereas bacterial type III cyclases are dimers of identical subunits.

      Done.

      Reviewer #2 (Recommendations for the authors):

      The title need not state that a 'new class of receptors' has been identified. There is no direct evidence that the lipids bind to the enzymes, and the affinities can only be surmised from the EC50 graphs. Calling a protein a receptor requires evidence that binding is specific, i.e., that binding can be inhibited by a large excess of 'unlabelled' ligand. This could have been verified experimentally using labelled lipids.

      As is well known, lipids easily bind to proteins. In this study no purified proteins were used. Therefore, binding assays most likely would result in unreliable data.

      The paper would have benefitted from showing sequence alignments in the TM domains of the ACs discussed in the paper. Further, a phylogenetic tree of mammalian ACs would also reveal which enzymes from other species may be regulated similarly to those described in the paper. This would be important for researchers who use other model organisms to study cAMP signalling.

      Such data are in multiple papers accessible in the literature. Where deemed appropriate we inserted references.

      Figures 1A and 1B show data from only two experiments. A third experiment would have been useful in order to show the statistical significance of the data.

      At this stage more experiments would not have affected further experimental plans.

      Statements in the text (for example, in the last paragraph on page 6) give only the mean value and not the SD. This would have been important to include, even if the data are shown in the appendix. The same is true for the legend of Figure 2. Why have the authors decided to use SEM rather than SD?

      The reason is specified in M&M.

      Concentrations of lipids used in biochemical assays are in the micromolar range. This suggests that we have moderate affinity binding, more in the range of an enzyme for a substrate rather than a receptor-ligand interaction.

      We happen to disagree. Clearly, the differential activities, enhancing or attenuating Gsα-stimulated mAC activity, are most plausibly explained by mAC receptor properties. mACs have enzyme activities using fatty acids as substrates.

      The authors add lipids to cells and show changes in cAMP levels in their presence and absence. They also discuss how these extracellular lipids could be produced. Do you think this is necessary in vivo, though? Could the lipids naturally present in membranes act as regulators? Do specific lipid concentrations differ between cell types, suggesting tissue-specific regulation of these mammalian ACs?

      These are things that could be discussed in the manuscript.

      The last paragraph of the discussion deals with these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The extra macrochaetae (emc) gene encodes the only Inhibitor of DNA binding protein (Id protein) in Drosophila. Its best-known function is to inhibit proneural genes during development. However, emc mutants also display non-proneural phenotypes. In this manuscript, the authors examined four non-proneural phenotypes of emc mutants and reported that they are all caused by inappropriate non-apoptotic caspase activity. These phenotypes are: reduced growth of imaginal discs, increased speed of the morphogenetic furrow, and failure to specify R7 photoreceptor neurons and cone cells during eye development. Double mutants between emc and either H99 (which deletes the three pro-apoptotic genes reaper, grim, and hid) or the initiator caspase dronc suppress these emc phenotypes, suggesting that the cell death pathway and caspase activity mediate them. In previous work, the authors have shown that emc mutations elevate the expression of ex, which activates the SHW pathway (aka the Hippo pathway). One known function of the SHW pathway is to inhibit Yorkie, which controls the transcription of the inhibitor of apoptosis, Diap1. Consistent with this, Diap1 protein levels are reduced in emc clones, which might explain why caspase activity is increased in emc clones, giving rise to the four non-proneural phenotypes of emc mutants.

      However, this increased caspase activity does not cause ectopic apoptosis; hence the authors propose that it is non-apoptotic caspase activity. In the last part of the manuscript, the authors ruled out Wg, Dpp, and Hh signaling as targets of caspases, and instead identified Notch signaling, specifically the Notch ligand Delta, as the target. Delta protein levels are increased in emc clones in an H99- and dronc-dependent manner. The authors conclude that caspase-dependent non-apoptotic signaling underlies multiple roles of emc that are independent of proneural bHLH proteins.

      Strengths:

      Overall, this is an interesting manuscript and the findings are intriguing. It adds to the growing number of non-apoptotic functions of apoptotic proteins and caspases in particular. The manuscript is well written and the data are usually convincingly presented.

      Weaknesses:

      (1) One major concern I have is the observation by the authors in Figure 3C in which protein levels of Diap1 are still reduced in emc H99 double mutant clones. If Diap1 is still reduced in these clones, shouldn't caspases still be derepressed? Given that emc H99 double mutants rescue all emc phenotypes examined, the observation that Diap1 levels are still reduced in emc H99 clones is inconsistent with the authors' model. The authors need to address this inconsistency.

      The effect of H99 emc clones on Diap1 protein levels is consistent with our conclusions.  The reviewer’s concern probably relates to previous work that shows that RHG proteins act by antagonizing DIAP1, so that Diap1 is epistatic to RHG (PMID:10481910), and that RHG proteins affect DIAP1 protein levels, and in particular that HID promotes DIAP1 ubiquitylation leading to its destruction (PMID:12021767).  First, epistasis means that in the absence of DIAP1, RHG levels do not affect cell survival.  DIAP1 protein is not absent in emc/emc eye clones, however, it is reduced.  It is not only possible but expected that RHG levels would affect survival when DIAP1 levels are only reduced.  Secondly, we did not see a difference in DIAP1 levels between H99/H99 clones and H99/+ cells within the same specimen, suggesting that rpr, grim and hid might not affect DIAP1 levels. It is possible that Hid protein only affects DIAP1 levels when overexpressed, as in the aforementioned paper (PMID:12021767), and that physiological RHG levels affect DIAP1 activity.  The H99 deficiency also eliminates Rpr and Grim, which may affect DIAP1 without ubiquitylating it. In our experiments, however, there are no cells completely wild type for the H99 region for comparison in the same specimen, so our results do not rule out the H99 deletion having a dominant effect on DIAP1 levels both inside and outside the clones.  What our data clearly showed is that emc affected DIAP1 levels independently of any potential RHG effect, and we hypothesized this was through diap1 transcription, because we showed previously that emc affects yki, a transcriptional regulator of the diap1 gene, but we have not demonstrated transcriptional regulation of diap1 directly in emc clones.  We modified the manuscript to better delineate these issues (lines 275-284).    

      (2) Are Diap1 protein levels reduced in all emc clones, including clones anterior to the furrow? This is difficult to see in Figure 3B. It is also recommended to look in emc mosaic wing discs.

      We now mention that DIAP1 levels were reduced only in emc clones posterior to the morphogenetic furrow, not in those anterior to the morphogenetic furrow or in emc clones in wing imaginal discs (lines 284-5; Figure 3 supplement 1).

      (3) The authors speculate that Delta may be a direct target of caspase cleavage (Figure 9B), but then rule it out for a good reason. However, I assume that the increased protein levels of Delta in emc clones (Figure 7) are the results of increased transcription. In that case, shouldn't caspases control the transcriptional machinery leading to Delta expression?

      Thank you for suggesting that caspases might control the transcription of Dl. We added this possibility to the manuscript (lines 499-500). At one time there was a Dl-LacZ transcriptional reporter, which would have made it straightforward to assess Dl transcription in emc clones, but this strain no longer seems to be available. We have not attempted in situ hybridization to Dl transcripts in mosaic discs.

      (4) How does caspase activity in emc clones cause reduced growth? Is this also mediated through Delta signaling?

      We do not know which caspase target is responsible for reduced growth in wing discs.

      (5) Figure 1M: Is there a similar result with emc dronc mosaics?

      The emc dronc clones do not show as dramatic a growth advantage in a Minute background.  This is consistent with the smaller effect of emc dronc in the non-Minute background also (Figure 1N).  We mention this in the revised paper (lines 232-3).     

      Reviewer #2 (Public Review):

      Id proteins are thought to function by binding and antagonizing basic helix-loop-helix (bHLH) transcription factors, but new findings demonstrate roles for emc, including in tissues where no proneural (Drosophila bHLH) genes are known to function. The authors propose a new mechanism for developmental regulation that entails restraining novel non-apoptotic functions of apoptotic caspases.

      Specifically, the data suggest that loss of emc leads to reduced expression of diap1 and increased apoptotic caspase activity, which does not induce apoptosis but elevates Delta expression to increase N activity and cause developmental defects. Indeed, many of the phenotypes of emc mutant clones can be rescued by a chromosomal deficiency that reduces caspase activation or by mutations in the initiator caspase Dronc. A related manuscript that shows that loss of emc results in increased da, linked previously to diap1 expression, provides supporting data. There is increasing appreciation that apoptotic caspases have non-apoptotic roles. This study adds to the emerging field and should be of interest to readers.

      The data, for the most part, support the conclusions but I do have concerns about some of the data and the interpretations that should be addressed.

      Reviewer #3 (Public Review):

      The work extends earlier studies on the Drosophila Id protein EMC to uncover a potential pathway that explains several tissue-scale developmental abnormalities in emc mutants. It also describes a non-apoptotic role for caspases in cell biology.

      Strengths:

      The work adds to an emerging new set of functions for caspases beyond their canonical roles as cell death mediators. This novelty is a major strength as well as its reliance on genetic-based in vivo study. The study will be of interest to those who are curious about caspases in general.

      Weaknesses:

      The manuscript relies on imaging experiments using genetic mosaic imaginal discs. It is for the most part a qualitative analysis, showing representative samples with a small number of mutant clones in each. Although the senior author has a long track record of using experiments like this to rigorously discover regulatory mechanisms in this system, it is straightforward in 2023 to use Fiji and other image analysis tools to measure fluorescence. Such measurements could be done for all replicate clones of a given genotype as well as for genetic control samples. These could be presented in plots that would not only provide quantitative and statistical measurements, but would also be more reader-friendly to those who are not fly people.

      We added quantification of anti-Delta and anti-Diap1 levels to the manuscript (Figures 3E and 7E).  We agree that this facilitates statistical confirmation of the results and may be more accessible to non-experts.  We do have concerns that these quantifications might be given too much weight.  For example, we cannot measure the background level of anti-DIAP1 labeling by labeling diap1 null mutant cells, because such cells do not survive.  Although we measure ~20% reduction in emc clones in the eye disc, and none in the wing disc, both measures could be underestimates if some of the labeling is non-specific, as is very possible.  We discuss this in the Methods (lines 166-9).

      Likewise, more details are needed to describe how clone areas were measured in Figure 1. Did they measure each clone and its twin spot, and then calculate the area ratio for each clone and its paired twin spot? This would be the correct way to analyze the data, yielding many independent measurements of the ratio. Doing so would also obviate the need to log-transform the data, which is inexplicable unless they were averaging clones and twins within a disc and making replicates. More explanation is needed, and if they indeed averaged, then they need to calculate the ratios pairwise for each clone and twin.

      We added details of clone size measurements and analysis to the Methods (lines 141-6). Although it might be useful to compare individual clones and corresponding twin spots, the only rigorous way to associate individual clones with individual twin spots, or even to determine what is one clone and what is one twin spot, is to use recombination rates low enough that significantly less than one recombination occurs per disc. This would require many more dissections, and we did not do this. We now clarify in the manuscript that the analysis is indeed based on the ratio of the total areas of clones and twin spots, with replicates, and that log-transformation was used to improve the normality of the ratio data for parametric significance testing, not because clones and twin spots were summed from each sample. We consulted a statistician about this approach.

      Reviewer #1 (Recommendations For The Authors):

      Lines 319/320: "Frizzled-3 RFP expression was not changed in in emc clones (Figure 4A)". This was actually not shown in Fig 4A (in fact this result was not shown at all). Fig 4A shows the result for emc nkd3 which the authors incorrectly assigned to Figure 4B (line 324).

      We apologize for labeling Figure 4A and 4B incorrectly.

      The title of Figure 6 is inaccurate. The title does not indicate what is shown in this figure. A more accurate title would be: Notch activity and function in emc mutant clones.

      We provided a new title for Figure 6. 

      Reviewer #2 (Recommendations For The Authors):

      There is no information on how reproducible the data is. How many discs were examined in each experiment and in how many technical or biological replicates? Can fluorescence signals be quantified within and outside the clones and presented to illustrate reproducibility and significance? This is especially needed for Fig 7, which shows key data that N ligand Delta is elevated in emc clones but dronc and H99 mutations rescue this phenotype. I can see that the Dl signal is brighter in the GFP- emc clone in Fig 7B but I can also see a brighter Dl signal in the small clone and perhaps also in the large clone in C. The difference between B and C could be simply disc-to-disc variation, which should be addressed with quantification and presentation of all data points.

      We added the number of samples to each figure legend.  We quantified the fluorescence signals for Figures 3 and 7.  Quantification shows that the difference between 7B and 7C is highly significant, not disc to disc variation.

      Fig 2B does not support the conclusion. It is supposed to show premature Sens expression and therefore abnormal morphogenetic furrow progression in emc clones. But the yellow arrow is pointing to GFP+ (wild type) cells and it is within this GFP+ region that most premature Sens expression is seen.

      We relocated the arrows in Figure 2B to point precisely to the premature differentiation. When the morphogenetic furrow is accelerated in emc mutant (GFP-) tissue, it does not stop when it encounters wild type (GFP+) tissue again; it continues at a normal pace. Accordingly, emc+ regions that are anterior to emc- regions can also experience accelerated differentiation (please see lines 594-8).

      Fig 1 shows that while H99 deficiency restores the growth of emc clones to wild type level (Fig 1N), placing these in the Minute background made emc clones grow better than emc wild type but Minute neighbors (Fig 1M). The latter cells were nearly absent, suggesting elimination through cell competition. For the rest of the figures, some experiments are done in the Minute background (e.g., emc H99 clones in Fig 2D) while others are not in the Minute background (e.g., emc H99 clones in Fig 7D). Why the switch between backgrounds from experiment to experiment?

      Figure 2D shows emc H99 clones in a Minute background so that it can be compared with panels 2A-C, which show clones of other genotypes in a Minute background. These clones almost take over the eye disc. In Figure 7D, it was important to show the Dl expression pattern in a substantial wild type region, which could only be shown using the non-Minute background. We have no indication that a Minute background changes the properties of the non-Minute clone, other than allowing its greater growth.

      The first 3 paragraphs of the Introduction are overly detailed and read more like a review article. These could be made more concise to focus on the founding data for this manuscript, which are the published findings that emc mutations elevate ex expression (line 129) and that ex mutants show elevated diap1 expression (line 125). These do not show up until the very end of the Introduction.

      We shortened the Introduction to focus more rapidly on the topics relevant to these experiments.

      In several places, the space between the end of the sentence and the citation is missing (e.g., lines 57, 68, and 75).

      The spacing of citations was fixed.

      Line 247. 'morphogenetic furrow that found each ommatidia...' should use a word besides 'found.'

      We corrected line 247.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors show that inhibiting caspases rescues the growth defect of emc clones. However, they did not find excessive TUNEL staining in emc clones that would explain why the clones would be so small (i.e., excessive cell death). How reliable was their TUNEL staining in detecting excessive apoptosis (only negative data were shown)? Could they induce excessive cell death using radiation or some other means to ensure the assay is robust? If death is not occurring in emc clones, a deficiency worth addressing is that they do not discuss or explore how the caspases then inhibit clone growth. Is it longer cell cycle times, or smaller cells? And that phenotype does not fit with their end model of Delta being the only mediator of emc, since Delta is not playing a significant role in tissue growth anterior to the furrow. One would assume that the commercial antibody against activated caspase would provide another readout for emc clones, and this would bolster their claim that excessive caspase activation occurs in the emc cells.

      We have added Dcp1 staining in Figure 2 supplement 3 to show that TUNEL staining is reliable.

      (2) Figure 3D has really large emc clones when GMR-Diap is present. But the large clones are anterior to the furrow where Diap would not be overexpressed. Is this just an unusual sample with a coincidentally big emc M+ clone? It speaks to my concerns about the qualitative nature of the data.

      We replaced Figure 3D with an example of smaller clones. Nowhere have we suggested that GMR-DIAP1 affects clone size.

      (3) Figure 9B is very speculative and not appropriate since the authors have zero data to support that cleavage mechanism. It is fit for the next paper if the idea is correct. The panel should be removed.

      We did not intend Figure 9B to imply that we think Dl itself is the relevant target of non-apoptotic caspases. Since we apparently gave that impression, we moved this panel to a supplemental figure. We still think it is worth showing that Dl does not contain predicted caspase sites expected to activate signaling.

      (4) Figure 9A could be made more clear. Their pathway represents the mutant cells in the mosaic disc. Why not also outline what you think is happening in the emc+ cells as well?

      It is difficult to make a comparable diagram for normal cells, because none of this pathway happens in normal cells.  We modified the figure legend to indicate this (lines 677-8).

      (5) The one emc ci clone they show spanning the furrow has a very discontinuous furrow advance phenotype. This is unlike the emc clones, where the furrow advance is graded around the clone, and it resembles the SuH clones they show. This result and the synergistic effect on clone sizes they mention need more discussion and thought. It argues that ci is doing something with respect to emc action: loss of ci might not rescue size and furrow advance; it actually makes them worse! This is interesting and might suggest an inhibitory role for ci in emc function, or a parallel role for ci in mediating growth and progression that is redundant with emc.

      We agree that aspects of the emc ci phenotype are not clear.  We discuss this in the revised manuscript (lines 373-5).  

      (6) Related to point 7, it is a weak argument for non-autonomy that graded furrow advance in emc clones is evidence for emc acting nonautonomously through Delta. Its weakness is combined with its lack of significance relative to the other findings. It should be deleted as should the SuH data.

      We agree that the evidence that emc affects morphogenetic furrow progression non-autonomously is not compelling and have revised the manuscript to soften this conclusion (lines 426-7). We do not want to remove this idea, because it does in fact have significance for other findings. Specifically, it supports the idea that the emc effect in the morphogenetic furrow is due to trans-activation by Delta, whereas the effect on R7 and cone cell differentiation is due to autonomous cis-inhibition. We think this is important to keep in the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest. I do have some concerns with the way that the project has been conceptualized, which I share below.

      Thank you for acknowledging the strengths and novelty of our study. We have now addressed the conceptual issues raised; please see below in the specific comments.

      (2) The authors should provide careful working definitions of what exactly they think is occurring in the brain following sensory deprivation. Characterizing these changes as 'large-scale neural reorganization' and 'compensatory adaptation' gives the impression that the authors believe that there is good evidence in support of significant structural changes in the pathways between brain areas, a viewpoint that is not broadly supported (see Makin and Krakauer, 2023). The authors report changes in connectivity that amount to differences in coordinated patterns of BOLD signal across voxels in the brain; accordingly, their data could just as easily (and more parsimoniously) be explained by the unmasking of connections to the auditory cortex that are present in typically hearing individuals, but which are more obvious via MR in the absence of auditory inputs.

      We thank the Reviewer for the suggestion to clarify and better support our stance regarding reorganization. We do believe that the adaptive changes in the auditory cortex in deafness represent real functional recruitment for non-auditory functions, even given the relatively limited large-scale changes in anatomical connectivity. This is supported by animal work showing causal evidence for the involvement of deprived auditory cortices in non-auditory tasks, in a way that is not found in hearing controls (e.g., Lomber et al., 2010, Meredith et al., 2011, reviewed in Alencar et al., 2019; Lomber et al., 2020). Whether the word “reorganization” should be used has indeed been debated recently (Makin and Krakauer, 2023). Beyond terminology, we agree that the changes in recruitment seen in the brains of people with deafness or blindness are largely based on the typical anatomical connectivity present at birth. We also agree that, at the group level, there is poor evidence for large-scale differences in anatomical connectivity following deprivation. However, we think there is more than ample evidence that the unmasking and, more importantly, the re-weighting of non-dominant inputs give rise to functional changes. This is supported by the relatively weaker reorganization found in late-onset compared with early-onset deprivation. If unmasking of existing connectivity, without any additional functional changes, were sufficient to elicit functional responses to atypical stimuli (e.g., non-visual stimuli in blindness and non-auditory stimuli in deafness), one would expect no difference in response patterns between early- and late-onset deprivation. Therefore, we believe that the fact that these changes build on innate, pre-existing inputs and integration describes the mechanism of reorganization; it is not a reason to reject the notion of reorganization.
      Specifically, in this manuscript, we report that the variability of FC from the auditory cortex is greater in deafness than in typically hearing controls. This is not an increase in response per se, but rather more divergent values of FC from the auditory cortex, which are harder to explain in terms of ‘unmasking’ alone, unless one assumes that unmasking is particularly variable. The mechanistic explanation we propose for our findings is that, in the absence of auditory input to fine-tune and prune the connectivity of the auditory cortex, more divergent connectivity strengths remain among deaf individuals. Thus, auditory input not only masks non-dominant inputs but also prunes or deactivates exuberant connectivity, generating a more consistently connected auditory system. We have added a shortened version of these clarifications to the discussion (lines 351-372).

      (3) I found the argument that the deaf use a single modality to compensate for hearing loss, and that this might predict a more confined pattern of differential connectivity than had been previously observed in the blind to be poorly grounded. The authors themselves suggest throughout that hearing loss, per se, is likely to be driving the differences observed between deaf and typically-hearing individuals; accordingly, the suggestion that the modality in which intentional behavioral compensation takes place would have such a large-scale effect on observed patterns of connectivity seems out of line.

      Thank you for your critical insight regarding our rationale on modality use and its impact on connectivity patterns in the deaf compared to the blind. After some thought, we agree that the argument presented may not be sufficiently strong and could distract from the main findings of our study. Therefore, we have decided to remove this claim from our revised manuscript.

      (4) The analyses highlighting the areas observed to be differentially connected to the auditory cortex and areas observed to be more variable in their connectivity to the auditory cortex seem somewhat circular. If the authors propose hearing loss as a mechanism that drives this variability in connectivity, then it is reasonable to propose hypotheses about the directionality of these changes. One would anticipate this directionality to be common across participants and thus, these areas would emerge as the ones that are differently connected when compared to typically hearing folks.

      We are a little uncertain how to interpret this concern. If the question was about the logic leading to our statement that variability is driven by hearing loss, then yes, we were indeed proposing hearing loss as a mechanism that drives this variability in connectivity to the auditory cortex; we regret this was unclear in the original manuscript. This logic parallels the proposal made with regard to the increased variability in FC in blindness: deprivation leads to more variable outcomes, due to the lack of developmental environmental constraints (Sen et al., 2022). Specifically, we first analyzed the differences in within-group variability between deaf and hearing individuals (Fig. 1A), followed by examining the variability ratio (Fig. 1B) in the same regions that demonstrated differences. The first analysis does not specify which group shows higher variability; therefore, the second analysis is essential to clarify the direction of the effect and identify which group, and in which regions, exhibits greater variability. We have clarified this in the revised manuscript (lines 125-127): “To determine which group has larger individual differences in these regions (Figure 1B), we computed the ratio of variability between the two groups (deaf/hearing) in the areas that showed a significant difference in variability (Figure 1A)”. Nevertheless, this comment can also be interpreted as predicting that any change in FC due to deafness would lead to greater variability. In this case, it is also important to mention that while we would expect regions with higher variability to also show group differences between the deaf and the hearing (Figure 2), our analysis demonstrates that variability is present even in regions without significant group mean differences. Similarly, many areas that show a difference in FC between the groups do not show a change in variability (for example, the bilateral anterior insula and sensorimotor cortex). 
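The two-step logic (first test whether within-group variability differs, then take the deaf/hearing ratio in those regions to learn the direction) can be sketched as follows. This is a minimal illustration only, assuming variability is quantified as the across-subject variance of auditory-cortex FC in each region; the variance measure, array layout, and function names are hypothetical and not taken from the manuscript's pipeline.

```python
import numpy as np


def within_group_variability(fc):
    """Across-subject variance of auditory-cortex FC for each target region.

    fc: array of shape (n_subjects, n_regions). Using the variance here is an
    assumption made for illustration.
    """
    return np.var(fc, axis=0, ddof=1)


def variability_ratio(fc_deaf, fc_hearing):
    """Deaf/hearing variance ratio per region: > 1 means larger individual
    differences in the deaf group, < 1 means larger ones in the hearing group."""
    return within_group_variability(fc_deaf) / within_group_variability(fc_hearing)


# Toy example (hypothetical values): two subjects per group, two regions.
fc_deaf = np.array([[0.0, 0.0], [2.0, 1.0]])
fc_hearing = np.array([[0.0, 0.0], [1.0, 1.0]])
print(variability_ratio(fc_deaf, fc_hearing))  # [4. 1.]
```

In the toy data the first region is four times more variable in the deaf group and the second is equally variable in both; the ratio supplies the direction that a test of variability differences alone leaves open.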
In fact, the correlation between the regions with higher FC variability (Figure 1A) and those showing FC group differences (Figure 2B) is significant but rather modest, as we now acknowledge in our revised manuscript (lines 324-328). Therefore, increased FC and increased variability of FC are not necessarily linked. 

      (5) While the authors describe collecting data on the etiology of hearing loss, hearing thresholds, device use, and rehabilitative strategies, these data do not appear in the manuscript, nor do they appear to have been included in models during data analysis. Since many of these factors might reasonably explain differences in connectivity to the auditory cortex, this seems like an omission.

      We thank the Reviewer for their comment regarding the inclusion of these variables in our manuscript. We have now included additional information in the main text and a supplementary table in the revised manuscript that elaborates further on the etiology of hearing loss and all individual information that characterizes our deaf sample. Although we initially intended to include individual factors (e.g., hearing threshold, duration of hearing aid use, and age of first use) in our models, this was not feasible for the following reasons: 1) for some subjects, we only have a level of hearing loss rather than specific values, which we could not use quantitatively as a nuisance variable (it was typical in such testing to ascertain the threshold of loss as belonging to a deafness level, such as “profound”, and not necessarily to go into more elaborate testing to identify the specific threshold), and 2) this information was either not collected for the hearing participants (e.g., hearing threshold) or does not apply to them (e.g., age of hearing aid use), which made it impossible to use the complete model with all these variables. Modeling the groups separately with different variables would also be inappropriate. Last, the distribution of the values and the need for a large sample to rigorously assess a difference in variability also precluded subdividing the group into subgroups based on these values. 

      Therefore, we opted for a different way to control for the potential influence of these variables on FC variability in the deaf. We tested the correlation between the FC from the auditory cortex and each of these parameters in the areas that showed increased FC variability in deafness (Figures 1A, B), to see if it could account for the increased variability. This ROI analysis did not reveal any significant correlations (all p > .05, prior to correction for multiple comparisons; see Figures S4, S5, and S6 for scatter plots). The maximal variability explained in these ROIs by the hearing factors was r² = 0.096, whereas the FC variability (Figure 1B) was increased by at least a factor of 2 in the deaf. Therefore, it does not seem that these parameters underlie the increased variability in deafness. To test whether these variables had a direct effect on FC variability in other areas of the brain, we also directly computed the correlation between FC and each factor individually. At the whole-brain level, the results indicate a significant correlation between AC-FC and hearing threshold, as well as a correlation between AC-FC and the age of hearing aid use onset, but not for the duration of hearing aid use (Figure S3). While these correlations may be interesting in their own right, and have been added to the revised manuscript, the regions that show significant correlations with hearing threshold and age of hearing aid use are not the same regions that exhibit FC variability in the deaf (Figures 1A, B).

      Overall, these findings suggest that although some of these factors may influence FC, they do not appear to be the driving factors behind FC variability. Finally, in terms of rehabilitative strategies, only one deaf subject reported having received long-term oral training from teachers. This participant started this training at age 2, as now described in the participants’ section. We thank the reviewer for raising this concern and allowing us to show that our findings do not stem from simple differences ascribed to auditory experience in our participants. 

      Reviewer #2 (Public Review):

      (1) The paper has two main merits. Firstly, it documents a new and important characteristic of the re-organization of the brains of the deaf, namely its variability. The search for a well-defined set of functions for the deprived auditory cortex of the deaf has been largely unsuccessful, with several task-based approaches failing to deliver unanimous results. Now, one can understand why this was the case: most likely there isn't a single, fixed, well-defined set of functions supported by an identical set of areas in every subject, but rather a variety of functions supported by various regions. In addition, the paper extends the authors' previous findings from blind subjects to the deaf population. It demonstrates that the heightened variability of connectivity in the deprived brain is not exclusive to blindness, but rather a general principle that applies to other forms of deprivation. On a more general level, this paper shows how sensory input is a driver of the brain's reproducible organization.

      We thank the Reviewer for their observations regarding the merits of our study. We appreciate the recognition of the novelty in documenting the variability of brain reorganization in deaf individuals. 

      (2) The method and the statistics are sound, the figures are clear, and the paper is well-written. The sample size is impressively large for this kind of study.

      We thank the Reviewer for their positive feedback on the methodology, statistical analysis, clarity of figures, and the overall composition of our paper. We are also grateful for the acknowledgment of our large sample size, which we believe significantly strengthens the statistical power and the generalizability of our findings.

      (3) The main weakness of the paper is not a weakness, but rather a suggestion on how to provide a stronger basis for the authors' claims and conclusions. I believe this paper could be strengthened by including in the analysis at least one of the already published deaf/hearing resting-state fMRI datasets (e.g. Andin and Holmer, Bonna et al., Ding et al.) to see if the effects hold across different deaf populations. The addition of a second dataset could strengthen the evidence and convincingly resolve the issue of whether delayed sign language acquisition causes an increase in individual differences in functional connectivity to/from Broca's area. Currently, the authors may not have enough statistical power to support their findings.

      We thank the Reviewer for their constructive suggestion to reinforce the robustness of our findings. While we acknowledge the potential value of incorporating additional datasets to strengthen our conclusions, the datasets mentioned (Andin and Holmer, Bonna et al., Ding et al.) are not publicly available, which limits our ability to include them in our analysis. Additionally, datasets that contain comparable groups of delayed and native deaf signers are exceptionally rare, further complicating the possibility of their inclusion. Furthermore, to discern individual differences within these groups effectively, a substantially larger sample size is necessary. As such, we were unfortunately unable to perform this additional analysis. This is a challenge we acknowledge in the revised manuscript (lines 442-445), especially when the group is divided into subcategories based on the level of language acquisition, which indeed reduces our statistical power. We have, however, now integrated the individual task accuracy and reaction time parameters as nuisance variables in calculating the variability analyses; all the results are fully replicated when accounting for task difficulty. We also report that there was no difference in activation between the groups for this task that could affect our findings. 

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. That said, we are exploring collaborations and other avenues to access comparable datasets that might enable a more powerful analysis in future work. This feedback is very important for guiding our ongoing efforts to verify and extend our conclusions.

      (4) Secondly, the authors could more explicitly discuss the broad implications of what their results mean for our understanding of how the architecture of the brain is determined by the genetic blueprint vs. how it is determined by learning (page 9). There is currently a wave of strong evidence favoring a more "nativist" view of brain architecture; for example, face- and object-sensitive regions seem to be in place practically from birth (see e.g. Kosakowski et al., Current Biology, 2022). The current results show the role played by experience.

      We thank the Reviewer for highlighting the need to elaborate on the broader implications of our findings in relation to the ongoing debate of nature vs. nurture. We agree that this discussion is crucial and have expanded our manuscript to address this point more explicitly. We now incorporate a more detailed discussion of how our results contribute to understanding the significant role of experience in shaping individual neural connectivity patterns, particularly in sensory-deprived populations (lines 360-372).

      Reviewer #3 (Public Review):

      Summary:

      (1) This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      -  The manuscript is well written.

      -  The methods are clearly described and appropriate.

      -  Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes.

      -  The results are interesting and novel.

      We thank the Reviewer for their positive and detailed feedback. Their acknowledgment of the clarity of our methods and the novelty of our results is greatly appreciated.

      Weaknesses:

      (2) Analyses were conducted for task-based data rather than resting-state data. It was unclear whether groups differed in task performance. If congenitally deaf individuals found the task more difficult this could lead to changes in FC.

      We thank the Reviewer for their observation regarding possible task performance differences between deaf and hearing participants and their potential effect on the results. Indeed, there was a difference in task accuracy between these groups. To account for this variation and ensure that our findings on functional connectivity were not confounded by task performance, we have now included individual task accuracy and reaction time as nuisance variables in our analyses. This approach allowed us to control for any performance differences. The results now presented in the revised manuscript account for the inclusion of these two nuisance variables (accuracy and reaction time) and completely align with our original conclusions, highlighting increased variability in deafness, which is found both in the entire deaf group and when equating language experience by comparing the hearing participants and native signers. The correlation between variability and group differences also remains significant, although its strength is slightly decreased, a moderate effect we acknowledge in the revised manuscript (see comment #4). The differences between the delayed signers and native signers are also retained (Figure 3), now aligning better with language-sensitive regions, as previously predicted. The inclusion of the task difficulty predictors also introduced an additional finding in this analysis, a significant cluster in the right aIFG. Therefore, the inclusion of these predictors reaffirms the robustness of the conclusions drawn about FC variability in the deaf population.
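One common way to treat accuracy and reaction time as nuisance variables before comparing variability is to regress them out of each subject's FC value and then compare within-group variances of the residuals. The sketch below assumes a single pooled ordinary-least-squares regression per region; whether the revised analyses used exactly this scheme is not stated here, and all names are hypothetical.

```python
import numpy as np


def residualize(fc, nuisance):
    """Remove linear effects of nuisance covariates from FC values.

    fc: (n_subjects,) FC values for one region.
    nuisance: (n_subjects, k) covariates, e.g. task accuracy and reaction time.
    Returns the OLS residuals of fc ~ 1 + nuisance.
    """
    # Design matrix: intercept column plus one column per covariate.
    design = np.column_stack([np.ones(len(fc)), nuisance])
    # Ordinary least squares fit of FC on the covariates.
    beta, *_ = np.linalg.lstsq(design, fc, rcond=None)
    # Keep only the part of FC not explained by the covariates.
    return fc - design @ beta


def adjusted_variability(fc, nuisance, is_deaf):
    """Within-group variance of the residualized FC (deaf, hearing).

    The regression is fitted on all subjects pooled (an assumption).
    """
    resid = residualize(fc, nuisance)
    return np.var(resid[is_deaf], ddof=1), np.var(resid[~is_deaf], ddof=1)
```

Fitting one pooled regression applies the same adjustment to both groups, so any remaining difference in residual variance is not explained by a linear effect of accuracy or reaction time shared across groups.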

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state if we had access to such data, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. We have also addressed this point in our manuscript (lines 442-451).

      (3) No differences in overall activation between groups were reported. Activation differences between groups could lead to differences in FC. For example, lower activation may be associated with more noise in the data, which could translate to reduced FC.

      We thank the reviewer for noting the potential implications of overall activation differences on FC. In our analysis of the activation for words, we found no significant clusters showing a group difference between the deaf and hearing participants (p < .05, cluster-corrected for multiple comparisons) - we also added this information to the revised manuscript (lines 542-544). This suggests that the differences in FC observed are not confounded by variations in overall brain activation between the groups under these conditions.

      (4) Figure 2B shows higher FC for congenitally deaf individuals than normal-hearing individuals in the insula, supplementary motor area, and cingulate. These regions are all associated with task effort. If congenitally deaf individuals found the task harder (lower performance), then activation in these regions could be higher, in turn leading to higher FC. A study using resting-state data could possibly have provided a clearer picture.

      We thank the Reviewer for pointing out the potential impact of task difficulty on FC differences observed in our study. As addressed in our response to comment #2, task accuracy and reaction times were incorporated as nuisance variables in our analysis. Further, these areas showed no difference in activation between the groups (see response to comment #3 above). Notably, these regions still showed higher FC in congenitally deaf individuals even when controlling for these performance differences. Additionally, these findings are consistent with results from studies using resting-state data in deaf populations, further validating our observations. Specifically, using resting-state data, Andin & Holmer (2022) have shown higher FC for deaf (compared to hearing) individuals from auditory regions to the cingulate cortex, insular cortex, cuneus and precuneus, supramarginal gyrus, supplementary motor area, and cerebellum. Moreover, Ding et al. (2016) have shown higher FC for the deaf between the STG and the anterior insula and dorsal anterior cingulate cortex. This suggests that the observed FC differences are likely reflective of genuine neuroplastic adaptations rather than mere artifacts of task difficulty. Although we wish we could augment our study with resting-state data analyzed similarly, we could not at present acquire or access such a dataset. We acknowledge this limitation of our study (lines 442-451) in the revised manuscript and intend to confirm that similar results will be found with resting-state data in the future.

      (5) The correlation between the FC map and the FC variability map is 0.3. While significant using permutation testing, the correlation is low, and it is not clear how great the overlap is.

      We acknowledge that the correlation coefficient of 0.3, while statistically significant, indicates a moderate overlap. It's also worth noting that, using our new models that include task performance as a nuisance variable, this value has decreased somewhat, to 0.24 (which is still highly significant). It is important to note that the visual overlap between the maps is not a good estimate of the correlation, which was performed on the unthresholded maps, to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This correlation is meant to suggest a trend rather than a strong link, but especially due to its consistency with the findings in blindness, we believe this observation merits further investigation and discussion. As such, we kept it in the revised manuscript while moderating our claims about its strength.

      Reviewer #1 (Recommendations For The Authors):

      (1) Page 4: "Does auditory cortex FC variability..." FC is not yet defined.

      Corrected, thanks.

      (2) Page 4: "It showed lower variability..." What showed this?

      Clarified, thanks.

      (3) Page 11: "highlining the importance" should read "highlighting the importance".

      Corrected, thanks.

      (4) Page 11: Do you really mean to suggest functional connectivity does not vary as a function of task? This would not seem well supported.

      We do not suggest that FC doesn’t vary as a function of task, and have revised this section (lines 447-451). 

      (5) Page 12: "there should not to be" should read "there should not be".

      Corrected, thanks.

      (6) Page 12: "and their majority" should read "and the majority".

      Corrected, thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major

      (1) Although this is a lot of work, I nonetheless have another suggestion on how to test if your results are strong and robust. Perhaps you could analyze your data using an ROI/graph-theory approach. I am not an expert in graph theory analysis, but there is surely a simple and elegant statistic that captures the variability of edge strength within a population. This approach could not only validate your results with an independent analysis and give the audience more confidence in their robustness, but it could also provide an estimate of the effect size you found. That is, it could express in hard numbers how much more variable the connections from auditory cortex ROIs are, in comparison to the rest of the brain, in the deaf population relative to the hearing population.

      We thank the Reviewer for suggesting the use of graph theory as a method to further validate our findings. While we see the potential value in this approach, we believe it may be beyond the scope of the current paper and that it merits a full exploration of its own, which we hope to do in the future. However, we understand the importance of showing the uniqueness of the connectivity of the auditory cortex ROI as compared to the rest of the brain. So, in order to bolster our results, we conducted an additional analysis using control regions of interest (ROIs). Specifically, we calculated the inter-individual variability using all ROIs from the CONN Atlas (except auditory and language regions) as the control seed regions for the FC. We showed that the variability of connectivity from the auditory cortex is uniquely increased in deafness, as compared to these control ROIs (Figure S1). This additional analysis supports the specificity of our findings to the auditory cortex in the deaf population. We aim to integrate more analytic approaches, including graph theory methods, in our future work.
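The control-seed comparison can be sketched as ranking the auditory seed's variability increase against the same quantity computed for each control seed. This is an illustrative assumption only: summarizing each seed by its mean deaf/hearing variance ratio over target regions, and the function names, are hypothetical rather than the exact statistic reported in Figure S1.

```python
import numpy as np


def seed_variability_ratio(fc_deaf, fc_hearing):
    """Mean deaf/hearing FC variance ratio over target regions for one seed.

    fc_deaf, fc_hearing: arrays of shape (n_subjects, n_targets) holding FC
    from a single seed region to every target region.
    """
    ratio = np.var(fc_deaf, axis=0, ddof=1) / np.var(fc_hearing, axis=0, ddof=1)
    return float(ratio.mean())


def fraction_of_controls_at_least(test_value, control_values):
    """Share of control seeds whose ratio is >= the test seed's ratio.

    A value near 0 means the test (e.g. auditory) seed stands out among controls.
    """
    control_values = np.asarray(control_values, dtype=float)
    return float(np.mean(control_values >= test_value))


# Hypothetical ratios: auditory seed vs. four control seeds.
print(fraction_of_controls_at_least(3.0, [1.0, 1.2, 0.9, 3.5]))  # 0.25
```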

      Minor

      (1) Some citations display the initial of the author in addition to the last name, unless there is something I don't know about the citation system, the initial shouldn't be there.

      This is due to the citation style we're using (APA 7th edition, as suggested by eLife), which requires including the first author's initials in all in-text citations when citing multiple authors with the same last name.  

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors provide behavioral data and results for overall neural activation.

      Thanks. We have added these to the revised manuscript. Specifically, we report that there was no difference in the activation for words (p < .05, cluster-corrected for multiple comparisons) between the deaf and hearing participants. Further, we report the behavioral averages for accuracy and reaction time for each group, and have now used these individual values explicitly as nuisance variables in the revised analyses.

      (2) For the correlation between FC and FC variability, it seemed a bit odd that the permuted data were treated additionally (through Gaussian smoothing). I understand the general logic (i.e., to reintroduce smoothness), but this approach provides more smoothing to the permutation than the original data. It is hard to know what this does to the statistical distribution. I recommend using a different approach or at least also reporting the p-value for non-smoothed permutation data.

      In response to this suggestion and to ensure transparency in our results, we have now also included the p-value for the non-smoothed permutation data in our revised manuscript (still highly significant; p < .0001). Thanks for this proposal.
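The Reviewer's point can be made concrete with a sketch of a permutation test for the correlation between two unthresholded maps, run once with re-smoothing of each shuffled map and once without. For illustration the maps are treated as 1-D vectors and smoothing is a 1-D Gaussian convolution (real maps would be smoothed in 3-D); the function names and parameters are hypothetical.

```python
import numpy as np


def gaussian_smooth_1d(x, sigma):
    """1-D Gaussian smoothing; a stand-in for spatial smoothing of a map."""
    half = max(1, int(3 * sigma))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(x, kernel, mode="same")


def map_correlation_pvalue(map_a, map_b, n_perm=5000, sigma=None, seed=0):
    """One-sided permutation p-value for the Pearson correlation of two maps.

    If `sigma` is given, every shuffled copy of map_b is re-smoothed to restore
    spatial autocorrelation; sigma=None gives the non-smoothed null.
    """
    rng = np.random.default_rng(seed)
    # The observed statistic uses the maps as given (no extra smoothing).
    observed = np.corrcoef(map_a, map_b)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(map_b)
        if sigma is not None:
            shuffled = gaussian_smooth_1d(shuffled, sigma)
        null[i] = np.corrcoef(map_a, shuffled)[0, 1]
    # Count the observed statistic as one member of the null (avoids p = 0).
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

Because only the shuffled copies are smoothed, the two null distributions differ; reporting both p-values shows whether the conclusion depends on that choice.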

      (3) For the map comparison, a plot with different colors, showing the FC map, the FC variability map, and one map for the overlap on the same brain may be helpful.

      We thank the Reviewer for their suggestion to visualize the overlap between the maps. However, we performed the correlation analysis using the unthresholded maps, as mentioned in the methods section of our manuscript, specifically to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This is why the maps displayed in the figures, which are thresholded for significance, may not appear to match perfectly, and may actually obscure the correlation across the brain. This methodological detail is crucial for interpreting the relationship and overlap between these maps accurately but also explains why the visualization of the overlap is, unfortunately, not very informative.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary

      The authors asked if parabrachial CGRP neurons were only necessary for a threat alarm to promote freezing or were necessary for a threat alarm to promote a wider range of defensive behaviors, most prominently flight.

      Major Strengths of Methods and Results

      The authors performed careful single-unit recording and applied rigorous methodologies to optogenetically tag CGRP neurons within the PBN. Careful analyses show that single-units and the wider CGRP neuron population increases firing to a range of unconditioned stimuli. The optogenetic stimulation of experiment 2 was comparatively simpler but achieved its aim of determining the consequence of activating CGRP neurons in the absence of other stimuli. Experiment 3 used a very clever behavioral approach to reveal a setting in which both cue-evoked freezing and flight could be observed. This was done by having the unconditioned stimulus be a "robot" traveling along a circular path at a given speed. Subsequent cue presentation elicited mild flight in controls and optogenetic activation of CGRP neurons significantly boosted this flight response. This demonstrated for the first time that CGRP neuron activation does more than promote freezing. The authors conclude by demonstrating that bidirectional modulation of CGRP neuron activity bidirectionally affects freezing in a traditional fear conditioning setting and affects both freezing and flight in a setting in which the robot served as the unconditioned stimulus. Altogether, this is a very strong set of experiments that greatly expand the role of parabrachial CGRP neurons in threat alarm.

      We would like to sincerely thank the reviewer for the positive and insightful comments on our work. We greatly appreciate the acknowledgment of our new behavioral approach, which allowed us to observe a dynamic spectrum of defensive behaviors in animals. Our use of the robot-based paradigm, which enables the observation of both freezing and flight, has been instrumental in expanding our understanding of how parabrachial CGRP neurons modulate diverse threat responses. We are pleased that the reviewer found this methodological innovation to be a valuable contribution to the field.

      Weaknesses

      In all of their conditioning studies the authors did not include a control cue, for example, a sound presented the same number of times but unrelated to US (shock or robot) presentation. This does not detract from their behavioral findings. However, it means the authors do not know whether the observed behavior is a consequence of pairing, or whether it would be observed in response to any cue played in the setting. This is particularly important for the experiments using the robot US.

      We appreciate the reviewer’s insightful comment regarding the absence of a control cue in our conditioning studies. First, we would like to mention that, in response to Reviewer 3, we have updated how we present our flight data by following methods from previously published papers (Fadok et al., 2017; Borkar et al., 2024). Instead of counting flight responses, we calculated flight scores as the ratio of the velocity during the CS to the average velocity in the 7 s before the CS on the conditioning day (or 10 s for the retention test). This method better captures both the speed and duration of fleeing during the CS. With this updated approach, we observed a significant difference in flight scores between the ChR2 and control groups, even during conditioning, which may partly address the reviewer’s concern about whether the observed behavior is a consequence of CS-US pairing.
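The flight score described above can be written as a short function. This is a minimal sketch, assuming per-frame speed is available at a fixed frame rate and that "velocity during the CS" means the mean speed over the whole CS window; the frame rate, CS timing, and function name are hypothetical.

```python
import numpy as np


def flight_score(speed, fps, cs_onset_s, cs_duration_s, baseline_s=7.0):
    """Mean speed during the CS divided by mean speed in the baseline window
    immediately before CS onset (7 s on the conditioning day, 10 s for the
    retention test, per the text above).

    speed: 1-D array of per-frame speed (e.g. cm/s) sampled at `fps`.
    """
    speed = np.asarray(speed, dtype=float)
    onset = int(round(cs_onset_s * fps))
    n_base = int(round(baseline_s * fps))
    n_cs = int(round(cs_duration_s * fps))
    if onset < n_base or onset + n_cs > len(speed):
        raise ValueError("baseline or CS window falls outside the recording")
    baseline = speed[onset - n_base:onset]
    during_cs = speed[onset:onset + n_cs]
    return during_cs.mean() / baseline.mean()


# Hypothetical trace at 10 frames/s: 5 cm/s for 20 s, then 15 cm/s for a 10 s CS.
trace = np.concatenate([np.full(200, 5.0), np.full(100, 15.0)])
print(flight_score(trace, fps=10, cs_onset_s=20, cs_duration_s=10))  # 3.0
print(flight_score(trace, fps=10, cs_onset_s=20, cs_duration_s=10, baseline_s=10.0))  # 3.0
```

A score of 1 means no change from baseline speed, and values above 1 mean the animal moved faster during the CS. The ratio is undefined if the animal is completely still throughout the baseline window.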

      However, we agree with the reviewer that including an unpaired group would provide stronger evidence, and in response, we conducted an additional experiment with an unpaired group. In this unpaired group, the CS was presented the same number of times, but the robot US was delivered randomly within the inter-trial interval. The unpaired group did not exhibit any notable conditioned freezing or flight responses. We believe that this additional experiment, now reflected in Figure 3, further strengthens our conclusion that the fleeing behavior is driven by associative learning between the CS and US, rather than a reaction to the cue itself.

      The authors make claims about the contribution of CGRP neurons to freezing and fleeing behavior, however, all of the optogenetic manipulations are centered on the US presentation period. Presently, the experiments show a role for these neurons in processing aversive outcomes but show little role for these neurons in cue responding or behavior organizing. Claims of contributions to behavior should be substantiated by manipulations targeting the cue period.

      We appreciate the reviewer’s constructive comments. We would like to emphasize that our primary objective in this study was to investigate whether activating parabrachial CGRP neurons—thereby increasing the general alarm signal—would elicit different defensive behaviors beyond passive freezing. To this end, we focused on manipulating CGRP neurons during the US period rather than the cue period.

      Previous studies have shown that CGRP neurons relay US signals, and direct activation of CGRP neurons has been used as the US to successfully induce conditioned freezing responses to the CS during retention tests (Han et al., 2015; Bowen et al., 2020). In our experiments, we also observed that CGRP neurons responded exclusively to the US during conditioning with the robot (Figure 1F), and stimulating these neurons in the absence of any external stimuli elicited strong freezing responses (Figure 2B). These findings, collectively, suggest that activation of CGRP neurons during the CS period would predominantly result in freezing behavior.

      Therefore, we manipulated the activity of CGRP neurons during the US period to examine whether adjusting the perceived threat level through these neurons would result in diverse defensive behaviors when paired with the chasing robot. We observed that enhancing CGRP neuron activity while animals were chased by the robot at 70 cm/s made them react as if chased at a higher speed (90 cm/s), leading to increased fleeing behaviors. While this may not fully address the role of these neurons in cue responding or behavior organizing, we found that silencing CGRP neurons with tetanus toxin (TetTox) abolished fleeing behavior even when animals were chased at high speeds (90 cm/s), which usually elicits fleeing without CGRP manipulation (Figure 5). This supports the conclusion that CGRP neurons are necessary for fleeing responses.

      In summary, manipulating CGRP neurons during the US period was essential for effectively investigating their role in adjusting defensive responses, thereby expanding our understanding of their function within the general alarm system. We hope this clarifies our experimental design and addresses the concern the reviewer has raised.

      Appraisal

      The authors achieved their aims and have revealed a much greater role for parabrachial CGRP neurons in threat alarm.

      Discussion

      Understanding neural circuits for threat requires us (as a field) to examine diverse threat settings and behavioral outcomes. A commendable and rigorous aspect of this manuscript was the authors’ decision to use a new behavioral paradigm and measure multiple behavioral outcomes. Indeed, this manuscript would not have been nearly as impactful had they not done that. This novel behavior was combined with excellent recording and optogenetic manipulations, a standard the field should aspire to. Studies like this are the only way that we as a field will map complete neural circuits for threat.

      We sincerely thank the reviewer for their positive and encouraging comments. We are grateful for the acknowledgment of our efforts in employing a novel behavioral paradigm to study diverse defensive behaviors. We are pleased that our work contributes to advancing the understanding of neural circuits involved in threat responses.

      Reviewer #3 (Public Review):

      Strengths:

      The study used optogenetics together with in vivo electrophysiology to monitor CGRP neuron activity in response to various aversive stimuli, including robot chasing, to determine whether they encode noxious stimuli differentially. The study used an interesting conditioning paradigm to investigate the role of CGRP neurons in the PBN in both freezing and flight behaviors.

      Weakness:

      The major weakness of this study is that the chasing robot threat conditioning model elicits weak unconditioned and conditioned flight responses, making it difficult to interpret the robustness of the findings. Furthermore, the conclusion that the CGRP neurons are capable of inducing flight is not substantiated by the data. No manipulations are made to influence the flight behavior of the mouse. Instead, the manipulations are designed to alter the intensity of the unconditioned stimulus.

      We sincerely thank the reviewer for the thoughtful and constructive comments on our manuscript. In response to this feedback, we revisited our analysis of the flight responses and compared our methods with those used in previous literature examining similar behaviors.

      We reviewed a study investigating sex differences in defensive behavior in rats (Gruene et al., 2015). In that study, the CS was presented for 30 s, and active defensive behavior, referred to as ‘darting’, was quantified as a dart rate (darts/min). This was calculated by doubling the number of darts counted during the 30-s CS presentation to extrapolate to a per-minute rate. The highest average dart rate observed was approximately 1.5. Other relevant studies in mice quantified active defensive behavior by calculating a flight score, the ratio of the average speed during each CS to the average speed during the 10-s pre-CS period (Fadok et al., 2017; Borkar et al., 2024). This method captures multiple aspects of flight behavior during CS presentation, including overall velocity, number of bouts, and duration of fleeing. Moreover, it accounts for each animal’s individual velocity before the CS, reflecting how fast the animals fled relative to their baseline activity.
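      As an illustration only, the flight score described above can be sketched in Python, assuming a per-frame speed trace in cm/s sampled at a fixed rate. The function name `flight_score` and its parameters are hypothetical and not the authors' analysis code; the pre-CS window is left as a parameter because the text uses 7 s on the conditioning day and 10 s for the retention test, while the cited studies use 10 s.

```python
from statistics import fmean


def flight_score(speed, fs_hz, cs_onset_s, cs_dur_s, pre_window_s):
    """Mean speed during the CS divided by mean speed in the pre-CS window.

    speed: sequence of per-frame speeds (cm/s) sampled at fs_hz.
    All timing arguments are in seconds from the start of the trace.
    """
    onset = round(cs_onset_s * fs_hz)                 # first CS frame
    pre_start = onset - round(pre_window_s * fs_hz)   # first baseline frame
    cs_end = onset + round(cs_dur_s * fs_hz)          # one past last CS frame
    if pre_start < 0 or cs_end > len(speed):
        raise ValueError("analysis window extends beyond the recording")
    baseline = fmean(speed[pre_start:onset])
    if baseline == 0:
        raise ValueError("zero baseline speed; score is undefined")
    return fmean(speed[onset:cs_end]) / baseline


# Hypothetical trace at 1 Hz: 2 cm/s at baseline, 6 cm/s during a 10-s CS.
trace = [2.0] * 10 + [6.0] * 10
print(flight_score(trace, fs_hz=1, cs_onset_s=10, cs_dur_s=10, pre_window_s=10))
```

      A score above 1 means the animal moved faster during the CS than during its own pre-CS baseline, which is how the measure normalizes for individual differences in activity.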

      In our original analysis, we quantified flight responses by counting rapid fleeing movements, defined as movements exceeding 8 cm/s. This approach was consistent with our previous study using the same robot paradigm to observe unique patterns of defensive behavior related to sex differences (Pyeon et al., 2023). Based on our earlier findings, where this approach effectively identified significant differences in defensive behaviors, we believed that this method was appropriate for capturing conditioned flight behavior within our specific experimental context. However, prompted by the reviewer's insightful comments, we recognized that our initial method might not fully capture the robustness of the flight responses. Therefore, we re-analyzed our data using the flight score method described by Fadok and colleagues, which provides a more sensitive measure of fleeing during the CS.
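      For comparison, the original threshold-based count can be sketched as follows. This assumes, as a simplification, that one fleeing movement is any contiguous run of speed samples above 8 cm/s; any minimum bout duration or smoothing used in Pyeon et al. (2023) is not stated here, so `count_flight_bouts` is a hypothetical helper rather than the authors' code.

```python
def count_flight_bouts(speed, threshold=8.0):
    """Count contiguous runs of samples whose speed exceeds threshold (cm/s)."""
    bouts = 0
    above = False
    for v in speed:
        is_above = v > threshold
        if is_above and not above:  # entering a new supra-threshold run
            bouts += 1
        above = is_above
    return bouts


print(count_flight_bouts([0, 9, 10, 3, 12, 2]))  # two separate runs above 8 cm/s
```

      Unlike the flight score, this count ignores how fast and how long each bout is, which is consistent with the authors' observation that the ratio-based measure is more sensitive.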

      Re-analyzing our data revealed a more robust flight response than previously reported, demonstrating that additional CGRP neuron stimulation promoted flight behavior in animals during conditioning, addressing the concern that the data did not substantiate the role of CGRP neurons in inducing flight. In addition, we would like to emphasize the findings from our final experiment, where silencing CGRP neurons, even under high-threat conditions (90 cm/s), prevented animals from exhibiting flight responses. This demonstrates that CGRP neurons are necessary for flight responses.

      We have updated all flight data in the manuscript and revised the relevant figures and text accordingly. We appreciate the opportunity to enhance our analysis. The reviewer's insightful observation led us to adopt a better method for quantifying flight behavior, which substantiates our conclusion about the role of CGRP neurons in modulating defensive responses.

      Borkar, C.D., Stelly, C.E., Fu, X., Dorofeikova, M., Le, Q.-S.E., Vutukuri, R., et al. (2024). Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625(7996), 743-749.

      Bowen, A.J., Chen, J.Y., Huang, Y.W., Baertsch, N.A., Park, S., and Palmiter, R.D. (2020). Dissociable control of unconditioned responses and associative fear learning by parabrachial CGRP neurons. Elife 9, e59799.

      Fadok, J.P., Krabbe, S., Markovic, M., Courtin, J., Xu, C., Massi, L., et al. (2017). A competitive inhibitory circuit for selection of active and passive fear responses. Nature 542(7639), 96-100.

      Gruene, T.M., Flick, K., Stefano, A., Shea, S.D., and Shansky, R.M. (2015). Sexually divergent expression of active and passive conditioned fear responses in rats. Elife 4, e11352.

      Han, S., Soleiman, M.T., Soden, M.E., Zweifel, L.S., and Palmiter, R.D. (2015). Elucidating an affective pain circuit that creates a threat memory. Cell 162(2), 363-374.

      Pyeon, G.H., Lee, J., Jo, Y.S., and Choi, J.-S. (2023). Conditioned flight response in female rats to naturalistic threat is estrous-cycle dependent. Scientific Reports 13(1), 20988.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript from So et al. describes what is suggested to be an improved protocol for single-nuclei RNA sequencing (snRNA-seq) of adipose tissue. The authors provide evidence that modifications to the existing protocols result in better RNA quality and nuclei integrity than previously observed, with ultimately greater coverage of the transcriptome upon sequencing. Using the modified protocol, the authors compare the cellular landscape of murine inguinal and perigonadal white adipose tissue (WAT) depots harvested from animals fed a standard chow diet (lean mice) or those fed a high-fat diet (mice with obesity). 

      Strengths: 

      Overall, the manuscript is well-written, and the data are clearly presented. The strengths of the manuscript rest in the description of an improved protocol for snRNA-seq analysis. This should be valuable for the growing number of investigators in the field of adipose tissue biology that are utilizing snRNA-seq technology, as well as those other fields attempting similar experiments with tissues possessing high levels of RNAse activity. 

      Moreover, the study makes some notable observations that provide the foundation for future investigation. One observation is the correlation between nuclei size and cell size, allowing for the transcriptomes of relatively hypertrophic adipocytes in perigonadal WAT to be examined. Another notable observation is the identification of an adipocyte subcluster (Ad6) that appears "stressed" or dysfunctional and likely localizes to crown-like inflammatory structures where proinflammatory immune cells reside. 

      Weaknesses:  

      Analogous studies have been reported in the literature, including a notable study from Savari et al. (Cell Metabolism). This somewhat diminishes the novelty of some of the biological findings presented here. Moreover, a direct comparison of the transcriptomic data derived from the new vs. existing protocols (i.e. fully executed side by side) was not presented. As such, the true benefit of the protocol modifications cannot be fully understood. 

      We agree with the reviewer’s comment on the limitations of our study. Following the reviewer's suggestion, we performed a new analysis by integrating our data with those from the study by Emont et al. Please refer to the Recommendation for authors section below for further details.

      Reviewer #2 (Public Review):

      Summary: 

      In the present manuscript So et al utilize single-nucleus RNA sequencing to characterize cell populations in lean and obese adipose tissues. 

      Strengths: 

      The authors utilize a modified nuclear isolation protocol incorporating VRC that results in higher-quality sequencing reads compared with previous studies. 

      Weaknesses:  

      The use of VRC to enhance snRNA-seq has been previously published in other tissues. The snRNA-seq data sets presented in this manuscript, when compared with numerous previously published single-cell analyses of adipose tissue, do not represent a significant scientific advance. 

      Figure 1-3: The snRNA-seq data obtained by the authors using their enhanced protocol does not represent a significant improvement in cell profiling for the majority of the highlighted cell types including APCs, macrophages, and lymphocytes. These cell populations have been extensively characterized by cytoplasmic scRNA-seq which can achieve sufficient sequencing depth, and thus this study does not contribute meaningful additional insight into these cell types. The authors note an increase in the number of rare endothelial cell types recovered, however this is not translated into any kind of functional analysis of these populations. 

      We acknowledge the reviewer's comments on the limitations of our study, particularly the lack of extension of our snRNA-seq data into functional studies of new biological processes. However, this manuscript has been submitted as a Tools and Resources article. As an article of this type, we provide detailed information on our snRNA-seq methods and present a valuable resource of high-quality mouse adipose tissue snRNA-seq data. In addition, we demonstrate that our improved method offers novel biological insights, including the identification of subpopulations of adipocytes categorized by size and functionality. We believe this study offers powerful tools and significant value to the research community.

      Figure 4: The authors did not provide any evidence that the relative fluorescent brightness of GFP and mCherry is a direct measure of the nuclear size, and the nuclear size is only a moderate correlation with the cell size. Thus sorting the nuclei based on GFP/mCherry brightness is not a great proxy for adipocyte diameter. Furthermore, no meaningful insights are provided about the functional significance of the reported transcriptional differences between small and large adipocyte nuclei. 

      To address the reviewer's point, we analyzed the Pearson correlation coefficient for nucleus size vs. adipocyte size and found R = 0.85, indicating a strong positive correlation. In addition, we performed a new experiment to determine the correlation between nuclear GFP intensity and adipocyte nucleus size, finding a strong correlation with R = 0.91. These results suggest that nuclear GFP intensity can be a strong proxy for adipocyte size. Furthermore, we performed gene ontology analysis on genes differentially regulated between large and small adipocyte nuclei. We found that large adipocytes promote processes involved in insulin response, vascularization and DNA repair, while inhibiting processes related to cell migration, metabolism and the cytoskeleton. We have added these new data as Figure 4E, S6E, S6G, and S6H (page 11).

      Figure 5-6: The Ad6 population is highly transcriptionally analogous to the mAd3 population from Emont et al, and is thus not a novel finding. Furthermore, in the present data set, the authors conclude that Ad6 are likely stressed/dying hypertrophic adipocytes with a global loss of gene expression, which is a well-documented finding in eWAT > iWAT, for which the snRNA-seq reported in the present manuscript does not provide any novel scientific insight. 

      As the reviewer pointed out, a new analysis integrating our data with the previous study found that Ad6 from our study is comparable to mAd3 from Emont et al. in gene expression profiles. However, significant discrepancies in population size and changes in response to obesity were observed, likely due to differences in technical robustness. The dysfunctional cellular state of this population, with compromised RNA content, may have hindered accurate capture in the previous study, while our protocol enabled precise detection. This underscores the importance of our improved snRNA-seq protocol for accurately understanding adipocyte population dynamics. We have revised the manuscript to include new data in Figure S7 (page 14).

      Reviewer #3 (Public Review): 

      Summary:  

      The authors aimed to improve single-nucleus RNA sequencing (snRNA-seq) to address current limitations and challenges with nuclei and RNA isolation quality. They successfully developed a protocol that enhances RNA preservation and yields high-quality snRNA-seq data from multiple tissues, including a challenging model of adipose tissue. They then applied this method to eWAT and iWAT from mice fed either a normal or high-fat diet, exploring depot-specific cellular dynamics and gene expression changes during obesity. Their analysis included subclustering of SVF cells and revealed that obesity promotes a transition in APCs from an early to a committed state and induces a pro-inflammatory phenotype in immune cells, particularly in eWAT. In addition to SVF cells, they discovered six adipocyte subpopulations characterized by a gradient of unique gene expression signatures. Interestingly, a novel subpopulation, termed Ad6, comprised stressed and dying adipocytes with reduced transcriptional activity, primarily found in eWAT of mice on a high-fat diet. Overall, the methodology is sound, the writing is clear, and the conclusions drawn are supported by the data presented. Further research based on these findings could pave the way for potential novel interventions in obesity and metabolic disorders, or for similar studies in other tissues or conditions. 

      Strengths:  

      • The authors developed a robust snRNA-seq technique that preserves the integrity of the nucleus and RNA across various tissue types, overcoming the challenges of existing methods. 

      • They identified adipocyte subpopulations that follow adaptive or pathological trajectories during obesity. 

      • The study reveals depot-specific differences in adipose tissues, which could have implications for targeted therapies. 

      Weaknesses: 

      • The adipose tissues were collected after 10 weeks of high-fat diet treatment, lacking the intermediate time points for identifying early markers or cell populations during the transition from healthy to pathological adipose tissue. 

      We agree with the reviewer regarding the limitations of our study. To address the reviewer’s comment, we revised the manuscript to include this in the Discussion section (page 17).  

      • The expansion of the Ad6 subpopulation in obese iWAT and gWAT is interesting. The authors claim that Ad6 exhibited a substantial increase in eWAT and a moderate rise in iWAT (Figure 4C). However, this adipocyte subpopulation remains the most altered in iWAT upon obesity. Could the authors elaborate on why there is a scarcity of adipocytes with ROS reporter and B2M in obese iWAT?

      We observed an increase in the levels of H2DCFA reporter and B2M protein fluorescence in adipocytes from iWAT of HFD-fed mice, although this increase was much less compared to eWAT, as shown in Figure 6B (left panel). These increases in iWAT were not sufficient for most cells to exceed the cutoff values used to determine H2DCFA and B2M positivity in adipocytes during quantitative analysis. We have revised the manuscript to clarify these results (page 13).

      • While the study provides extensive data on mouse models, the potential translation of these findings to human obesity remains uncertain. 

      To address the reviewer’s point, we expanded our discussion on the differences in adipocyte heterogeneity between mice and humans. We attempted to identify human adipocyte subclusters that resemble the metabolically unhealthy Ad6 adipocytes found in mice in our study; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible cell subtypes across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested points to address: 

      (1) The authors suggest that their improved protocol for maintaining RNA/nucleus integrity results in a more comprehensive analysis of adipose tissue heterogeneity. The authors compare the quality of their snRNA-seq data to those generated in prior studies (e.g., Savari et al.). What is not clear is whether additional heterogeneity/clusters can be observed due directly to the protocol modifications. A direct head-to-head comparison of the protocols executed in parallel would of course be ideal; however, integrating their new dataset with the corresponding data from Savari et al. could help address this question and help readers understand the benefits of this new protocol vs. existing protocols. 

      The data from Savari et al. are of significantly lower quality, likely because they were generated using earlier versions of the 10X Genomics system, and this study lacks iWAT data. To address the reviewer’s point, we instead integrated our data with those from another study by Emont et al. (2022), which used comparable tissue types and experimental systems. The integrated analysis confirmed the improved representation of all cell types present in adipose tissues in our study, with higher quality metrics such as increased Unique Molecular Identifiers (UMIs) and the number of genes per nucleus. These results indicate that our protocol offers significant advantages in generating a more accurate representation of each cell type and their gene expression profiles. New data are included in Figure S2 (page 7).

      (2) The exact frequency of the Ad6 population in eWAT of mice maintained on HFD is a little unclear. From the snRNA-seq data, it appears that roughly 47% of the adipocytes are in this "stressed state." In Figure 6, it appears that greater than 75% of the adipocytes express B2M (Ad6 marker) and greater than 75% of adipocytes are suggested to be devoid of measurable PPARg expression. The latter seems quite high as PPARg expression is essential to maintain the adipocyte phenotype. Is there evidence of de-differentiation amongst them (i.e. acquisition of progenitor cell markers)? Presenting separate UMAPs for the chow vs. HFD state may help visualize the frequency of each adipocyte population in the two states. Inclusion of the stromal/progenitor cells in the visualization may help understand if cells are de-differentiating in obesity as previously postulated by the authors. Related to Point # 1 above, is this population observed in prior studies and at a similar frequency?

      To address the reviewer’s point, we analyzed the expression of adipocyte progenitor cell (APC) markers, such as Pdgfra, in the Ad6 population. We did not detect significant expression of APC markers, suggesting that Ad6 does not represent dedifferentiating adipocytes. Instead, they are likely stressed and dying cells characterized by an aberrant state of transcription with a global decline.

      When integrating our data with the datasets by Emont et al., we observed an adipocyte population in the previous study, mAd3, comparable to Ad6 in our study, with similar marker gene expression and lower transcript abundance. However, the population size of mAd3 was much smaller than that of Ad6 in our data and did not show consistent population changes during obesity. This discrepancy may be due to different technical robustness; the dysfunctional cellular state of this population, with its severely compromised RNA contents, may have made it difficult to accurately capture using standard protocols in the previous study, while our protocol enabled robust and precise detection. We added new data in Figure S6I and S7 (page 14) and revised the Discussion (page 17).

      Additional points  

      (1) The authors should be cautious in describing subpopulations as "increasing" or "decreasing" in obesity as the data are presented as proportions of a parent population. A given cell population may be "relatively increased." 

      To address the reviewer's point, we revised the manuscript to clarify the "relative" changes in cell populations during obesity in the relevant sections (pages 8, 9, 10, 11, and 15).

      (2) The authors should also be cautious in ascribing "function" to adipocyte populations based solely on their expression signatures. Statements such as those in the abstract, "...providing novel insights into the mechanisms orchestrating adipose tissue remodeling during obesity..." should probably be toned down as no such mechanism is truly demonstrated. 

      To address the reviewer's point, we revised the manuscript by removing or replacing the indicated terms or phrases with more suitable wording in the appropriate sections (pages 2, 10, 12, and 14).

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors might consider expanding a discussion on the potential implications of their findings, especially the newly identified adipocyte subpopulations and depot-specific differences for human studies. 

      To address the reviewer’s point, we attempted to identify human adipocyte subclusters that resembled our dysfunctional Ad6 adipocytes in mice; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible cell subtypes across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      (2) typo: "To generate diet-induced obesity models". 

      We revised the manuscript to correct it.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors examined the hypothesis that plasma ApoM, which carries sphingosine-1-phosphate (S1P) and activates vascular S1P receptors to inhibit vascular leakage, is modulated by SGLT2 inhibitors (SGLTi) during endotoxemia. They also propose that this mechanism is mediated by SGLTi regulation of LRP2/megalin in the kidney and that this mechanism is critical for endotoxin-induced vascular leak and myocardial dysfunction. The hypothesis is novel and potentially exciting. However, the authors' experiments lack critical controls and rigor in multiple aspects and, overall, do not support the conclusions.

      Thank you for these comments. We have now directly addressed this hypothesis, which remains an innovative proposal for how SGLT2i can reduce vascular leak, by using proximal tubule-specific inducible megalin/Lrp2 knockout mice.

      Reviewer #2 (Public Review):

      Apolipoprotein M (ApoM) is a plasma carrier for the vascular protective lipid mediator sphingosine 1-phosphate (S1P). The plasma levels of S1P and its chaperones ApoM and albumin rapidly decline in patients with severe sepsis, but the mechanisms for such reductions and their consequences for cardiovascular health remain elusive. In this study, Ripoll and colleagues demonstrate that the sodium-glucose co-transporter inhibitor dapagliflozin (Dapa) can preserve serum ApoM levels as well as cardiac function after LPS treatment of mice with diet-induced obesity. They further provide data to suggest that Dapa preserves serum ApoM by increasing megalin-mediated reabsorption of ApoM in renal proximal tubules and that ApoM improves vascular integrity in LPS treated mice. These observations put forward a potential therapeutic approach to sustain vascular protective S1P signaling that could be relevant to other conditions of systemic inflammation where plasma levels of S1P decrease. However, although the authors are careful with their statements, the study falls short of directly implicating megalin in ApoM reabsorption and of ApoM/S1P depletion in LPS-induced cardiac dysfunction and the protective effects of Dapa.

      The observations reported in this study are exciting and potentially of broad interest. The paper is well written and concise, and the statements made are mostly supported by the data presented. However, the mechanism proposed and implied is mostly based on circumstantial evidence, and the paper could be substantially improved by directly addressing the role of megalin in ApoM reabsorption and serum ApoM and S1P levels and the importance of ApoM for the preservation for cardiac function during endotoxemia. Some observations that are not necessarily in line with the model proposed should also be discussed.

      The authors show that Dapa preserves serum ApoM and cardiac function in LPS-treated obese mice. However, the evidence they provide to suggest that ApoM may be implicated in the protective effect of Dapa on cardiac function is indirect. Direct evidence could be sought by addressing the effect of Dapa on cardiac function in LPS treated ApoM deficient and littermate control mice (with DIO if necessary).

      The authors also suggest that higher ApoM levels in mice treated with Dapa and LPS reflect increased megalin-mediated ApoM reabsorption and that this preserves S1PR signaling. This could be addressed more directly by assessing the clearance of labelled ApoM, by addressing the impact of megalin inhibition or deficiency on ApoM clearance in this context, and by measuring S1P as well as ApoM in serum samples.

      Methods: More details should be provided in the manuscript for how ApoM deficient and transgenic mice were generated, on sex and strain background, and on whether or not littermate controls were used. For intravital microscopy, more precision is needed on how vessel borders were outlined and if this was done with or without regard for FITC-dextran. Please also specify the type of vessel chosen and considerations made with regard to blood flow and patency of the vessels analyzed. For statistical analyses, data from each mouse should be pooled before performing statistical comparisons. The criteria used for choice of test should be outlined as different statistical tests are used for similar datasets. For all data, please be consistent in the use of post-tests and in the presentation of comparisons. In other words, if the authors choose to only display test results for groups that are significantly different, this should be done in all cases. And if comparisons are made between all groups, this should be done in all cases for similar sets of data.

      Thank you for these comments. We have now tested the direct role of Lrp2 with respect to SGLT2i in vivo and in vitro, and our study now shows that Lrp2 is required for the effect of dapagliflozin on ApoM. ApoM deficient and transgenic mice were previously described and published by our group (PMID: 37034289) and others (PMID: 24318881), and littermate controls were used throughout our manuscript. We agree that the effect on cardiac function is likely indirect in these models, and as yet we do not have the tools in the LPS model to separate potential endothelial protective vs cardiac effects. In addition, since the ApoM knockout has multiple abnormalities that include hypertension, secondary cardiac hypertrophy, and an adipose/browning phenotype, all of which may influence its response to Dapa in terms of cardiac function, these studies will be challenging to perform and will require additional models that are beyond the scope of this manuscript.

      For intravital microscopy, vessel borders were outlined blindly, without regard for FITC-dextran. We believe it is important to show multiple blood vessels per mouse since, as the reviewer points out, there is considerable heterogeneity between vessels. These tests were performed in our collaborator’s laboratory; data analysis was blinded, and the collaborator was unaware of the study hypothesis when the measurements were performed and analyzed. The collaborator has previously reported that this is a valid method for measuring cremaster vessel permeability (PMID: 26839042).

      We have updated our Methods section and the figure legends to clearly indicate the statistical analyses we used. For two-group comparisons we used Student’s t-test; for multiple groups, one-way ANOVA with Sidak's correction for multiple comparisons was used throughout the paper when the data were normally distributed, and the Kruskal-Wallis test was used when they were not.
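      As a small illustration of the Sidak correction mentioned above, the sketch below applies the standard Sidak formula, p_adj = 1 - (1 - p)^m for a family of m comparisons, to a list of raw p-values. The function name and the example p-values are hypothetical, and statistical software may report post-test adjusted p-values in a slightly different form.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m comparisons.

    Each raw p is mapped to 1 - (1 - p)**m, which controls the family-wise
    error rate at the nominal level when the m tests are independent.
    """
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values from three pairwise post-tests
raw = [0.01, 0.02, 0.04]
adjusted = sidak_adjust(raw)
print([round(p, 4) for p in adjusted])  # [0.0297, 0.0588, 0.1153]
```

      With three comparisons, a raw p of 0.04 no longer passes a 0.05 threshold after adjustment, which is why the same post-test should be applied consistently across similar datasets.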

      Reviewer #3 (Public Review):

      The authors have performed well-designed experiments that elucidate the protective role of Dapa in the LPS model of sepsis. This model shows that Dapa works, in part, by increasing expression of the receptor LRP2 in the kidney, which maintains circulating ApoM levels. ApoM binds S1P, which then interacts with the S1P receptor, stimulating cardiac function and epithelial and endothelial barrier function, thereby maintaining intravascular volume and cardiac output in the setting of severe inflammation. The authors used many experimental models, including transgenic mice, as well as several rigorous and reproducible techniques to measure the relevant parameters of cardiac, renal, vascular, and immune function. Furthermore, they employ a useful inhibitor of S1P function to show pharmacologically the essential role of this agonist in most, but not all, of the benefits of Dapa. A strength of the paper is the identification of the pathway responsible for the cardioprotective effects of SGLT2is, which may yield additional therapeutic targets. There are some weaknesses in the paper, such as the study of only male mice and the lack of a power analysis to justify the number of animals used throughout the experiments. Overall, the paper should have a significant impact on the scientific community because SGLT2i drugs are likely to find many uses in inflammatory and metabolic diseases. This paper provides support for an important mechanism by which they work in conditions of severe sepsis and hemodynamic compromise.

      Thank you for these comments.

    1. Author response:

      Reviewer #1 (Public Review):

      This paper proposes a novel framework for explaining patterns of generalization of force field learning to novel limb configurations. The paper considers three potential coordinate systems: Cartesian, joint-based, and object-based. The authors propose a model in which the forces predicted under these different coordinate frames are combined according to the expected variability of produced forces. The authors show, across a range of changes in arm configurations, that the generalization of a specific force field is quite well accounted for by the model.

      The paper is well-written and the experimental data are very clear. The patterns of generalization exhibited by participants - the key aspect of the behavior that the model seeks to explain - are clear and consistent across participants. The paper clearly illustrates the importance of considering multiple coordinate frames for generalization, building on previous work by Berniker and colleagues (JNeurophys, 2014). The specific model proposed in this paper is parsimonious, but there remain a number of questions about its conceptual premises and the extent to which its predictions improve upon alternative models.

      A major concern is with the model's premise. It is loosely inspired by cue integration theory but is really proposed in a fairly ad hoc manner, and not really concretely founded on firm underlying principles. It's by no means clear that the logic from cue integration can be extrapolated to the case of combining different possible patterns of generalization. I think there may in fact be a fundamental problem in treating this control problem as a cue-integration problem. In classic cue integration theory, the various cues are assumed to be independent observations of a single underlying variable. In this generalization setting, however, the different generalization patterns are NOT independent; if one is true, then the others must inevitably not be. For this reason, I don't believe that the proposed model can really be thought of as a normative or rational model (hence why I describe it as 'ad hoc'). That's not to say it may not ultimately be correct, but I think the conceptual justification for the model needs to be laid out much more clearly, rather than simply by alluding to cue-integration theory and using terms like 'reliability' throughout.

      We thank the reviewer for bringing up this point. We see and treat this problem of finding the combination weights not as a cue integration problem but as an inverse optimal control problem. In this case, there can be several solutions to the same problem, i.e., what forces are expected in untrained areas, which can co-exist and give the motor system the option to switch or combine them. This is similar to other inverse optimal control problems, e.g. combining feedforward optimal control models to explain simple reaching. However, compared to these problems, which fit the weights between different models, we proposed an explanation for the underlying principle that sets these weights for the dynamics representation problem. We found that basing the combination on each motor plan's reliability can best explain the results. In this case, we refer to ‘reliability’ as execution reliability and not sensory reliability, which is common in cue integration theory. We have added further details explaining this in the manuscript.

      “We hypothesize that this inconsistency in results can be explained using a framework inspired by inverse optimal control. In this framework, the motor system can switch between or combine different solutions. That is, the motor system assigns a weight to each solution and calculates a weighted sum of these solutions. To support such a framework, previous studies usually found the weights by fitting the weighted-sum solution to behavioral data (Berret, Chiovetto et al. 2011). While we treat the problem in the same manner, we propose the Reliable Dynamics Representation (Re-Dyn) mechanism, which determines the weights instead of fitting them. According to our framework, the weights are calculated by considering the reliability of each representation during dynamic generalization. That is, the motor system prefers a representation if the execution of forces based on this representation is more robust to distortion arising from neural noise. In this process, the motor system estimates the difference between the desired and the generated generalized forces, taking into account noise added to the state variables that equivalently define the forces.”

      A more rational model might be based on Bayesian decision theory. Under such a model, the motor system would select motor commands that minimize some expected loss, averaging over the various possible underlying 'true' coordinate systems in which to generalize. It's not entirely clear without developing the theory a bit exactly how the proposed noise-based theory might deviate from such a Bayesian model. But the paper should more clearly explain the principles/assumptions of the proposed noise-based model and should emphasize how the model parallels (or deviates from) Bayesian-decision-theory-type models.

      As we understand the reviewer's suggestion, the idea is to estimate the weight of each coordinate system by minimizing a loss function that considers the cost of each weight multiplied by a posterior probability representing the uncertainty in this weight value. While this is an interesting idea, we believe that in the current problem there are no ‘true’ weight values. That is, any combination of weights the motor system uses will be true, due to the ambiguous nature of the environment. Since the force field was presented in only one area of the entire workspace, there is no observation that would allow prior beliefs about the nature of the forces in the environment to be updated. In such a case, the prior beliefs might play a role in the loss function, but in our opinion there is no clear rationale for choosing unequal priors other than guessing or fitting prior probabilities, which would resemble previous models that used fitting rather than prediction.

      Another significant weakness is that it's not clear how closely the weighting of the different coordinate frames needs to match the model predictions in order to recover the observed generalization patterns. Given that the weighting for a given movement direction is over-parametrized (i.e., there are three variable weights, allowing for decay, predicting a single observed force level), it seems that a broad range of models could generate a reasonable prediction. It would be helpful to compare the predictions using the weighting suggested by the model with the predictions using alternative weightings, e.g., a uniform weighting or the weighting for a different posture. In fact, Fig. 7 shows that uniform weighting accounts for the data just as well as the noise-based model, in which the weighting varies substantially across directions. A more comprehensive analysis comparing the proposed noise-based weightings to alternative weightings would help argue more convincingly that the specific noise-based predictions are necessary. The analysis in the appendix was not clearly described; it seemed to compare various potential fitted mixtures of coordinate frames, but did not compare these to the noise-based model predictions.

      We agree with the reviewer that fitted global weights, that is, an optimal weighted average of the three coordinate systems should outperform most of the models that are based on prediction instead of fitting the data. As we showed in Figure 7 of the submitted version of the manuscript, we used the optimal fitted model to show that our noise-based model is indeed not optimal but can predict the behavioral results and not fall too short of a fitted model. When trying to fit a model across all the reported experiments, we indeed found a set of values that gives equal weights for the joints and object coordinate systems (0.27 for both), and a lower value for the Cartesian coordinate system (0.12). Considering these values, we indeed see how the reviewer can suggest a model that is based on equal weights across all coordinate systems. While this model will not perform as well as the fitted model, it can still generate satisfactory results.

      To better understand whether a model based on global weights can explain the combination of coordinate systems, we performed an additional experiment. In this experiment, a model based on globally fitted weights can predict only one of two possible generalization patterns, while models based on weights predicted for each individual direction can predict a variety of generalization patterns. We show that global weights, although fitted to the data, cannot explain participants' behavior. We report these new results in Appendix 2.

      “To better understand whether a model based on global weights can explain the combination of coordinate systems, we performed an additional experiment. We used the idea of experiment 3, in which participants generalize learned dynamics using a tool. That is, the arm posture does not change between the training and test areas. In such a case, the Cartesian and joint coordinate systems do not predict a shift in the generalized force pattern, while the object coordinate system predicts a shift that depends on the orientation of the tool. In this additional experiment, we set a test workspace in which the orientation of the tool is 90° (Appendix 2- figure 1A). In this case, for the test workspace, the force compensation pattern of the object-based coordinate system is in anti-phase with the Cartesian/joint generalization pattern. Any globally fitted weights (including equal weights) can produce only either a non-shifted or a 90° shifted force compensation pattern (Appendix 2- figure 1B). Participants in this experiment (n=7) showed a similar MPE reduction as in all previous experiments when adapting to the trigonometric scaled force field (Appendix 2- figure 1C). When examining the generalized force compensation patterns, we observed a shift of 14.6° in the pattern in the test workspace (Appendix 2- figure 1D). This cannot be explained by the individual coordinate system force compensation patterns or by any combination of them (which will always predict either a 0° or a 90° shift, Appendix 2- figure 1E). However, when we calculated the prediction of the Re-Dyn model, we found a predicted force compensation pattern with a shift of 6.4° (Appendix 2- figure 1F). The intermediate shift in the force compensation pattern suggests that global weights cannot explain the results.”
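      The claim that global weights can only produce a 0° or a 90° shift can be checked with a toy calculation. Assumed here, and not stated in the text: the force compensation pattern across movement directions has a 180° period, modeled as cos(2θ), so the object-based prediction shifted by 90° is exactly the negative of the Cartesian/joint prediction. The function names and the direction-dependent weights are hypothetical illustrations, not the Re-Dyn weights.

```python
import math


def pattern(theta_deg, shift_deg=0.0):
    """Toy force compensation pattern with an assumed 180-degree period."""
    return math.cos(math.radians(2.0 * (theta_deg - shift_deg)))


def peak_direction(weight_cj, weight_obj, steps_per_degree=10):
    """Direction in [0, 180) degrees at which the weighted prediction peaks.

    weight_cj: weight of the Cartesian/joint prediction (no shift).
    weight_obj: weight of the object-based prediction (shifted by 90 degrees).
    Both are functions of movement direction: constant functions model global
    weights, varying functions model direction-specific weights.
    """
    best_theta, best_val = 0.0, -math.inf
    for i in range(180 * steps_per_degree):
        theta = i / steps_per_degree
        val = (weight_cj(theta) * pattern(theta)
               + weight_obj(theta) * pattern(theta, 90.0))
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta


def w_cj(theta_deg):
    """Made-up direction-specific weight for the Cartesian/joint prediction."""
    return 0.75 + 0.25 * math.sin(math.radians(2.0 * theta_deg))


def w_obj(theta_deg):
    """Made-up direction-specific weight for the object-based prediction."""
    return 0.25 - 0.25 * math.sin(math.radians(2.0 * theta_deg))


# Global weights: the anti-phase patterns collapse to (a - b) * cos(2*theta),
# so the peak sits at 0 deg when a > b and at 90 deg when a < b.
print(peak_direction(lambda t: 0.6, lambda t: 0.4))  # 0.0
print(peak_direction(lambda t: 0.3, lambda t: 0.7))  # 90.0

# Direction-specific weights can place the peak in between.
print(peak_direction(w_cj, w_obj))  # 15.0
```

      With equal global weights the toy prediction is flat, so no shift is defined at all. The 15° peak comes entirely from the made-up weights and is unrelated to the 14.6° measured in the experiment; the point is only that intermediate shifts require weights that vary with direction.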

      With regard to the suggestion that weighting is changed according to arm posture, two of our results lower the possibility that posture governs the weights:

      (1) In experiment 3, we tested generalization while keeping the same arm posture between the training and test workspaces, and we observed different force compensation profiles across the movement directions. If arm posture in the test workspaces affected the weights, we would expect identical weights for both test workspaces. However, any set of weights that can explain the results observed for workspace 1 will fail to explain the results observed in workspace 2. To better understand this point, we calculated the global weights for each test workspace in this experiment and observed an increase in the weight of the object coordinate system (0.41 vs. 0.5) and a reduction in the weights of the Cartesian and joint coordinate systems (0.29 vs. 0.24). This suggests that arm posture cannot explain the generalization pattern in this case.

      (2) In experiments 2 and 3, we used the same arm posture in the training workspace and either changed the arm posture (experiment 2) or did not change the arm posture (experiment 3) in the test workspaces. While the arm posture for the training workspace was the same, the force generalization patterns were different between the two experiments, suggesting that the arm posture during the training phase (adaptation) does not set the generalization weights.

      Overall, this shows that it is not specifically the arm posture in either the test or the training workspaces that set the weights. Of course, all coordinate models, including our noise model, will consider posture in the determination of the weights.

      Reviewer #2 (Public Review):

      Leib & Franklin assessed how the adaptation of intersegmental dynamics of the arm generalizes to changes in different factors: areas of extrinsic space, limb configurations, and 'object-based' coordinates. Participants reached in many different directions around 360°, adapting to velocity-dependent curl fields that varied depending on the reach angle. This learning was measured via the pattern of forces expressed upon the channel wall of "error clamps" that were randomly sampled from each of these different directions. The authors employed a clever method to predict how this pattern of forces should change if the set of targets was moved around the workspace. Some sets of locations resulted in a large change in joint angles or object-based coordinates, but Cartesian coordinates were always the same. Across three separate experiments, the observed shifts in the generalized force pattern never corresponded to a change that was made relative to any one reference frame. Instead, the authors found that the observed pattern of forces could be explained by a weighted combination of the change in Cartesian, joint, and object-based coordinates across test and training contexts.

      In general, I believe the authors make a good argument for this specific mixed weighting of different contexts. I have a few questions that I hope are easily addressed.

      Movements show different biases relative to the reach direction. Although very similar across people, this function of biases shifts when the arm is moved around the workspace (Ghilardi, Gordon, and Ghez, 1995). These biases are thought to arise from several factors that would change across the different test and training workspaces employed here (Vindras & Viviani, 2005). My concern is that the baseline biases differ between these contexts, so that the observed change in the force pattern across contexts may reflect a change in underlying biases rather than generalization. Baseline force channel measurements were taken in the different workspace locations and conditions, so these could be used to show whether such biases meaningfully affect the results.

      We agree with the reviewer and we followed their suggested analysis. In the following figure (Author response image 1) we plotted the baseline force compensation profiles in each workspace for each of the four experiments. As can be seen in this figure, the baseline force compensation is very close to zero and differs significantly from the force compensation profiles after adaptation to the scaled force field.

      Author response image 1.

      Baseline force compensation levels for experiments 1-4. For each experiment, we plotted the force compensation for the training, test 1, and test 2 workspaces.

      Experiment 3, Test 1 has data that seems the worst fit with the overall story. I thought this might be an issue, but this is also the test set for a potentially awkwardly long arm. My understanding of the object-based coordinate system is that it's primarily a function of the wrist angle, or perceived angle, so I am a little confused why the length of this stick is also different across the conditions instead of just a different angle. Could the length be why this data looks a little odd?

      Usually, force generalization is tested by physically moving the hand in unexplored areas. In experiment 3 we tested generalization using a tool which, as far as we know, was not tested in the past in a similar way to the present experiment. Indeed, the results look odd compared to the results of the other experiments, which were based on the ‘classic’ generalization idea. While we have some ideas regarding possible reasons for the observed behavior, it is out of the scope of the current work and still needs further examination.

      Based on the reviewer’s comment, we improved the explanation in the introduction of the idea behind the object-based coordinate system:

      “we could represent the forces as belonging to the hand or a hand-held object using the orientation vector connecting the shoulder and the object or hand in space (Berniker, Franklin et al. 2014).” The reviewer is right in their observation that the predictions of the object-based reference frame will look the same if we change the length of the tool. The object-based generalized forces, specifically the shift in the force pattern, depend only on the object's orientation but not its length (equation 4).

      The manuscript is written and organized in a way that focuses heavily on the noise element of the model. Other than it being reasonable to add noise to a model, it's not clear to me that the noise is adding anything specific. It seems like the model makes predictions based on how many specific components have been rotated in the different test conditions. I fear I'm just being dense, but it would be helpful to clarify whether the noise itself (and inverse variance estimation) are critical to why the model weights each reference frame how it does or whether this is just a method for scaling the weight by how much the joints or whatever have changed. It seems clear that this noise model is better than weighting by energy and smoothness.

      We have now included further details of the noise model and added to Figure 1 to highlight how noise can affect the predicted weights. In short, we agree with the reviewer that there are multiple ways to add noise to the generalized force patterns. We chose a simple option in which we simulate possible distortions to the state variables that set the direction of movement. Once the variance of each force profile due to this distortion has been calculated, one possible way to combine the profiles is an inverse-variance estimator. Note that an inverse-variance estimator has been shown to be an ideal way to combine signals (e.g., Shahar, D.J. (2017) https://doi.org/10.4236/ojs.2017.72017). However, we do not claim, or try to provide evidence, that the weights are calculated in this specific way. Instead, we suggest that giving greater weight to the less variable force representation can predict both the current experimental results and past results.
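      A minimal sketch of the inverse-variance combination described above, for a single movement direction. The three predictions and variances are made-up numbers, not values from the experiments; in the model the variances would come from simulating noise in the state variables. The function names are hypothetical. The textbook optimality of this rule assumes independent, unbiased estimates of one quantity, which is the assumption Reviewer 1 questions; here it only expresses "give more weight to the less variable representation".

```python
def inverse_variance_weights(variances):
    """Weight each representation by 1/variance, normalized to sum to 1."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def combine_predictions(predictions, variances):
    """Weighted sum of per-representation force predictions for one direction."""
    weights = inverse_variance_weights(variances)
    combined = sum(w * f for w, f in zip(weights, predictions))
    return combined, weights


# Hypothetical force compensation predictions (Cartesian, joint, object) for one
# movement direction, with hypothetical simulated variances.
predictions = [0.40, -0.20, 0.60]
variances = [1.0, 0.5, 0.25]
combined, weights = combine_predictions(predictions, variances)
print([round(w, 3) for w in weights])  # [0.143, 0.286, 0.571]
print(round(combined, 3))  # 0.343
```

      Because the weights are recomputed from the variances for each direction, they can differ across directions without any fitting to behavioral data.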

      Are there any force profiles for individual directions that are predicted to change shape substantially across some of these assorted changes in training and test locations (rather than merely being scaled)? If so, this might provide another test of the hypotheses.

      In experiments 1-3, in which there is a large shift of the force compensation curve, we found directions in which the generalized force was flipped in direction. That is, clockwise force profiles in the training workspace could change into counter-clockwise profiles in the test workspace. For example, in experiment 2, for movement at 157.5° we can see that the force profile was clockwise for the training workspace (with a force compensation value of 0.43) and movement at the same direction was counterclockwise for test workspace 1 (force compensation equal to -0.48). Importantly, we found that the noise based model could predict this change.

      Author response image 2.

      Results of experiment 2. Force compensation profiles for the training workspace (grey solid line) and test workspace 1 (dark blue solid line). Examining the force nature for the 157.5° direction, we found a change in the applied force by the participants (change from clockwise to counterclockwise forces). This was supported by a change in force compensation value (0.43 vs. -0.48). The noise based model can predict this change as shown by the predicted force compensation profile (green dashed line).

      I don't believe the decay factor that was used to scale the test functions was specified in the text, although I may have just missed this. It would be a good idea to state what this factor is where relevant in the text.

      We added an equation describing the decay factor (new equation 7 in the Methods section) in response to this suggestion and to Reviewer 1’s comment on the same issue.

      Reviewer #3 (Public Review):

      The authors proposed the minimum variance principle for the memory representation, in addition to two alternative theories of minimum energy and maximum smoothness. The strength of this paper is the match between the predictions computed from the explicit equation and the behavioral data collected in different conditions. The idea of weighting multiple coordinate systems is novel and is also able to reconcile a debate in previous literature.

      The weakness is that although each model is based on an optimization principle, the derivation process is not written in the Methods section. The authors did not describe how they derived these weighting factors from these computational principles. Thus, it is not clear whether these weighting factors follow from these theories or are just ad hoc heuristics. If the authors argue that this is the result of the minimum variance principle, they should show how these weighting factors are derived from an optimization process that minimizes these cost functions.

      The reviewer brings up a very important point regarding the model. As shown below, it is not trivial to derive these weights using an analytical optimization process. We demonstrate one issue with this optimization process.

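      The force representation can be written as (similar to equation 6):

      $$\mathbf{F} = w_c\,\mathbf{F}_c + w_j\,\mathbf{F}_j + w_o\,\mathbf{F}_o$$

      We formulated the problem as minimizing the variance of the force with respect to the weights $\mathbf{w} = (w_c, w_j, w_o)$:

      $$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \operatorname{Var}(\mathbf{F})$$

      In this case, the variance of the force is a variance-covariance matrix, which can be minimized by minimizing its trace:

      $$\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \operatorname{tr}\left[\operatorname{Cov}(\mathbf{F})\right]$$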

      We will start by calculating the variance of the force representation in joints coordinate system:

      Here, the force variance results from a complex function that includes the joint angles as random variables. Expanding the last expression, although very complex, is still possible. In the resulting expression, some terms require calculating the variance of nested trigonometric functions of the random joint angles, for example:

      In the vast majority of these cases, analytical solutions do not exist. Similar issues also arise when calculating the variance of complex products of trigonometric functions, such as products of Jacobians (and inverse Jacobians).

      To overcome this problem, we turned to numerical solutions which simulate the variance due to the different state variables.
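      A minimal Monte Carlo sketch of that numerical route, for the joint-based representation only. Assumptions not taken from the text: a planar two-link arm with made-up segment lengths, joint-based generalization expressed as preserving joint torque (tau = J(q)^T F, a common way to formulate it), independent Gaussian noise on the test-posture joint angles as the state-variable distortion, and the trace of the force covariance as the variance measure. All names and numbers are hypothetical; this is not the authors' simulation.

```python
import math
import random
import statistics

# Hypothetical segment lengths (m); not taken from the study.
L1, L2 = 0.30, 0.33


def jacobian(q1, q2):
    """Jacobian of a planar two-link arm (shoulder angle q1, elbow angle q2)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-L1 * s1 - L2 * s12, -L2 * s12],
            [L1 * c1 + L2 * c12, L2 * c12]]


def joint_based_force(f_train, q_train, q_test):
    """Generalize a hand force by keeping joint torques fixed.

    tau = J(q_train)^T f_train, then solve J(q_test)^T f_test = tau.
    """
    j = jacobian(*q_train)
    tau0 = j[0][0] * f_train[0] + j[1][0] * f_train[1]
    tau1 = j[0][1] * f_train[0] + j[1][1] * f_train[1]
    k = jacobian(*q_test)
    # Rows of k^T are (k00, k10) and (k01, k11); solve by Cramer's rule.
    a, b, c, d = k[0][0], k[1][0], k[0][1], k[1][1]
    det = a * d - b * c  # equals L1 * L2 * sin(q2); zero only at a straight elbow
    return ((d * tau0 - b * tau1) / det, (a * tau1 - c * tau0) / det)


def force_variance(f_train, q_train, q_test, sigma, n=5000, seed=0):
    """Monte Carlo trace of Cov(f_test) when test joint angles get Gaussian noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        noisy = (q_test[0] + rng.gauss(0.0, sigma),
                 q_test[1] + rng.gauss(0.0, sigma))
        fx, fy = joint_based_force(f_train, q_train, noisy)
        xs.append(fx)
        ys.append(fy)
    # The trace of the 2x2 covariance matrix is the sum of the per-axis variances.
    return statistics.variance(xs) + statistics.variance(ys)


# Hypothetical training force (N) and postures (rad); elbow kept well away from 0.
f_train = (5.0, 0.0)
q_train, q_test = (0.8, 1.6), (0.3, 1.9)
for sigma in (0.01, 0.05):
    print(sigma, force_variance(f_train, q_train, q_test, sigma))
```

      Repeating the same sampling for the Cartesian and object-based predictions would give one variance per representation, which an inverse-variance rule can then turn into direction-specific weights.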

      In addition, I am concerned that the proposed model can cancel the property of the coordinate system by the predicted variance, and it can work for any coordinate system, even one that is not used in the human brain. When the applied force is given in Cartesian coordinates, the directionality in the generalization ability of the memory of the force field is characterized by the kinematic relationship (Jacobian) between the Cartesian coordinate and the coordinate of interest (Cartesian, joint, and object) as shown in Equation 3. At the same time, when a displacement (epsilon) is considered in a space and a corresponding displacement is linked with kinematic equations (e.g., joint displacement and hand displacement in 2 joint arms in this paper), the generated variances in different coordinate systems are linked with the kinematic equation each other (Jacobian). Thus, how a small noise in a certain coordinate system generates the hand force noise (sigma_x, sigma_j, sigma_o) is also characterized by the kinematics (Jacobian). Thus, when the predicted forcefield (F_c, F_j, F_o) was divided by the variance (F_c/sigma_c^2, F_j/sigma_j^2, F_o/sigma_o^2, ), the directionality of the generalization force which is characterized by the Jacobian is canceled by the directionality of the sigmas which is characterized by the Jacobian. Thus, as it has been read out from Fig*D and E top, the weight in E-top of each coordinate system is always the inverse of the shift of force from the test force by which the directionality of the generalization is always canceled.

      Once this directionality is canceled, no matter how the weighted sum is computed, it can replicate the memorized force. Thus, this model always works to replicate the test force no matter which coordinate system is assumed. Thus, I am suspicious of the falsifiability of this computational model. This model is always true no matter which coordinate system is assumed. Even if one used, for instance, the robot coordinate system, which is directly linked to the participant's hand by the kinematic equation (Jacobian), one could replicate this result. But in this case, the model would be nonsense. The falsifiability of this model was not explicitly addressed.

      As explained above, the variability of the generalized forces given the random nature of the state variables is a complex function that cannot be summarized by a Jacobian. Importantly, the model is unable to reproduce or replicate the test force arbitrarily. In fact, we have already shown this (see Appendix 1- figure 1): when we attempt to explain the data with only a single coordinate system (or a combination of two coordinate systems), we are completely unable to replicate the test data despite using this model. For example, in experiment 4, when we don’t use the joint-based coordinate system, the model predicts zero shift of the force compensation pattern, while the behavioral data show a shift due to the contribution of the joint coordinate system. Any arbitrary model (similar to the random model we tested, please see the response to Reviewer 1) would be completely unable to recreate the test data. Instead, our model makes very specific predictions about the weighting between the three coordinate systems and therefore fully specified force predictions for every possible test posture. We added this point to the Discussion:

      “The results we present here support the idea that the motor system can use multiple representations during adaptation to novel dynamics. Specifically, we suggested that we combine three types of coordinate systems, where each is independent of the other (see Appendix 1- figure 1 for comparison with other combinations). Other combinations that include a single or two coordinate system can explain some of the results but not all of them, suggesting that force representation relies on all three with specific weights that change between generalization scenarios.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers 1 and 2: concern about culture-induced transcriptional changes in endothelial cells (ECs).

      We have now addressed this concern by FACS-sorting ECs (Fig. 7A revised) and comparing our data with previous studies (S. Fig. 1C). Our major claim was the epigenetic repression of EC genes, including those involved in BBB formation and angiogenesis, during later development. To further strengthen our claim, we knocked out HDAC2 during the later stages of development to prevent this epigenetic repression. As shown in the first version of the manuscript, this knockout results in enhanced angiogenesis and a leaky BBB.

      In the revised version, we have FACS-sorted CD31+ ECs from E-17.5 WT and HDAC2 ECKO mice, followed by ultra-low mRNA sequencing. Confirming the epigenetic repression via HDAC2, the HDAC2-deleted ECs showed high expression of BBB genes such as ZO-1, OCLN, MFSD2A, and GLUT1, and activation of the Wnt signaling pathway as indicated by the upregulation of Wnt target genes such as Axin2 and APCDD1. Additionally, to validate the increased angiogenesis phenotype observed, angiogenesis-related genes such as VEGFA, FLT1, and ENG were upregulated.

      Since the transcriptomics of brain ECs during developmental stages has already been published in Hupe et al., 2017, we did not attempt to replicate this. However, we compared our differentially regulated genes from E-13.5 versus adult stages with the transcriptome changes during development reported by Hupe et al., 2017. We found a significant overlap in important genes such as CLDN5, LEF1, ZIC3, and MFSD2A (S. Fig. 1C).

      As pointed out by the reviewer, culture-induced changes cannot be ruled out from our data. We have included a statement in the manuscript: "Even though we used similar culture conditions for both embryonic and adult cortical ECs, culture-induced changes have been reported previously and should be considered as a varying factor when interpreting our results."

      Reviewer-1 Comment 2- An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of the knockdown.

      We have now provided the protein expression data for HDAC2 and EZH2 in Supplementary Figure 2A of the revised manuscript.

      Reviewer-1 Comment 3- Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

      We appreciate your positive comment. The in vivo HDAC2 knockout experiment serves as a validation of our in vitro findings, demonstrating that the epigenetic regulator HDAC2 can control the expression of endothelial cell (EC) genes involved in angiogenesis, blood-brain barrier (BBB) formation, and maturation. To investigate the mechanism underlying the HDAC2 ECKO phenotype, we performed mRNA sequencing on HDAC2 ECKO E-17.5 ECs and found that vascular and BBB maturation is hindered when the epigenetic repression of BBB, angiogenesis, and Wnt target genes is prevented (Fig. 7A). As a result, HDAC2 ECKO mice showed increased angiogenesis and BBB leakage. This strengthens our hypothesis that HDAC2-mediated epigenetic repression is critical for BBB and vascular maturation.

      Reviewer 2 Comment-2 The use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      We appreciate the reviewer's comment. While whole-genome methods like ChIP-seq offer comprehensive, high-throughput data, ChIP-qPCR assays remain valuable tools because of their sensitivity, specificity, and suitability for validation and targeted analysis. Our ChIP analysis identifies crucial roles for HDAC2 and PRC2, two epigenetic enzymes, in CNS endothelial cells (ECs). The in vivo data presented in Figure 4 further support this finding through the observed phenotypic differences. We agree that a comprehensive analysis of HDAC2 and PRC2 target genes in ECs is essential; such an analysis is currently underway and, given the extensive nature of the data, will be the subject of a separate publication.

      Reviewer 2 Comment-3 Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

      We appreciate the comments from the reviewers. Our results provide valuable insights into the specific epigenetic mechanisms that regulate BBB genes. It is important to recognize that different cell types possess distinct, stage-specific epigenetic landscapes and regulatory mechanisms. Rather than acting broadly across diverse cell types, HDAC2 (even though there are several other classes and subtypes of HDACs) and the Polycomb complex are more likely to exhibit specific functions within the context of EC gene expression or BBB integrity.

      Moreover, the significance of our findings is enhanced by the fact that epigenetic modifications are often reversible with the assistance of epigenetic regulators, which makes them promising targets for BBB modulation. Targeting epigenetic regulators can have a widespread impact, as these mechanisms regulate numerous genes that collectively have the potential to promote vascular repair.

      A practical advantage is that FDA-approved HDAC2 inhibitors, as well as PRC2 inhibitors (such as those in clinical trials NCT03211988 and NCT02601950), are already available. This facilitates drug repurposing and expedites potential clinical translation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different from the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the literature (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, as used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, as used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC, respectively. Moreover, the influence of response variability and of biases in response distribution estimation due to limited sampling has not usually been accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate rank-based ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but is instead an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, in our small sample (n = 4), these percentages were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney; see Author response image 1 below). Consequently, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the neurons reaching significance in these ANOVA tests were meaningfully sensitive to azimuth or reached significance by chance.

      Author response image 1.

      Percentage of the neuropixels recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).
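As a rough illustration of why a fixed α = 0.05 yields about 5% "significant" units even without any tuning, the sketch below simulates untuned units (13 azimuth groups of 14 trials, roughly matching the numbers mentioned in these reviews) and counts how often a Kruskal-Wallis test reaches p < 0.05. This is a pure-Python sketch, not the authors' analysis code: the Gaussian response model, the omission of the tie correction, and all function names are assumptions for illustration. The p-value uses the usual chi-square approximation, which has an exact closed form when the degrees of freedom are even (here 13 - 1 = 12).

```python
import math
import random


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; this closed form is exact for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def kruskal_wallis(groups):
    """Kruskal-Wallis H (no tie correction) and its chi-square approximate p-value."""
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # tied values share their mean rank
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += average_rank
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    h = max(h, 0.0)  # guard against tiny negative rounding errors
    return h, chi2_sf_even_df(h, len(groups) - 1)


def false_positive_rate(n_units=400, n_azimuths=13, n_trials=14, alpha=0.05, seed=0):
    """Fraction of simulated, untuned units whose test reaches p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_units):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_trials)]
                  for _ in range(n_azimuths)]
        hits += kruskal_wallis(groups)[1] < alpha
    return hits / n_units


print(false_positive_rate())  # expected to fall near alpha = 0.05
```

Because every simulated unit is untuned by construction, any unit flagged here is a false positive; an observed detection rate that is not reliably above this level therefore cannot, on its own, separate weakly tuned units from chance.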

      We reasoned that the markedly variable responses of DCIC units, which frequently failed to respond (Fig. 3D, 4A), combined with the limited number of trial repetitions we could collect, result in under-sampled estimates of the response distributions. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency of the collected responses on stimulus azimuth. To do this, we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which is an advantage of the Chi-square approach, but an important caveat is that these response categories are arbitrary.
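The binning step can be sketched as a Pearson chi-square test of independence on an azimuth × response-category contingency table. This is a minimal pure-Python sketch, not the authors' code; the spike-count category edges `[1, 3]` (no response, weak, strong) are hypothetical, which mirrors the caveat that such categories are arbitrary. Empty categories are dropped before computing the degrees of freedom, and with 13 azimuths the degrees of freedom are always even, so the closed-form chi-square tail probability applies. The sketch also assumes every azimuth has at least one trial.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable; exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def categorize(value, edges):
    """Index of the response category for one trial (edges are upper bounds)."""
    for index, edge in enumerate(edges):
        if value < edge:
            return index
    return len(edges)


def chi_square_azimuth_dependence(responses_by_azimuth, edges):
    """Pearson chi-square test of independence between azimuth and response category."""
    n_categories = len(edges) + 1
    table = [[0] * n_categories for _ in responses_by_azimuth]
    for row, trials in zip(table, responses_by_azimuth):
        for value in trials:
            row[categorize(value, edges)] += 1
    # Drop categories that no trial fell into; they carry no information.
    used = [c for c in range(n_categories) if any(row[c] for row in table)]
    table = [[row[c] for c in used] for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    statistic = 0.0
    for row in table:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(used) - 1)
    if df == 0:
        return 0.0, 1.0  # a single occupied category cannot depend on azimuth
    return statistic, chi2_sf_even_df(statistic, df)


# Toy unit: silent at 6 azimuths, strongly driven at the other 7 (14 trials each).
toy = [[0] * 14 for _ in range(6)] + [[5] * 14 for _ in range(7)]
print(chi_square_azimuth_dependence(toy, edges=[1, 3]))
```

Moving the hypothetical edges changes which trials share a category, and therefore the statistic, which is exactly why the choice of categories has to be reported alongside the result.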

      Altogether, we acknowledge that our Chi-square approach to defining azimuth sensitivity is not free of limitations and that, although it enables the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless, we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.
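A chance-level error distribution, the reference against which "better than chance" is judged, can be computed exactly by pairing every true azimuth with every equally likely guess. The speaker layout below (13 positions from -90° to +90° in 15° steps) is an assumption for illustration; the text here only states that 13 azimuths were used.

```python
import statistics
from collections import Counter

# Hypothetical layout: 13 speaker azimuths, -90 to +90 deg in 15 deg steps.
AZIMUTHS = [-90 + 15 * i for i in range(13)]


def chance_errors(azimuths):
    """Absolute error for every (true, guess) pair when guesses are uniform at random."""
    return [abs(guess - true) for true in azimuths for guess in azimuths]


errors = chance_errors(AZIMUTHS)
counts = Counter(errors)
print(statistics.median(errors))  # 60
print(counts[0] / len(errors))    # 13 of 169 pairs are exact hits, i.e. 1/13
```

Under this assumed layout the chance median error is 60°, so a decoder's median absolute error is informative only relative to that figure, not relative to zero.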

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a left-hand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a right-hand azimuth?
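The reviewer's suggestion of reading left/right performance off the decoder's confusion matrix can be sketched as follows. The sign convention (negative azimuths on the left), the rule that midline (0°) predictions count as neither side, the azimuth layout, and the toy counts are all assumptions made for illustration.

```python
def same_side_fraction(confusion, azimuths, true_azimuth):
    """Fraction of trials at true_azimuth decoded to the same side of the midline.

    confusion[i][j] counts trials with true azimuth azimuths[i] decoded as
    azimuths[j]; negative azimuths are assumed to be on the left.
    """
    if true_azimuth == 0:
        raise ValueError("the midline has no side")
    row = confusion[azimuths.index(true_azimuth)]
    on_left = true_azimuth < 0
    same_side = sum(
        count for azimuth, count in zip(azimuths, row)
        if azimuth != 0 and (azimuth < 0) == on_left
    )
    return same_side / sum(row)


azimuths = [-90 + 15 * i for i in range(13)]  # hypothetical layout
toy = [[0] * len(azimuths) for _ in azimuths]
row = toy[azimuths.index(-30)]
for decoded, count in [(-30, 10), (-15, 2), (0, 1), (15, 1)]:
    row[azimuths.index(decoded)] = count
print(same_side_fraction(toy, azimuths, -30))  # 12 / 14, about 0.857
```

Counting midline predictions as errors is a conservative choice; assigning them half credit is an equally defensible alternative, and the choice should be stated when comparing with a two-alternative behavioral task.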

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, a very different context from our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore, we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and, for discussion, only made the qualitative observation that the errors match.

      We believe it is also important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, which makes further analyses comparing the experimental design of Lauer et al. (2011) with ours even more challenging to interpret validly.

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, because of the differences the reviewer precisely describes: at CNIC a much larger majority of neurons show stronger azimuth sensitivity and response reliability than we observed at DCIC. By this qualitative similarity of encoding format we specifically mean that, in both CNIC and DCIC, activity patterns from azimuth-sensitive subpopulations of neurons carry enough information about stimulus azimuth to predict it with an accuracy comparable to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.
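One common way to quantify a pairwise noise correlation is to subtract each unit's mean response to every stimulus and correlate the remaining trial-to-trial fluctuations across simultaneously recorded trials. The pure-Python sketch below does that; it is not necessarily the authors' exact procedure (some analyses z-score within each stimulus instead of only mean-subtracting), and the toy units and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def noise_correlation(unit_a, unit_b):
    """unit_x[s][t]: response on trial t of stimulus s (trials simultaneous across units).

    Removing each stimulus mean discards the tuning (signal) component, so only
    shared trial-to-trial variability remains.
    """
    residuals_a, residuals_b = [], []
    for trials_a, trials_b in zip(unit_a, unit_b):
        mean_a = sum(trials_a) / len(trials_a)
        mean_b = sum(trials_b) / len(trials_b)
        residuals_a.extend(v - mean_a for v in trials_a)
        residuals_b.extend(v - mean_b for v in trials_b)
    return pearson(residuals_a, residuals_b)


# Toy pair with opposite tuning but a shared fluctuation on every trial.
rng = random.Random(1)
unit_a, unit_b = [], []
for s in range(13):
    shared = [rng.gauss(0.0, 1.0) for _ in range(14)]
    unit_a.append([s + g + rng.gauss(0.0, 1.0) for g in shared])
    unit_b.append([12 - s + g + rng.gauss(0.0, 1.0) for g in shared])
print(noise_correlation(unit_a, unit_b))  # positive despite opposite tuning
```

The two toy units have opposite tuning (a negative signal correlation) yet a positive noise correlation, illustrating that the two quantities are measured separately.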

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that the performance of their decoder was higher when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.
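A scrambled-trial control of the kind described here is often built by independently permuting each unit's trials within each stimulus. This keeps every unit's response distribution per azimuth (and hence its tuning) intact while breaking the trial-by-trial pairing that carries noise correlations. The sketch below assumes that construction; the exact procedure behind the "modeled decorrelated datasets" is not specified in this response, so treat it as an illustration rather than the authors' method. The data and function name are hypothetical.

```python
import random


def scramble_trials(responses, seed=0):
    """responses[u][s][t]: unit u, stimulus s, trial t.

    Returns a new nested list in which each unit's trials are shuffled
    independently within each stimulus; the input is left untouched.
    """
    rng = random.Random(seed)
    scrambled = []
    for unit in responses:
        new_unit = []
        for trials in unit:
            shuffled = list(trials)
            rng.shuffle(shuffled)
            new_unit.append(shuffled)
        scrambled.append(new_unit)
    return scrambled


# Two hypothetical units, 3 stimuli x 4 trials; trial t is simultaneous across units.
data = [
    [[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 1, 9]],
    [[10, 20, 30, 40], [50, 60, 70, 80], [0, 5, 5, 90]],
]
scrambled = scramble_trials(data)
```

A decoder that assumes independence can then be trained and tested on both `data` and `scrambled`; under the null hypothesis that noise correlations carry no azimuth information, the two error distributions should not differ.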

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting because the DCIC is also known to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which adds a layer of complexity to the processing and analysis of such datasets. This will undoubtedly be useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to estimate how many such informative units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by keeping the number of repetitions low, but I defer to them for an explanation.

      We assumed that the levels of heating by excitation light measured in the neocortex by Prevedel et al. (2016) were also representative for DCIC. Nevertheless, we recognize this approximation might not be very accurate, owing to differences in tissue architecture and vascularization between these two brain areas, to name just a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. Consequently, imaging sessions were kept to 25 min, limiting the number of trials collected. However, we cannot rule out that more extensive habituation prior to experiments would allow longer imaging sessions without these signs of discomfort, or that influences from our custom setup, such as potential heating of the brain by the illumination light, caused the observed distress. Nevertheless, we note that previous work has shown that ~200 mW average power is a safe regime for imaging in the cortex, keeping brain heating minimal (Prevedel et al., 2016) without producing the lasting damage revealed by immunohistochemistry against apoptosis markers above 250 mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these highly prone to overfitting, as the authors correctly acknowledge after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed that it assumes the responses of the simultaneously recorded neurons are independent, and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model represents the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be supported by an equal decoding outcome from correlated and decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible, one least likely to find a contribution of noise correlations. Another relevant reason for our choice of the naive Bayesian classifier is its robustness to the limited number of trials we could collect, compared with more “data hungry” classifiers like SVM, KNN, or artificial neural networks. We did perform preliminary tests with alternative classifiers, but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observed a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section on page 10 and in the last paragraph of page 18 (see responses to Reviewer 1).
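The independence assumption is visible in how a naive Bayes decoder scores each azimuth: the log-likelihood of a population response is simply the sum of per-unit log-likelihoods, so correlations between units never enter. The sketch below assumes Poisson spike-count likelihoods, a flat prior over azimuths, and leave-one-out cross-validation, none of which is specified in this response; the toy data and function names are hypothetical.

```python
import math
import statistics


def fit_rates(train):
    """train: list of (azimuth, counts per unit). Returns mean count per unit for each azimuth."""
    sums, n_trials = {}, {}
    for azimuth, counts in train:
        acc = sums.setdefault(azimuth, [0.0] * len(counts))
        for i, c in enumerate(counts):
            acc[i] += c
        n_trials[azimuth] = n_trials.get(azimuth, 0) + 1
    # A small floor keeps log(rate) finite for units that never fired to an azimuth.
    return {az: [max(v / n_trials[az], 1e-3) for v in acc] for az, acc in sums.items()}


def decode(rates, counts):
    """Most probable azimuth under a flat prior, treating units as independent.

    Terms of the Poisson log-likelihood that do not depend on the azimuth
    (log c!) are omitted because they cannot change the arg-max.
    """
    def log_likelihood(lams):
        return sum(c * math.log(lam) - lam for c, lam in zip(counts, lams))
    return max(rates, key=lambda az: log_likelihood(rates[az]))


def leave_one_out_errors(trials):
    """Absolute decoding error for each trial, decoding it with a model fit on the rest."""
    errors = []
    for i, (azimuth, counts) in enumerate(trials):
        rates = fit_rates(trials[:i] + trials[i + 1:])
        errors.append(abs(decode(rates, counts) - azimuth))
    return errors


# Hypothetical pair of oppositely tuned units, 13 azimuths x 4 trials.
trials = [
    (-90 + 15 * k, [2 * k + t % 2, 2 * (12 - k) + t % 2])
    for k in range(13)
    for t in range(4)
]
errors = leave_one_out_errors(trials)
print(statistics.median(errors))  # 0 for this well-separated toy data
```

Because `decode` only adds per-unit terms, it cannot exploit correlations between units. That is the property that makes such a decoder a conservative null model when comparing simultaneous and decorrelated datasets.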

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and the chance level distribution (gray) on the complete populations of imaged units. B) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naïve approximation for computational efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange) and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of the cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov with Sidak correction.
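      Author response image 2 compares cumulative error distributions with Kolmogorov-Smirnov tests and a Sidak correction. As a minimal sketch of those two ingredients (not the authors' code), the two-sample statistic D is the largest vertical distance between the two empirical CDFs, and the Sidak-adjusted per-comparison significance level for m tests is 1 − (1 − α)^(1/m). P-values are omitted here; in practice a library routine such as scipy.stats.ks_2samp would supply them.

```python
import bisect


def ks_2samp_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    The empirical CDFs are step functions that only change at sample
    values, so evaluating them at every pooled sample point finds the maximum.
    """
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect.bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def sidak_alpha(alpha, m):
    """Per-comparison level that keeps the family-wise error rate at alpha over m tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)
```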

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than that of the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form, or a more quantitative comparison of the diversity, of these functions because the signals compared are quite different: the calcium indicator signal is subject to nonlinearities due to Ca2+ binding cooperativity and to low-pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses the statistical dependency of responses on stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added to the Discussion sentences clarifying that a relationship between these two variables remains to be determined, and that it also remains to be determined whether the DCIC indeed performs a Bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, a primary sensory representation should theoretically be even more precise than the behavioral performance, as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely to be what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth, and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder, which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are uncorrelated. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder, but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations, it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. We also chose this decoder for its robustness to the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neural networks. Lastly, we observed that small populations of noisy, unreliable (not responding in every trial) DCIC neurons can encode stimulus azimuth in passively listening mice, matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial, or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from making any quantification of the unreliability (failures to respond) and variability in the response distributions from the units we recorded, as we feared they could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. In consequence we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carry the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the rest of the population, which is not in the top ranked unit percentage, is really responding and showing response variation to evaluate this correlation, or is simply not responding at all and shows unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more: are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y), or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate)? A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, to evaluate noise correlations we used the nonparametric Kendall tau correlation coefficient, which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. Positive rank correlation would represent neurons more likely responding together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
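      The rank-based noise correlation described above can be sketched as follows: for a given pair of units, Kendall's tau-b is computed across the repeated trials of each azimuth, so only the rank order of single-trial responses matters. Tau-b discounts tied pairs, which is relevant when many trials show identical (e.g., zero) responses. Averaging the per-azimuth values into one number per pair is an illustrative assumption, as are the function names; this is not the authors' code.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both sequences: excluded from every count
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    # Undefined (NaN) when one unit's responses are all identical, e.g. never responding.
    return (concordant - discordant) / denom if denom else float("nan")


def noise_correlation(resp_a, resp_b, labels):
    """Mean within-azimuth tau-b between two units' single-trial responses.

    Azimuths where tau-b is undefined are skipped; averaging across azimuths
    is an illustrative choice, not necessarily the published procedure.
    """
    taus = []
    for az in sorted(set(labels)):
        a = [r for r, lab in zip(resp_a, labels) if lab == az]
        b = [r for r, lab in zip(resp_b, labels) if lab == az]
        tau = kendall_tau_b(a, b)
        if not math.isnan(tau):
            taus.append(tau)
    return sum(taus) / len(taus) if taus else float("nan")
```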

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound intervals and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses, especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of the eye-position IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect that the reason behind this is a generally overly dilated pupil due to the low visible light illumination conditions we used which were necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, like the signal associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating if and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is out of the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating DCIC in sound location learning induced experience dependent plasticity (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than two dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple, interpretable toy models of it challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section at page 10 and the last paragraph of page 18 from the revised version of the manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice: While this is an interesting coincidental observation, it should not be interpreted to mean that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6) More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section in page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “Datasets here analyzed and reported come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for sTeFo 2P data due to low spatial resolution which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non-negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numerated points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e., I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, in which large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume time-series data, it would be clearer to the readership if larger images were shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since they mirror the analyses presented for their imaging data counterpart, and encourage the authors to move them to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al. (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing numbers of units training the model), akin to, e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., whether the number of units required to achieve statistical significance is the same).

      Within-class randomization had a moderate effect on decoder performance, with statistical significance reached at similar numbers of units, as seen in Figure 5, panels B and C. We did not include these plots to avoid cluttering the figure with dense distributions and obscuring the differences between the distributions shown.

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this may be due to more than one distribution of pairwise correlations being combined into a single histogram (as in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss, beyond the mouse model, an exciting outlook of our findings in light of previous reports: a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth-sensitive DCIC neurons. We have now clarified this in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. The syringe was a 10 ml syringe used to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a 3-way Luer T valve. Single housing was chosen to prevent mice from compromising each other’s implants. This can therefore be seen as a refinement of our method to maximize the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The MathWorks is infamously known to add (or, even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking - especially during the intersound intervals. The entrainment of the population in the imaging datatset suggests some type of input activating the populations - maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It could be possible that some of the 'ongoing' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low-frequency noise, within a band mostly outside the hearing range of mice (<4 kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise could influence the activity during the inter-trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.

      (3) Was the sound you present frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.
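      As an illustration of how such a violation rate can be computed, here is a minimal Python sketch, assuming spike times in seconds for one sorted unit and the 2 ms refractory period quoted above. The function names and the 5% filtering cutoff are illustrative assumptions, not the authors' curation code.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of spikes that follow the previous spike by less than refractory_s."""
    times = sorted(spike_times_s)
    if not times:
        return 0.0
    # Inter-spike intervals between consecutive spikes
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(times)


def passes_refractory_check(spike_times_s, max_fraction=0.05, refractory_s=0.002):
    """Keep a unit only if its violation fraction is below max_fraction (hypothetical cutoff)."""
    return refractory_violation_fraction(spike_times_s, refractory_s) < max_fraction


# Two of these five spikes fall within 2 ms of the preceding spike
example = [0.0, 0.0015, 0.010, 0.020, 0.0205]
print(refractory_violation_fraction(example))  # 0.4
```

      Expressing violations per spike (rather than per interval) matches the "fraction of the spikes" wording in the reply; with many spikes the two definitions are nearly identical.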

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while the mouse was head-fixed we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement, or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point, due to the 250 ms time resolution of imaging.
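      For readers who want to probe delayed effects, a minimal sketch of the point-by-point correlation and of a lagged variant, assuming population activity and a movement trace sampled on the same 250 ms imaging frames. The traces and the lag range below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(activity, movement, lag):
    """Correlate activity[t] with movement[t - lag]; lag is in frames.

    lag = 0 is the point-by-point correlation; lag > 0 asks whether
    activity follows movement by `lag` frames (250 ms each here).
    """
    if lag > 0:
        return pearson(activity[lag:], movement[:-lag])
    if lag < 0:
        return pearson(activity[:lag], movement[-lag:])
    return pearson(activity, movement)


movement = [0, 1, 0, 3, 2, 5, 1, 4, 0, 2]
activity = [0, 0] + movement[:-2]  # toy trace that lags movement by 2 frames
for lag in range(-2, 3):
    print(lag, round(lagged_correlation(activity, movement, lag), 3))
```

      At 250 ms per frame, each lag step is a quarter of a second, so only delays of that order or longer can be resolved with this approach.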

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only imaging. For neuropixels recordings, we were skeptical about face videography, as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. We were therefore inclined to perform videographic interrogation of face movements in these mice instead.

      If you left out more than 1 trial (e.g., 20% of the data), do you think this would help your overfitting issue?

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study, but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model (±30° error after extensive training). Therefore, we reasoned that in passively listening mice (and likely trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected, which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.
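      For comparison, a minimal sketch computing both summaries from the same decoder output, assuming lists of true and decoded azimuths in degrees. The azimuth grid below is hypothetical and not the stimulus set used in the study. The cumulative error curve is the fraction of trials decoded within each error threshold.

```python
def confusion_matrix(true_az, pred_az, azimuths):
    """counts[i][j]: trials with true azimuth azimuths[i] decoded as azimuths[j]."""
    index = {az: i for i, az in enumerate(azimuths)}
    counts = [[0] * len(azimuths) for _ in azimuths]
    for t, p in zip(true_az, pred_az):
        counts[index[t]][index[p]] += 1
    return counts


def cumulative_error(true_az, pred_az, thresholds):
    """Fraction of trials whose absolute decoding error is <= each threshold (degrees)."""
    errors = [abs(p - t) for t, p in zip(true_az, pred_az)]
    return [sum(e <= th for e in errors) / len(errors) for th in thresholds]


azimuths = [-30, 0, 30, 60]  # hypothetical stimulus grid
true_az = [0, 0, 30, 30, -30]
pred_az = [0, 30, 30, 60, -30]
print(confusion_matrix(true_az, pred_az, azimuths))
print(cumulative_error(true_az, pred_az, [0, 30]))  # [0.6, 1.0]
```

      With only a few repetitions per azimuth, most cells of the matrix hold zero or one count, which is the undersampling the authors describe; the cumulative curve pools all trials and is less affected by it.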

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked based on response dependency to stimulus azimuth does not show a consistent stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on the p value of the chi-square test. Units above the red dotted line have a chi-square p value < 0.05; units below this line have p values ≥ 0.05.
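      A minimal sketch of this kind of ranking, assuming each unit's responses are summarized as a contingency table of trial counts (rows: azimuths; columns: hypothetical response bins such as low/high activity). The binning, unit names and helper functions are illustrative assumptions, not the authors' pipeline; the chi-square survival function uses the closed forms for integer degrees of freedom so that the example needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Q(x; 2k) = exp(-h) * sum_{i=0}^{k-1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Q(x; 2k+1) = erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{k} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= h / (i - 0.5)
        total += math.exp(-h) * term
    return total


def chi2_independence(table):
    """Pearson chi-square test of independence for a table of counts.

    Rows or columns with zero totals must be dropped first (zero expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


def rank_units(tables):
    """Unit ids sorted from most to least azimuth-dependent (ascending p value)."""
    p_values = {unit: chi2_independence(t)[2] for unit, t in tables.items()}
    return sorted(p_values, key=p_values.get)


# Hypothetical units: rows = two azimuths, columns = low/high response counts
tables = {"flat": [[10, 10], [10, 10]], "tuned": [[20, 0], [0, 20]]}
print(rank_units(tables))  # ['tuned', 'flat']
```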

      How much overlap is there in the tuning of the top-ranked units?

      This varies considerably from mouse to mouse and between imaging and electrophysiology, which makes it hard to generalize, since it might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 4). The lower firing frequency observed in neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution and its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths (Author response image 4).

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.
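      A minimal sketch of the comparison in this figure, assuming trial-by-trial responses of two simultaneously recorded units to the same azimuth. Kendall's tau-b is computed across trials, and the chance distribution is built by shuffling the trial order of one unit. This shuffle is one simple way (our assumption, not necessarily the authors' exact randomization) to break shared trial-to-trial fluctuations while keeping each unit's response distribution; the spike counts are hypothetical.

```python
import math
import random
import statistics


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (handles ties)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: counts toward neither term
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def shuffled_null(x, y, n_shuffles=1000, seed=0):
    """Chance-level taus obtained by shuffling the trial order of y."""
    rng = random.Random(seed)
    y_shuffled = list(y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(y_shuffled)
        null.append(kendall_tau_b(x, y_shuffled))
    return null


unit_a = [3, 5, 2, 8, 6, 4, 7, 1]  # hypothetical spike counts per trial
unit_b = [2, 6, 1, 7, 6, 3, 8, 2]
observed = kendall_tau_b(unit_a, unit_b)
null = shuffled_null(unit_a, unit_b)
print(round(observed, 3), round(statistics.median(null), 3))
```

      A rank-based coefficient such as tau does not require subtracting the mean response per azimuth, because it is unchanged by adding a constant to all trials of a unit.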

      Typos:

      'a population code consisting on the simultaneous" > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    1. Author response:

      Reviewer #1 (Public Review):

      Padilha et al. aimed to find prospective metabolite biomarkers in serum of children aged 6-59 months that were indicative of neurodevelopmental outcomes. The authors leveraged data and samples from the cross-sectional Brazilian National Survey on Child Nutrition (ENANI-2019), and an untargeted multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) approach was used to measure metabolites in serum samples (n=5004) which were identified via a large library of standards. After correlating the metabolite levels against the developmental quotient (DQ), or the degree to which age-appropriate developmental milestones were achieved as evaluated by the Survey of Well-being of Young Children, serum concentrations of phenylacetylglutamine (PAG), cresol sulfate (CS), hippuric acid (HA) and trimethylamine-N-oxide (TMAO) were significantly negatively associated with DQ. Examination of the covariates revealed that the negative associations of PAG, HA, TMAO and valine (Val) with DQ were specific to younger children (-1 SD or 19 months old), whereas creatinine (Crtn) and methylhistidine (MeHis) had significant associations with DQ that changed direction with age (negative at -1 SD or 19 months old, and positive at +1 SD or 49 months old). Further, mediation analysis demonstrated that PAG was a significant mediator for the relationship of delivery mode, child's diet quality and child fiber intake with DQ. HA and TMAO were additional significant mediators of the relationship of child fiber intake with DQ.

      Strengths of this study include the large cohort size and study design allowing for sampling at multiple time points along with neurodevelopmental assessment and a relatively detailed collection of potential confounding factors including diet. The untargeted metabolomics approach was also robust and comprehensive allowing for level 1 identification of a wide breadth of potential biomarkers. Given their methodology, the authors should be able to achieve their aim of identifying candidate serum biomarkers of neurodevelopment for early childhood. The results of this work would be of broad interest to researchers who are interested in understanding the biological underpinnings of development and also for tracking development in pediatric populations, as it provides insight for putative mechanisms and targets from a relevant human cohort that can be probed in future studies. Such putative mechanisms and targets are currently lacking in the field due to challenges in conducting these kinds of studies, so this work is important.

      However, in the manuscript's current state, the presentation and analysis of data impede the reader from fully understanding and interpreting the study's findings.

      Particularly, the handling of confounding variables is incomplete. There is a different set of confounders listed in Table 1 versus Supplementary Table 1 versus Methods section Covariates versus Figure 4. For example, Region is listed in Supplementary Table 1 but not in Table 1, and Mode of Delivery is listed in Table 1 but not in Supplementary Table 1. Many factors are listed in Figure 4 that aren't mentioned anywhere else in the paper, such as gestational age at birth or maternal pre-pregnancy obesity.

      We thank the reviewer for their comment. We would like to clarify that initially, the tables had different variables because they have different purposes. Table 1 aims to characterize the sample on variables directly related to the children’s and mother’s features and their nutritional status. Supplementary File 1 (previously named Supplementary Table 1) summarizes the sociodemographic distribution of the development quotient. Neither of the tables concerned the metabolite-DQ relationships and their potential covariates; they only provide context for subsequent analyses by characterizing the sample and the outcome. Instead, the covariates included in the regression models were selected using the Directed Acyclic Graph presented in Figure 1.

      To avoid this potential confusion, however, we included the same variables in Table 1 and Supplementary File 1 (page 38), and we discussed the selection of model covariates in Figure 4 in more detail here in the letter and in the manuscript.

      The authors utilize the directed acyclic graph (DAG) in Figure 4 to justify the further investigation of certain covariates over others. However, the lack of inclusion of the microbiome in the DAG, especially considering that most of the study findings were microbial-derived metabolite biomarkers, appears to be a fundamental flaw. Sanitation and micronutrients are proposed by the authors to have no effect on the host metabolome, yet sanitation and micronutrients have both been demonstrated in the literature to affect microbiome composition, which can in turn affect the host metabolome.

      Thank you for your comment. We appreciate that the use of DAG and lack of the microbiome in the DAG are concerns. This has been already discussed in reply #1 to the editor that has been pasted below for convenience:

      Thank you for the comment and suggestions. It is important to highlight that there is no data on microbiome composition. We apologize if there was an impression such data is available. The main goal of conducting this national survey was to provide qualified and updated evidence on child nutrition to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is more explicitly stated as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs in collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data has been considered a priority as there was already blood specimens collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations we had to perform the analysis in a subset of our sample, still representative and large enough to test our hypothesis with adequate study power (more details below).

      We would like to argue that there is no evidence that microbiome and factors affecting microbiome composition are confounders on the association between serum metabolome and child development. First, one should revisit the properties of a confounder according to the epidemiology literature that in short states that confounding refers to an alternative explanation for a given conclusion, thus constituting one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, which have reported how these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota, and may play a role in neurodevelopment and cognition as mediated by environmental exposures early in life.

      In fact, the literature on the association between microbiome and infant development is very limited. We performed a search using terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criteria to include confounders in the directed acyclic graph (DAG) were based on the literature of systematic reviews or meta-analyses and not on single isolated studies.

      In summary, we would like to highlight that there is no microbiome data in ENANI-2019 and in the event such data was present, we are confident that based on the current stage of the literature, there is no evidence to consider such construct in the DAG, as this procedure recommends that only variables associated with the exposure and the outcome should be included. Please find more details on DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome, instead, these constructs were not considered on the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”
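      The adjustment logic can be made explicit with a small check of Pearl's back-door criterion: a covariate set is sufficient if it contains no descendant of the exposure and d-separates exposure and outcome once the exposure's outgoing edges are removed. The sketch below assumes a toy DAG with hypothetical node names (it is not the DAG in Figure 1) and tests d-separation by moralizing the ancestral graph; tools such as DAGitty search over candidate sets in the same spirit.

```python
from itertools import combinations


def parents_map(edges):
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        parents.setdefault(a, set())
    return parents


def ancestors_of(parents, nodes):
    """Nodes plus all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants_of(edges, node):
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def d_separated(edges, x, y, z):
    """True if z d-separates x and y (moral ancestral graph method)."""
    parents = parents_map(edges)
    keep = ancestors_of(parents, {x, y} | set(z))
    adj = {n: set() for n in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:  # drop edge directions
            adj[child].add(p)
            adj[p].add(child)
        for p1, p2 in combinations(ps, 2):  # "marry" co-parents
            adj[p1].add(p2)
            adj[p2].add(p1)
    seen, stack = {x}, [x]
    while stack:  # search for a path that avoids the conditioning set
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n]:
            if m not in seen and m not in z:
                seen.add(m)
                stack.append(m)
    return True


def satisfies_backdoor(edges, exposure, outcome, z):
    if set(z) & descendants_of(edges, exposure):
        return False
    pruned = [(a, b) for a, b in edges if a != exposure]
    return d_separated(pruned, exposure, outcome, set(z))


# Hypothetical toy DAG: age and diet confound metabolite -> DQ; "collider" is a common effect
EDGES = [("age", "metabolite"), ("age", "dq"), ("diet", "metabolite"), ("diet", "dq"),
         ("metabolite", "dq"), ("metabolite", "collider"), ("dq", "collider")]
for candidate in [set(), {"age"}, {"age", "diet"}, {"age", "diet", "collider"}]:
    print(sorted(candidate), satisfies_backdoor(EDGES, "metabolite", "dq", candidate))
```

      Adding the collider makes an otherwise sufficient set invalid, which is the situation the quoted passage describes as "avoiding collider variables".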

      Additionally, the authors emphasized as part of the study selection criteria the following, "Due to the costs involved in the metabolome analysis, it was necessary to further reduce the sample size. Then, samples were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions related to iron metabolism, such as anemia and nutrient deficiencies. The selection process aimed to represent diverse health statuses, including those with no conditions, with specific deficiencies, or with combinations of conditions. Ultimately, through a randomized process that ensured a balanced representation across these groups, a total of 5,004 children were selected for the final sample (Figure 1)."

      Therefore, anemia and nutrient deficiencies are assumed by the reader to be important covariates, yet, the data on the final distribution of these covariates in the study cohort is not presented, nor are these covariates examined further.

      Thank you for the comments. We apologize for the misunderstanding and will amend the text to make our rationale clearer in the revised version of the manuscript.

      We believed the original text was clear enough in stating that the sampling process was performed aiming to maintain the representativeness of the original sample. This sampling process considered anemia and nutritional deficiencies, among other variables. However, we did not aim to include all relevant covariates of the DQ-metabolome relationship; these were decided using the DAG, as described in the manuscript and other sections of this letter. Therefore, we would like to emphasize that our description of the sampling process does not assume anemia and nutritional deficiencies are important covariates for the DQ-metabolome relationship.

      We rewrote this text part, page 11, lines 279-285:

      “Due to the costs involved in the metabolome analysis, it was necessary to reduce the sample size that is equivalent to 57% of total participants from ENANI-2019 with stored blood specimens. Therefore, the infants were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions such as anemia and micronutrient deficiencies. The selection process aimed to represent diverse health statuses to the original sample. Ultimately, 5,004 children were selected for the final sample through a random sampling process that ensured a balanced representation across these groups (Figure 2).”

      The inclusion of specific covariates in Table 1, Supplementary Table 1, the statistical models, and the mediation analysis is thus currently biased as it is not well justified.

      We appreciate the reviewer's comment. However, it would have been ideal to receive a comment/critique with a clearer and more straightforward argument; we have tried to address it based on our interpretation.

      Please refer to our response to item #1 above regarding the variables in the tables and figures. The covariates in the statistical models were selected using the DAG, which is a cutting-edge procedure that aims to avoid bias and overfitting, a common situation when confounders are adjusted for without a clear rationale. We elaborate on the advantages of using the DAG in response to item #6 and in page 9 of the manuscript. The statistical models we use follow the best practices in the field when dealing with a large number of collinear predictors and a continuous outcome (see our response to the editor’s 4th comment). Finally, the mediation analyses were done to explore a few potential explanations for our results from the PLSR and multiple regression analyses. We only ran mediation analyses for plausible mechanisms for which the variables of interest were available in our data. Please see our response to reviewer 3’s item #1 for a more detailed explanation on the mediation analysis.
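      To make the covariate adjustment concrete, a minimal sketch of a model of this kind, assuming one metabolite plus the three DAG-selected covariates (MDD as a 0/1 indicator, weight-for-height z-score, age in months) as predictors of DQ. The data are simulated, the function name is hypothetical, and the authors' models may differ in detail (for example, age interactions).

```python
import numpy as np


def adjusted_slope(dq, metabolite, mdd, whz, age_months):
    """OLS coefficient of the metabolite on DQ, adjusted for MDD, WHZ and age."""
    X = np.column_stack([np.ones_like(dq), metabolite, mdd, whz, age_months])
    coef, *_ = np.linalg.lstsq(X, dq, rcond=None)
    return coef[1]


rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(6, 59, n)
mdd = rng.binomial(1, 0.6, n).astype(float)
whz = rng.normal(0, 1, n)
# Simulated metabolite that rises with age, so age confounds the crude association
metabolite = 0.05 * age + rng.normal(0, 1, n)
dq = 1.0 + 0.5 * mdd + 0.1 * whz + 0.02 * age - 0.2 * metabolite + rng.normal(0, 1, n)

crude = np.polyfit(metabolite, dq, 1)[0]
adjusted = adjusted_slope(dq, metabolite, mdd, whz, age)
print(round(crude, 3), round(adjusted, 3))
```

      In this simulation the crude slope is pulled toward zero by the age effect, while the adjusted slope recovers the simulated value of about -0.2.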

      Finally, it is unclear what the partial-least squares regression adds to the paper, other than to discard potentially interesting metabolites found by the initial correlation analysis.

      Thank you for the question. As explained in response to the editor’s item #4, PLS-based analyses are among the most commonly used analyses for parsing metabolomic data (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015). This procedure is especially appropriate for cases in which there are multiple collinear predictor variables as it allows us to compare the predictive value of all the variables without relying on corrections for multiple testing. Testing each metabolite in separate correlations corrected for multiple comparisons is less appropriate because the correlated nature of the metabolites means the comparisons are not truly independent and would cause the corrections (which usually assume independence) to be overly strict. As such, we only rely on the correlations as an initial, general assessment that gives context to subsequent, more specific analyses. Given that our goal is to select the most predictive metabolites, discarding the less predictive metabolites is precisely what we aim to achieve. As explained above and in response to the editor’s item #4, the PLSR allows us to reach that goal without introducing bias in our estimates or losing statistical power.  
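      A minimal numpy sketch of the ranking step, assuming a single continuous outcome (PLS1, fitted with the NIPALS algorithm on autoscaled predictors) and the standard variable importance in projection (VIP) formula. It illustrates the technique on simulated data and is not the authors' code; in practice a library implementation would typically be used.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 (single outcome) via NIPALS on autoscaled X and centered y.

    Returns weights W (n_vars x n_components), scores T (n_obs x n_components)
    and y-loadings q (n_components,).
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w = w / np.linalg.norm(w)  # unit-length weight vector
        t = X @ w                  # component scores
        tt = t @ t
        p = X.T @ t / tt           # X-loadings, used for deflation
        q_a = (y @ t) / tt         # regression of y on t
        X = X - np.outer(t, p)     # deflate X
        y = y - q_a * t            # deflate y
        W.append(w)
        T.append(t)
        q.append(q_a)
    return np.array(W).T, np.array(T).T, np.array(q)


def vip_scores(X, y, n_components):
    """Variable importance in projection for each column of X."""
    W, T, q = pls1_nipals(X, y, n_components)
    n_vars = W.shape[0]
    ss = q ** 2 * (T ** 2).sum(axis=0)  # y sum of squares explained per component
    w_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(n_vars * (w_sq @ ss) / ss.sum())


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))  # six simulated "metabolites"
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=500)
vip = vip_scores(X, y, n_components=3)
print(np.round(vip, 2))  # columns 0 and 1 should rank highest
```

      Because the squared VIP scores always sum to the number of predictors, the common "VIP > 1" rule of thumb flags variables with above-average contribution.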

      Reviewer #2 (Public Review):

      A strength of the work lies in the number of children Padilha et al. were able to assess (5,004 children aged 6-59 months) and in the extensive screening that the Authors performed for each participant. This type of large-scale study is uncommon in low-to-middle-income countries such as Brazil.

      The Authors employ several approaches to narrow down the number of potentially causally associated metabolites.

      Could the Authors justify on what basis the minimum dietary diversity score was dichotomized? Were sensitivity analyses undertaken to assess the effect of this dichotomization on associations reported by the article? Consumption of each food group may have a differential effect that is obscured by this dichotomization.

      Thank you for the observation. We would like to emphasize that the child's diet quality was assessed using the minimum dietary diversity (MDD) indicator proposed by the WHO (World Health Organization & United Nations Children’s Fund (UNICEF), 2021). This guideline proposes the cutoff used in the present study. We understand the reviewer’s suggestion to use the consumption of healthy food groups as an evaluation of diet quality, but we chose to follow the WHO proposal to assess dietary diversity. This indicator is widely accepted and used as a marker and provides comparability and consistency with other published studies.
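      For readers unfamiliar with the indicator, a minimal sketch of the dichotomization, assuming the 2021 WHO/UNICEF definition for children aged 6–23 months (foods from at least 5 of 8 groups on the previous day). How ENANI-2019 applied the indicator to older children is not stated here, so the group labels and function names are illustrative only. Keeping the 0–8 count alongside the binary flag is one way to run the sensitivity analysis the reviewer asks about.

```python
# Food groups of the 2021 WHO/UNICEF minimum dietary diversity indicator
# (labels are our own shorthand)
MDD_FOOD_GROUPS = frozenset({
    "breast_milk",
    "grains_roots_tubers_plantains",
    "pulses_nuts_seeds",
    "dairy",
    "flesh_foods",
    "eggs",
    "vitamin_a_rich_fruits_vegetables",
    "other_fruits_vegetables",
})


def dietary_diversity_score(groups_consumed):
    """Number of distinct food groups consumed on the previous day (0-8)."""
    groups = set(groups_consumed)
    unknown = groups - MDD_FOOD_GROUPS
    if unknown:
        raise ValueError(f"unknown food groups: {sorted(unknown)}")
    return len(groups)


def meets_mdd(groups_consumed, cutoff=5):
    """Binary MDD indicator: at least `cutoff` of the 8 groups."""
    return dietary_diversity_score(groups_consumed) >= cutoff


child = ["breast_milk", "grains_roots_tubers_plantains", "dairy", "eggs"]
print(dietary_diversity_score(child), meets_mdd(child))  # 4 False
```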

      Could the Authors specify the statistical power associated with each analysis?

      To the best of our knowledge, we are not aware of power calculation procedures for PLS-based analyses. However, given our large sample size, we do not believe power was an issue with the analyses. For our regression analyses, which typically have 4 predictors, we had 95% power to detect an f-squared of 0.003 and an r of 0.05 in a two-sided correlation test considering an alpha level of 0.05.

      New text, page 11, lines 296-298:

      “Given the size of our sample, statistical power is not an issue in our analyses. Considering an alpha of 0.05 for a two-sided test, a sample size of 5000 has 95% power to detect a correlation of r = 0.05 and an effect of f2 = 0.003 in a multiple regression model with 4 predictors.”
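      As a rough check on the correlation figure, a minimal sketch using the Fisher z approximation for a two-sided test of ρ = 0 at α = 0.05. This approximation is our assumption and may differ slightly from the exact procedure the authors used; it gives roughly 94% power for r = 0.05 with n = 5000, close to the value quoted. The f² figure for the regression model is not covered here.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def correlation_power(r, n, z_crit=1.959963984540054):
    """Approximate power of a two-sided test of rho = 0 (alpha = 0.05 by default).

    Fisher's z: atanh(r_hat) * sqrt(n - 3) is approximately normal with unit
    variance and mean atanh(r) * sqrt(n - 3) under the alternative.
    """
    shift = math.atanh(r) * math.sqrt(n - 3)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)


print(round(correlation_power(0.05, 5000), 3))
```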

      Could the Authors describe in detail which metric they used to measure how predictive PLSR models are, and how they determined what the "optimal" number of components were?

      We chose the model with the fewest number of components that maximized R2 and minimized root mean squared error of prediction (RMSEP). In the training data, the model with 4 components had a lower R2 but a lower RMSEP, therefore we chose the model with 3 components which had a higher R2 than the 4-component model and lower RMSEP than the model with 2 components. However, the number of components in the model did not meaningfully change the rank order of the metabolites on the VIP index.

      New text, page 8, lines 220-224:

      “To better assess the predictiveness of each metabolite in a single model, a PLSR was conducted. PLS-based analyses are the most commonly used analyses when determining the predictiveness of a large number of variables, as they avoid issues with collinearity, sample size, and corrections for multiple testing (Blekherman et al., 2011; Wold et al., 2001; Gromski et al., 2015).”

      New text, page 12, lines 312-314:

      “In PLSR analysis, the training data suggested that three components best predicted the data (the model with three components had the highest R2, and the root mean square error of prediction (RMSEP) was only slightly lower with four components). In comparison, the test data showed a slightly more predictive model with four components (Figure 3—figure supplement 2).”
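      For readers unfamiliar with the VIP index mentioned in the reply above, a common definition weights each predictor's squared, normalized PLS weight by the y-variance explained by each component: VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a), where p is the number of predictors. The sketch assumes the weights and explained sums of squares have already been extracted from a fitted model; the function name and the toy numbers are hypothetical. By construction the squared VIP scores average 1, which is why VIP > 1 is often used to flag influential variables.

```python
import math
from typing import Sequence


def vip_scores(weights: Sequence[Sequence[float]], ss_y: Sequence[float]) -> list[float]:
    """Variable importance in projection (VIP) for a fitted PLS model.

    weights[j][a]: X-weight of predictor j on component a (w_ja).
    ss_y[a]:       sum of squares of y explained by component a.
    """
    p = len(weights)
    n_comp = len(ss_y)
    # Euclidean norm of each component's weight vector w_a.
    norms = [math.sqrt(sum(weights[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total_ss = sum(ss_y)
    vip = []
    for j in range(p):
        weighted = sum(ss_y[a] * (weights[j][a] / norms[a]) ** 2 for a in range(n_comp))
        vip.append(math.sqrt(p * weighted / total_ss))
    return vip


# Toy weights for 3 predictors and 2 components (made-up numbers).
toy_weights = [[0.8, 0.1], [0.6, -0.5], [0.0, 0.86]]
print([round(v, 2) for v in vip_scores(toy_weights, ss_y=[3.0, 1.0])])
```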

      The Authors use directed acyclic graphs (DAG) to identify confounding variables of the association between metabolites and DQ. Could the dataset generated by the Authors have been used instead? Not all confounding variables identified in the literature may be relevant to the dataset generated by the Authors.

      Thank you for the question. Most likely not: the current dataset should not be used to define confounders, as these must be identified based on the literature. DAGs have been widely explored in epidemiology as a valid tool for justifying the choice of confounding factors in regression models, because they give a clear visualization of the assumed causal relationships and clarify the complex relationships between exposure and outcome. In addition, DAGs make the analysis transparent by acknowledging factors reported as important but not included or collected in the study. This was already discussed in our reply #1 to the editor, which is pasted below for convenience.

      Thank you for the comment and suggestions. It is important to highlight that there are no data on microbiome composition, and we apologize if we gave the impression that such data are available. The main goal of this national survey was to provide qualified and updated evidence on child nutrition in order to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collecting stool-derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is now stated more explicitly as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data were not collected from children in ENANI-2019, as this was not an objective of this large population-based nutritional survey. However, the lack of microbiome data does not reduce the relevance of our findings, since there is no evidence that the microbiome and factors affecting microbiome composition are confounders in the association between the serum metabolome and child development.”

      Besides, one must consider the difficulties and costs of collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data were considered a priority because blood specimens had already been collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations, we had to perform the analysis in a subset of our sample that is still representative and large enough to test our hypothesis with adequate statistical power (more details below).

      We would like to argue that there is no evidence that the microbiome and factors affecting microbiome composition are confounders in the association between the serum metabolome and child development. First, recall how the epidemiological literature defines confounding: in short, confounding refers to an alternative explanation for a given conclusion and thus constitutes one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) are circulating metabolites previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, reporting that these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota and may play a role in neurodevelopment and cognition, as mediated by environmental exposures early in life.

      In fact, the literature on the association between the microbiome and infant development is very limited. We searched using the terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criterion for including confounders in the directed acyclic graph (DAG) was evidence from systematic reviews or meta-analyses, not from single isolated studies.

      In summary, we would like to highlight that there are no microbiome data in ENANI-2019, and even if such data were available, we are confident that, given the current state of the literature, there is no evidence to include this construct in the DAG, because the procedure recommends including only variables associated with both the exposure and the outcome. Please find more details on the DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome; rather, these constructs were not included in the DAG.

      To make this clearer, we have modified the passage about the DAG in the Methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analyses of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight had a high proportion of missing data, and data on breastfeeding practices (exclusive breastfeeding until 6 months and/or complemented breastfeeding until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as weight-for-height (w/h) z-score, and the child's age in months.”
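      The minimal-adjustment step can be illustrated with Pearl's back-door criterion: a set Z is a valid adjustment set for the effect of X on Y if no member of Z descends from X and Z blocks every path between X and Y that enters X through an incoming arrow. The sketch below checks this on a small graph using d-separation in the moralized ancestral graph. It is a generic illustration, not the tool the authors used (the response does not name one), and the graph in the usage example is hypothetical, not the DAG in Figure 1.

```python
Graph = dict[str, set[str]]  # maps each node to the set of its children


def parents(g: Graph, node: str) -> set[str]:
    return {u for u, children in g.items() if node in children}


def descendants(g: Graph, node: str) -> set[str]:
    found, stack = set(), [node]
    while stack:
        for child in g.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def ancestral_set(g: Graph, nodes: set[str]) -> set[str]:
    """The given nodes plus all of their ancestors."""
    found, stack = set(nodes), list(nodes)
    while stack:
        for par in parents(g, stack.pop()):
            if par not in found:
                found.add(par)
                stack.append(par)
    return found


def d_separated(g: Graph, x: str, y: str, z: set[str]) -> bool:
    """True if z blocks every path between x and y (moral ancestral graph test)."""
    keep = ancestral_set(g, {x, y} | z)
    adj: dict[str, set[str]] = {v: set() for v in keep}
    for v in keep:
        pars = parents(g, v) & keep
        for par in pars:  # drop edge directions
            adj[v].add(par)
            adj[par].add(v)
        for a in pars:  # "marry" parents that share a child
            adj[a] |= pars - {a}
    # Search from x without passing through conditioning nodes.
    seen, stack = {x}, [x]
    while stack:
        for nb in adj[stack.pop()]:
            if nb == y:
                return False
            if nb not in seen and nb not in z:
                seen.add(nb)
                stack.append(nb)
    return True


def satisfies_backdoor(g: Graph, x: str, y: str, z: set[str]) -> bool:
    """Pearl's back-door criterion for adjustment set z."""
    if z & descendants(g, x):
        return False
    # Delete the arrows leaving x so that only back-door paths remain.
    g_back = {u: (set() if u == x else set(ch)) for u, ch in g.items()}
    return d_separated(g_back, x, y, z)


# Hypothetical DAG: C confounds X -> Y, M is a mediator, K is a collider.
dag: Graph = {"C": {"X", "Y"}, "X": {"M", "K"}, "M": {"Y"}, "Y": {"K"}}
print(satisfies_backdoor(dag, "X", "Y", {"C"}))       # adjust for the confounder only
print(satisfies_backdoor(dag, "X", "Y", {"C", "M"}))  # M lies on the causal path
```

      Adjusting for the mediator M or the collider K fails the check, which mirrors the reply's point that variables on the causal pathway and colliders should not be controlled for.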

      Were the systematic reviews or meta-analyses used in the DAG performed by the Authors, or were they based on previous studies? If so, more information about the methodology employed and the studies included should be provided by the Authors.

      Thank you for the question. The reviews or meta-analyses used in the DAG have been conducted by other authors in the field. This has been laid out more clearly in our methods section.

      New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analyses of relationships with both the metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight had a high proportion of missing data, and data on breastfeeding practices (exclusive breastfeeding until 6 months and/or complemented breastfeeding until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as weight-for-height (w/h) z-score, and the child's age in months.”

      Approximately 72% of children included in the analyses lived in households with a monthly income above the Brazilian minimum wage. The cohort is also biased towards households with a higher level of education. Both of these measures correlate with the developmental quotient. Could the Authors discuss how this may have affected their results and how generalizable they are?

      Thank you for your comment. This was already discussed in our reply #6 to the editor, which is pasted below for convenience.

      Thank you for highlighting this point. ENANI-2019 is a population-based household survey with national coverage and representativeness for macroregions, sex, and one-year age groups (< 1; 1-1.99; 2-2.99; 3-3.99; 4-5). Furthermore, income quartiles of the census sector were used in the sampling. The study included 12,524 households, 14,588 children, and 8,829 infants with blood drawn.

      Due to the costs involved in metabolome analysis, it was necessary to further reduce the sample size to around 5,000 children, equivalent to 57% of the ENANI-2019 participants with stored blood specimens. To avoid a biased sample and to keep representativeness and generalizability, the 5,004 selected children were drawn from the 8,829 children with blood drawn so as to keep the original distribution by age group (6 to 11 months, 12 to 23 months, and 24 to 59 months) and by some health conditions related to iron metabolism, e.g., anemia and nutrient deficiencies. Within these constraints, children were randomly selected, so that the final sample would represent all children with blood drawn. Hence, our aim was to preserve the characteristics and the representativeness of the original sample.
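      The subsampling described above amounts to proportional stratified sampling. A minimal sketch, assuming each child record carries a single stratum label (for example a combination of age group and anemia status; the exact stratification variables and software are not detailed in the reply): each stratum contributes the same share to the subsample as to the full set, and children are drawn at random within each stratum. Because allocations are rounded, the total can differ from the target by a few records. The function name and the group sizes in the usage example are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Optional, Sequence, TypeVar

T = TypeVar("T")


def proportional_stratified_sample(
    records: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    n_total: int,
    seed: Optional[int] = None,
) -> list[T]:
    """Draw about n_total records, keeping each stratum's share of the population."""
    rng = random.Random(seed)
    strata: dict = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample: list[T] = []
    for members in strata.values():
        # Proportional allocation: stratum share in sample = share in population.
        n_h = round(len(members) * n_total / len(records))
        # Simple random sampling without replacement within the stratum.
        sample.extend(rng.sample(members, n_h))
    return sample


# Hypothetical records: (child_id, age group) with made-up group sizes.
groups = ["6-11 mo"] * 600 + ["12-23 mo"] * 300 + ["24-59 mo"] * 100
children = list(enumerate(groups))
subsample = proportional_stratified_sample(children, lambda c: c[1], n_total=500, seed=1)
print(len(subsample))
```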

      The ENANI-2019 study does not appear to present a bias towards higher socioeconomic status. Evidence from two major Brazilian population-based household surveys supports this claim. The 2017-18 Household Budget Survey (POF) reported an average monthly household income of 5,426.70 reais, while the Continuous National Household Sample Survey (PNAD) reported that in 2019 the nominal monthly per capita household income was 1,438.67 reais. In comparison, ENANI-2019 recorded a household income of 2,144.16 reais and a per capita income of 609.07 reais among infants with blood drawn, and 2,099.14 reais and 594.74 reais, respectively, in the serum metabolome analysis sample; all of these values are lower than the national estimates above.

      In terms of maternal education, the 2019 PNAD-Education survey indicated that 48.8% of individuals aged 25 or older had at least 11 years of schooling. Applying the same metric to ENANI-2019, we found that 56.26% of mothers aged 25 or older of infants with blood drawn had 11 or more years of education, as did 51.66% in the metabolome analysis sample. Although these figures are slightly higher, they remain within a reasonable range for population studies.

      It is well known that higher income and maternal education levels can influence child health outcomes, and acknowledging this, ENANI-2019 employed rigorous sampling methods to minimize selection biases. This included stratified and complex sampling designs to ensure that underrepresented groups were adequately included, reducing the risk of skewed conclusions. Therefore, the evidence strongly suggests that the ENANI-2019 sample is broadly representative of the Brazilian population in terms of both socioeconomic status and educational attainment.

      Further to this, could the Authors describe how inequalities in access to care in the Brazilian population may have affected their results? Could they have included a measure of this possible discrepancy in their analyses?

      Thank you for the concern.

      We are not in a position to answer this question, because our study focused on gathering data on infant nutritional status, and there is too little information on access to care for us to form a hypothesis. In addition, this national survey used sampling procedures designed to make the sample representative of the 15 million Brazilian children under 5 years. The sample is therefore balanced across socioeconomic strata, and we have no evidence suggesting that inequalities in access to health care played a role.

      The Authors state that the results of their study may be used to track children at risk for developmental delays. Could they discuss the potential for influencing policies and guidelines to address delayed development due to malnutrition and/or limited access to certain essential foods?

      The point raised by the reviewer is very relevant. Recognizing that dietary and microbially derived metabolites involved in the gut-brain axis could be related to children's risk of developmental delays is the first step toward bringing this topic to the public policy agenda. We believe the results contribute to the literature, which should be used to accumulate evidence to overcome knowledge gaps and support the formulation and redirection of public policies aimed at full child growth and development; the promotion of adequate and healthy nutrition and food security; the encouragement, support, and protection of breastfeeding; and the prevention and control of micronutrient deficiencies.

      Reviewer #3 (Public Review):

      The ENANI-2019 study provides valuable insights into child nutrition, development, and metabolomics in Brazil, highlighting both challenges and opportunities for improving child health outcomes through targeted interventions and further research.

      Readers might consider the following questions:

      (1) Should investigators study the families through direct observation of diet and other factors to look for a connection between food taken in and gut microbiome and child development?

      As mentioned before, ENANI-2019 did not collect stool-derived microbiome data. However, there are data on child dietary intake from 24-hour recalls that can be further explored in other studies.

      (2) Can an examination of the mother's gut microbiome influence the child's microbiome? Can the mother or caregiver's microbiome influence early childhood development?

      The questions raised by the reviewer are interesting and have been explored by other authors. However, we do not have microbiota data from either the child or the mother/caregiver.

      (3) Is developmental quotient enough to study early childhood development? Is it comprehensive enough?

      Yes, we are confident it is comprehensive enough.

      According to the World Health Organization, the term Early Childhood Development (ECD) refers to cognitive, physical, language, motor, social and emotional development between 0 and 8 years of age. The SWYC milestones assess the cognitive, language/communication and motor domains. Therefore, the instrument has sufficient content validity to represent ECD.

      The SWYC is recommended by the American Academy of Pediatrics for ECD screening. Furthermore, we assessed the internal consistency of the SWYC milestones questionnaire in the ENANI-2019 data using Cronbach's alpha, which indicated satisfactory reliability (α = 0.965; 95% CI: 0.963–0.968).
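      Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k-1) · (1 − Σ var(item_i) / var(total)), where k is the number of items. A minimal sketch for a respondents-by-items score matrix follows; it is not the authors' code, and it does not compute the confidence interval reported above.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of item scores."""
    k = len(scores[0])  # number of items
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    # Population variances are used throughout; the n vs n-1 choice cancels in the ratio.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

      Perfectly redundant items give alpha = 1, while items whose covariances cancel out give alpha = 0, which brackets how a value of 0.965 should be read.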

      The SWYC is a screening instrument and indicates whether ECD is not within the expected range. If one of the above-mentioned domains is not achieved as expected, the child may be at risk of ECD delay. Therefore, DQ<1 indicates that a child has not reached the expected ECD for the age group. We cannot say that children with DQ≥1 have full ECD, since we did not assess the socio-emotional domain. However, DQ can track the risk of ECD delay.

      References

      Blekherman, G., Laubenbacher, R., Cortes, D. F., Mendes, P., Torti, F. M., Akman, S., ... & Shulaev, V. (2011). Bioinformatics tools for cancer metabolomics. Metabolomics, 7, 329-343.

      Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Analytica chimica acta, 879, 10-23.

      Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and intelligent laboratory systems, 58(2), 109-130.

      Luiz RR, Struchiner CJ. Inferência causal em epidemiologia: o modelo de respostas potenciais [Causal inference in epidemiology: the potential outcomes model] [online]. Rio de Janeiro: Editora FIOCRUZ; 2002. 112 p. ISBN 85-7541-010-5. Available from SciELO Books http://books.scielo.org.

      Greenland S, Robins JM. Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology. 1986;15(3):413-419.

      Freitas-Costa NC, Andrade PG, Normando P, et al. Association of development quotient with nutritional status of vitamins B6, B12, and folate in 6–59-month-old children: Results from the Brazilian National Survey on Child Nutrition (ENANI-2019). The American journal of clinical nutrition 2023;118(1):162-73. doi: https://doi.org/10.1016/j.ajcnut.2023.04.026

      Sheldrick RC, Schlichting LE, Berger B, et al. Establishing New Norms for Developmental Milestones. Pediatrics 2019;144(6) doi: 10.1542/peds.2019-0374 [published Online First: 2019/11/16]

      Drachler Mde L, Marshall T, de Carvalho Leite JC. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test. Paediatric and perinatal epidemiology 2007;21(2):138-53. doi: 10.1111/j.1365-3016.2007.00787.x [published Online First: 2007/02/17]

      VanderWeele TJ. Principles of confounder selection. Eur J Epidemiol. 2019;34:211–219. https://doi.org/10.1007/s10654-019-00494-6

      Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. 1991.

      Yan R, Liu X, Xue R, Duan X, Li L, He X, Cui F, Zhao J. Association between internet exclusion and depressive symptoms among older adults: panel data analysis of five longitudinal cohort studies. EClinicalMedicine 2024;75. doi: 10.1016/j.eclinm.2024.102767.

      Zhong Y, Lu H, Jiang Y, Rong M, Zhang X, Liabsuetrakul T. Effect of homemade peanut oil consumption during pregnancy on low birth weight and preterm birth outcomes: a cohort study in Southwestern China. Glob Health Action. 2024 Dec 31;17(1):2336312.

      Aristizábal LYG, Rocha PRH, Confortin SC, et al. Association between neonatal near miss and infant development: the Ribeirão Preto and São Luís birth cohorts (BRISA). BMC Pediatr. 2023;23(1):125. Published 2023 Mar 18. doi:10.1186/s12887-023-03897-3

      Al-Haddad BJS, Jacobsson B, Chabra S, et al. Long-term risk of neuropsychiatric disease after exposure to infection in utero. JAMA Psychiatry. 2019;76(6):594-602. doi:10.1001/jamapsychiatry.2019.0029

      Chan, A.Y.L., Gao, L., Hsieh, M.HC. et al. Maternal diabetes and risk of attention-deficit/hyperactivity disorder in offspring in a multinational cohort of 3.6 million mother–child pairs. Nat Med 30, 1416–1423 (2024).

      Hernan MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

      Greenland S, Pearl J, Robins JM. Confounding and collapsibility in causal inference. Statist Sci. 1999;14(1):29-46. https://doi.org/10.1214/ss/1009211805

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary: 

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact outcompeted (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on the creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors depends strongly on whether the clone can grow to a certain size. That study shows instances of discrete clones in which larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones shielded from competition. Other studies show that RasV12, scrib clones can also outcompete the surrounding tissue in mammalian cells, such as Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi, by an allele or by RNAi, in RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and the wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc-overexpressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results.

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hsFLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone).

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary within which to consider dying cells and counted them for each condition. We view this as a very soft quantification, but we nevertheless report it in the revised manuscript in a way that captures the phenomenon.
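      The distance-based count described above could be implemented along these lines. This is a hypothetical sketch: the reply does not state the distance cutoff, the units, or how cell positions relative to the clone boundary were measured, so every name and parameter here is illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DyingCell:
    """A Dcp-1-positive cell (hypothetical representation)."""
    inside_clone: bool
    distance_to_boundary: float  # distance to the nearest clone edge, arbitrary units


def death_inside_vs_outside(cells: Iterable[DyingCell], max_distance: float):
    """Count dying cells inside the clone and within max_distance outside it."""
    cells = list(cells)
    inside = sum(1 for c in cells if c.inside_clone)
    nearby = sum(
        1 for c in cells if not c.inside_clone and c.distance_to_boundary <= max_distance
    )
    # Inside/outside ratio; infinite if no dying cells fall within the outside band.
    ratio = inside / nearby if nearby else float("inf")
    return inside, nearby, ratio
```

      Reporting the ratio alongside the raw counts keeps the result interpretable even though the cutoff distance is arbitrary.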

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N). 

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason the clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to the increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown these same results. It is therefore well established that increased cell proliferation turns Myc clones into supercompetitors, and that in the absence of cell competition, Myc-overexpressing discs instead produce larger-than-normal wings.

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here. 

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division. 

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. On a linear scale, a ratio of 1 means that A and B are equal, whereas a twofold excess of A gives 2 and a twofold excess of B gives 0.5. The larger the difference between A and B, the starker this asymmetry becomes, which makes a linear scale misleading to the eye, especially when decreased ratios are shown. On a log scale, a value of 0 means that A and B are equal, and increased and decreased ratios of the same magnitude deviate equally from 0.
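      The symmetry argument can be checked numerically. The reply does not state the logarithm base; base 2 is assumed here, so a twofold excess of either genotype maps to +1 or −1.

```python
import math


def log2_ratio(a: float, b: float) -> float:
    """log2(A/B): 0 when A equals B, symmetric for equal fold changes."""
    return math.log2(a / b)


for a, b in [(1, 1), (2, 1), (1, 2), (4, 1), (1, 4)]:
    # Linear ratios 2 and 0.5 sit at different distances from 1; log ratios do not.
    print(f"A/B = {a / b:<5} log2(A/B) = {log2_ratio(a, b):+.1f}")
```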

      Statistically, analyzing either a standardized number of discs for all conditions or a variable number not determined beforehand does not affect the validity of the p-value, as long as the variable n is not manipulated by p-hacking techniques, such as increasing the n until a significant p-value is obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs because of the inherent, though modest, variability of our automated cell counts, and we analyzed all the discs obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions, so that the reader can see the n values at a glance.
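      The p-hacking caveat can be illustrated with a small simulation. When the null hypothesis is true, testing once at a fixed n rejects about 5% of the time, whereas adding one sample at a time and stopping as soon as p < 0.05 rejects far more often. The sketch uses a one-sample z-test with known unit variance as a simplification; it is not the test the authors used, and the sample sizes (10 to 40) are arbitrary.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05


def significant(values: list[float]) -> bool:
    """One-sample z-test of mean 0, assuming known unit variance."""
    z = sum(values) / len(values) ** 0.5
    return abs(z) > Z_CRIT


def false_positive_rate(n_sims: int, optional_stopping: bool, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0, 1) for _ in range(10)]  # the null hypothesis is true
        if optional_stopping:
            # Peek after every new sample and stop as soon as p < 0.05 (up to n = 40).
            while not significant(data) and len(data) < 40:
                data.append(rng.gauss(0, 1))
        hits += significant(data)
    return hits / n_sims


print("fixed n:          ", false_positive_rate(2000, optional_stopping=False))
print("optional stopping:", false_positive_rate(2000, optional_stopping=True))
```

      Collecting all samples first and testing once, as the reply describes, corresponds to the fixed-n case.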

      (5) Figure 4 shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, e.g., using an H99 deficiency (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We have updated all panels, replacing Cas3 with Dcp-1.

      As described above, cell death is a well-established consequence of Myc overexpression, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces “would-be” winners’ ability to eliminate competitors, are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite this, they behave as winners and eliminate WT neighboring cells. As mentioned in comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development? 

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We include this information and the relevant reference (Brown et al, 2014) in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary: 

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into the mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the scale of the project and the required experiments are beyond the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree, for several reasons. First, the effect of Fmi loss is specific to winners: loss of Fmi has no effect on its own, or in losers confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners during competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryonic stage, one cannot create whole-disc tumors mutant for fmi. As an approximation to this condition, we introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, essentially the whole disc consists of fmi-null tumor. Indeed, whole fmi-null tumors overgrow similarly to control tumors, confirming that the lack of Fmi only affects clonal tumors. We provide those results in the updated manuscript (Figure 1 Suppl 2 C-D).

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we considered worthwhile to examine. We performed immunostaining for Fmi in clones to determine whether its levels change during competition. Fmi is expressed ubiquitously at apical plasma membranes throughout the disc, and this was unchanged by competition, including inside >>Myc clones and at the clone boundary, where competition is actively happening. We provide these results as a new supplementary figure (Figure 5 Suppl 1) in the updated manuscript.

      Reviewer 3:

      Summary: 

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (so that they now become "losers"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #2 (Public Review):

      M. El Amri et al. investigated the functions of Marcks and Marcks like 1 during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss of function with morpholino knock-down and CRISPR knock-out combined with rescue experiments in the developing spinal cord in embryos and during regeneration at the tadpole stage.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/out on only one side of the embryo) allowed the authors to assess gene functions by directly comparing one side (e.g. mutated SC) to the other (e.g. wild-type SC on the other side). For the assays in regenerating SC, the authors microinjected CRISPR reagents into 1-cell stage embryos. When the embryos (F0 crispants) grew up to tadpoles (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin; neural progenitors: sox2, sox3; proliferation: EdU, PH3), which are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and by pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they rescued the loss-of-function animals with pharmacological reagents. They used S1P (PLD activator), FIPI (PLD inhibitor), NMI (PIP2 synthesis activator) and ISA-2011B (PIP2 synthesis inhibitor). The authors found that the activator treatments rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed that PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important to assess gene functions in loss-of-function assays; therefore, the conclusions are solid. In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection, showing that the injected side of the SC had more neurite outgrowth and proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors showed the link between the phenotype and functional recovery by behavioral testing, which clearly showed that crispants with SC injury swam a shorter distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in developing embryos and regenerating SC. They confirmed that the amount of protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they made for Marcks and Marcksl1. Although the expression patterns during embryogenesis are mostly known from previous work, the data provide appropriate information to readers about the expression and also show the efficiency of the knock-out.

      MARCKS family genes have been known to be expressed in the nervous system. However, few studies focus on their function in nerves. This research introduced these genes as new players during SC development and regeneration. These findings could attract broader interest from researchers working on nervous system disease models and in the medical field. Although it is a typical requirement for loss-of-function assays in Xenopus laevis, I believe that the efficient knock-out of four genes by CRISPR/Cas9 reflects their dedication to designing, testing and validating the gRNAs and is exemplary.

      Weaknesses:

      (1) Why did the authors choose Marcks and Marcksl1? The authors mentioned that these genes were identified in a recent proteomic analysis comparing SC-regenerative tadpoles and non-regenerative froglets (Line (L) 54-57). However, although the proteomic analysis seems to be their own dataset, the authors did not describe how promising genes were selected for the functional assays (this article). The proteomic analysis must have yielded other candidate genes that, based on previous studies, might be more likely to be related to SC development and regeneration, so it was unclear what the criteria for selecting Marcks and Marcksl1 were.

      To highlight the rationale for selecting these proteins, we reworded the sentence as follows: “A recent proteomic screen … after SCI identified a number of proteins that are highly upregulated at the tadpole stage but downregulated in froglets (Kshirsagar, 2020). These proteins included Marcks and Marcksl1, which had previously been implicated in the regeneration of other tissues (El Amri et al., 2018) suggesting a potential role for these proteins also in spinal cord regeneration.”

      (2) Gene knock-out experiments with F0 crispants:

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, it cannot guarantee the same phenotypes practically, due to, for example, different injection timing, different strains of Xenopus laevis, etc. Although the authors mentioned the concerns of mosaicism by themselves (L180-181, L289-292) and immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      To address this issue, we state explicitly in line 208-212: “We also confirmed by immunohistochemistry that co-injection of marcks.L/S and marcksl1.L/S sgRNA, which is predicted to edit all four homeologs (henceforth denoted as 4M CRISPR) drastically reduced immunostaining for Marcks and Marcksl1 protein on the injected side (Fig. S6 B-G), indicating that protein levels are reduced in gene-edited embryos.”

      (3) Limitations of pharmacological compound rescue

      In the methods part, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug can target different molecules when used at different concentrations (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to eliminate the possibility of side effects and off-target effects by testing only a few compounds.

      As explained in the responses to reviewer 1, we have completely rewritten and toned down our presentation of the pharmacological result and explicitly mention in our discussion now the possibility of side effects.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. Generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the sample size being smaller than planned due to the pandemic restrictions is a weakness for this study, and hope that future studies into cholinergic effects on motivation in humans will use larger sample sizes. They should also ensure women are not excluded from sample populations, which will become even more important if the research progresses to clinical populations.

      Reviewer #3 (Public review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within-subject pharmacological design and a task well-designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to covid). Nonetheless, it is worth stating explicitly that this sample size is relatively small for the effect sizes typically observed in such studies highlighting the need for future confirmatory studies.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the small sample size is a weakness of the study, and hope that future work into cholinergic modulation of motivation can involve larger samples to replicate and extend this work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments and clarifying the analysis sections. Women can be included in such studies by performing a pregnancy test before each test session, but I understand how this could have added to the pandemic limitations. Best of luck with your future work!

      Thank you for your time in reviewing this paper, and your helpful comments.

      Reviewer #3 (Recommendations for the authors):

      The authors have done a great job at addressing my concerns and I think that the manuscript is now very solid. That said, I have one minor concern.

      Thank you for your time in reviewing this paper, and your helpful comments.

      For descriptions of mass univariate analyses and cluster correction, I am still a bit confused about exactly which terms were in the regression. In one place, the authors state:

      On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model 'variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)'.

      I take this to mean that the regression model includes a voltage regressor and a three-way interaction term, along with participant level intercept terms.

      However, elsewhere, the authors state:

      "We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant."

      I take this to mean that the regression model included regressors for incentive, distractorPresent, THP, along with their 2 and 3 way interactions. I think that this seems like the more reasonable model - but I just want to 1) verify that this is what the authors did and 2) encourage them to articulate this more clearly and consistently throughout.

      We apologise for the lack of clarity about the whole-brain regression analyses.

      We used Wilkinson notation for this formula, where ‘A*B’ denotes ‘A + B + A:B’, so all main effects and lower-order interaction terms were included in the regression, as your second interpretation says. The model written out in full would be:

      'variable ~1 + voltage + incentive + distractorPresent + THP + incentive:distractorPresent + incentive:THP + distractorPresent:THP + incentive:distractorPresent:THP + (1 | participant)'
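      The expansion rule can be made explicit with a short sketch. The helper names below (expand_star, full_formula) are hypothetical; they only rewrite the text of a formula, crossing factors with ‘*’ into main effects plus every ‘:’ interaction, and do not fit any model.

```python
from itertools import combinations


def expand_star(factors):
    """Expand a Wilkinson crossing such as 'A*B*C' into its main effects
    and every interaction, written with ':' (e.g. 'A:B')."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms


def full_formula(response, extra_fixed, crossed, random_part):
    """Write out a mixed-model formula with the crossed factors fully expanded."""
    rhs = ["1"] + list(extra_fixed) + expand_star(crossed)
    return f"{response} ~ " + " + ".join(rhs) + f" + {random_part}"


formula = full_formula(
    "variable",
    ["voltage"],
    ["incentive", "distractorPresent", "THP"],
    "(1 | participant)",
)
print(formula)
```

      Three crossed factors yield seven fixed-effect terms (three main effects, three two-way interactions and one three-way interaction), in addition to the intercept and the voltage regressor.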

      We will clarify this in the Version of Record.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors used a motivated saccade task with distractors to measure response vigor and reaction time (RT) in healthy human males under placebo or muscarinic antagonism. They also simultaneously recorded neural activity using EEG with event-related potential (ERP) focused analyses. This study provides evidence that the muscarinic antagonist Trihexyphenidyl (THP) modulates the motivational effects of reward on both saccade velocity and RT, and also increases the distractibility of participants. The study also examined the correlational relationships between reaction time and vigor and manipulations (THP, incentives) with components of the EEG-derived ERPs. While an interesting correlation structure emerged from the analyses relating the ERP biomarkers to behavior, it is unclear how these potentially epiphenomenal biomarkers relate to relevant underlying neurophysiology.

      Strengths:

      This study is a logical translational extension from preclinical findings of cholinergic modulation of motivation and vigor and the CNV biomarker to a normative human population, utilizing a placebo-controlled, double-blind approach.

      While framed in the context of Parkinson's disease where cholinergic medications can be used, the authors do a good job in the discussion describing the limitations in generalizing their findings obtained in a normative and non-age-matched cohort to an aged PD patient population.

      The exploratory analyses suggest alternative brain targets and/or ERP components that relate to the behavior and manipulations tested. These will need to be further validated in an adequately powered study. Once validated, the most relevant biomarkers could be assessed in a more clinically relevant population.

      Weaknesses:

      The relatively weak correlations between the main experimental outcomes provide unclear insight into the neural mechanisms by which the manipulations lead to behavioral manifestations outside the context of the ERP. It would have been interesting to evaluate how other quantifications of the EEG signal through time-frequency analyses relate to the behavioral outcomes and manipulations.

      The ERP correlations to relevant behavioral outcomes were not consistent across manipulations demonstrating they are not reliable biomarkers to behavior but do suggest that multiple underlying mechanisms can give rise to the same changes in the ERP-based biomarkers and lead to different behavioral outcomes.

      We thank the reviewer for their review and their comments.

      We agree that these ERPs may not be reliable biomarkers yet, given the many-to-one mapping we observed where incentives and THP antagonism both affected the CNV in different ways, and hope that future studies will help clarify the use and limitations of the CNV as a potential biomarker of invigoration.

      Our original hypothesis was specifically about the CNV as an index of preparatory behaviour, but we plan to look at potential changes to frequency characteristics in future work. We have included this in the discussion of future investigations. (page 16, line 428):

      “Future investigations of other aspects of the EEG signals may shed further light on these processes. Such studies could also investigate other potential signals that may be more sensitive to invigoration and/or muscarinic antagonism, including frequency-band power and phase-coherence, or measures of variability in brain signals such as entropy, which may give greater insight into processes affected by these factors.”

      Reviewer #2 (Public Review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. The generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their review, and their comments.

      We agree that our study was underpowered, not reaching our target of 27 participants due to pandemic restrictions halting our recruitment, and hope that future studies into muscarinic antagonism in motivation will have larger sample sizes, and include male and female participants across a range of ages, to assess generalisability.

      We only included men to prevent the chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category Class C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we reference to this in the Methods/Participants section (page 18, line 501):

      “We recruited 27 male participants (see Drugs section above),…”

      We agree that future work is needed to replicate this in different samples, and that this work cannot tell us the mechanism by which the drug is dampening invigoration, but we think that showing these effects do occur and can be linked to anticipatory/preparatory activity rather than overall reward sensitivity is a useful finding.

      Reviewer #3 (Public Review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within-subject pharmacological design and a task well-designed to measure constructs of interest. The authors show clear causal evidence that ACh affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACh antagonism.

      Weaknesses:

      In full disclosure, I have previously reviewed this manuscript in another journal and the authors have done a considerable amount of work to address my previous concerns. However, I have a few remaining concerns that affect my interpretation of the current manuscript.

      Some of the EEG signals (Figures 4A & C) have profiles that look like they could have ocular, rather than central nervous, origins. Given that this is an eye movement task, it would be useful if the authors could provide some evidence that these signals are truly related to brain activity and not driven by ocular muscles, either in response to explicit motor effects (i.e., blinks) or in preparation for an upcoming saccade.

      We thank the reviewer for re-reviewing the manuscript and for raising this issue.

      All the EEG analyses (both ERP and whole-brain) analyse the preparation period between the ready-cue and target appearance, when no eye movements are required. We reject trials with blinks or saccades over 1 degree in size, as detected by the EyeLink software according to the sensitive velocity and acceleration criteria specified in the manuscript (Methods/Eye-tracking, page 19, line 550). This means that there should be no overt eye movements in the data. However, microsaccades and ocular drift are still possible within this period, which indeed could drive some effects. To measure this, we counted the number of microsaccades (<1 degree in size) in the preparation period between incentive cue and target onset, for each trial. Further, we measured the mean absolute speed of the eye during the preparation period (excluding the periods during microsaccades) for each trial.

      We have run a control analysis to check whether including ocular drift speed or number of microsaccades as a covariate in the whole-brain regression analysis changes the association between EEG and the behavioural metrics at frontal or other electrodes. Below we show these ‘variable ~ EEG’ beta-coefficients when controlling for each eye-movement covariate, in the same format as Figure 4. We did not run permutation testing on this because of its computational cost (more than one week per variable), so only the beta-coefficients, not p-values, were calculated. The beta-coefficients are almost unchanged, both in time-course and topography, when controlling for either covariate. The frontal associations with velocity and distractor pull remain, suggesting they are not due to these eye movements.
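      The logic of this covariate control can be illustrated with a minimal sketch for a single electrode and time point. It assumes ordinary least squares with one intercept per participant as a stand-in for the random-intercept mixed model, and uses simulated data; the names (`voltage_beta`, `drift`) are hypothetical and not taken from our analysis code.

```python
import numpy as np


def design(participant, *predictors):
    """Per-participant intercept dummies (stand-in for a random intercept) plus z-scored predictors."""
    ids = np.unique(participant)
    dummies = (participant[:, None] == ids[None, :]).astype(float)
    zscored = [(p - p.mean()) / p.std() for p in predictors]
    return np.column_stack([dummies] + zscored)


def voltage_beta(behaviour, voltage, participant, covariate=None):
    """Coefficient on EEG voltage, optionally controlling for a trial-wise eye-movement covariate."""
    predictors = [voltage] if covariate is None else [voltage, covariate]
    X = design(participant, *predictors)
    beta, *_ = np.linalg.lstsq(X, behaviour, rcond=None)
    return beta[len(np.unique(participant))]  # first column after the dummies is voltage


# Simulated example: drift speed is unrelated to voltage, so controlling for it
# should leave the voltage coefficient almost unchanged.
rng = np.random.default_rng(0)
n = 2000
participant = rng.integers(0, 20, n)
voltage = rng.normal(size=n)
drift = rng.normal(size=n)  # hypothetical ocular-drift speed per trial
behaviour = 0.3 * voltage + 0.1 * drift + rng.normal(size=n)

b_without = voltage_beta(behaviour, voltage, participant)
b_with = voltage_beta(behaviour, voltage, participant, covariate=drift)
print(f"without covariate: {b_without:.3f}, with covariate: {b_with:.3f}")
```

      A large change in the coefficient, particularly one concentrated over frontal sites when repeated across electrodes, would instead point to an ocular contribution.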

      We have added this figure as a supplemental figure.

      For additional clarity in this response, we also plot the differences between these covariate-controlled beta-coefficients, and the true beta-coefficients from figure 4 (please note the y-axis scales are -0.02:0.02, not -0.15:0.15 as in Figure 4 and Figure 4-figure supplement 2). This shows that the changes to the associations between EEG and velocity/distractor-pull were not frontally-distributed, demonstrating eye-movements were not driving these effects. Relatedly, the RT effect’s change was frontally-distributed, despite Figure 4 showing the true relationship was central in focus, again indicating that effect was also not related to these eye movements.

      Author response image 1.

      Difference in beta-coefficients when eye-movement covariates are included. This is the difference from the beta-coefficients shown in Figure 4, please note the smaller y-axis limits.

      The same pattern was seen if we controlled for the change in eye-position from the baseline period (measured by the eye-tracker) at each specific time-point, i.e., controlling for the distance the eye had moved from baseline at the time the EEG voltage is measured. The topographies and time-course plots were almost identical to the above ones:

      Author response image 2.

      Controlling for change in eye-position at each time-point does not change the regression results. Left column shows the beta-coefficients between the variable and EEG voltage, and the right column shows the difference from the main results in Figure 4 (note the smaller y-axis limits for the right-hand column).

      Therefore, we believe the brain-behaviour regressions are independent of eye-movements. We have included the first figure presented here as an additional supplemental figure, and added the following to the text (page 10, line 265):

      “An additional control analysis found that these results were not driven by microsaccades or ocular drift during the preparation period, as including these as trial-wise covariates did not substantially change the beta-coefficients (Figure 4 – Figure Supplement 2).”

      For other EEG signals, in particular, the ones reported in Figure 3, it would be nice to see what the spatial profiles actually look like - does the scalp topography match that expected for the signal of interest?

      Yes, the CNV is a central negative potential peaking around Cz, while the P3a is slightly anterior to this (peaking between Cz and FCz). We have added the topographies to the main figure (see point below).

      This is the topography of the mean CNV (1200:1500ms from the preparation cue onset), which is maximal over Cz, as expected.

      The P3a’s topography (200:280ms after preparation cue) is maximal slightly anterior to Cz, between Cz and FCz.

      A primary weakness of this paper is the sample size, since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to COVID). That said, they only report the sample size in one place in the methods rather than through degrees of freedom in their statistical tests conducted throughout the results. In part because of this, I am not totally clear on whether the sample size for each analysis is the same, or whether participants were removed for specific analyses (e.g., due to poor EEG recordings).

      We apologise for the lack of clarity here. All 20 participants were included in all analyses, although the number of trials included differed between behavioural and EEG analyses. We only excluded trials with EEG artefacts from the EEG analyses, not from the purely behavioural analyses such as Figures 1&2, although trials with blinks/saccades were removed from behavioural analyses too. Removing the EEG artefactual trials from the behavioural analyses did not change the findings, despite the lower power. The degrees of freedom in the figure supplement tables are the total number of trials (less 8 fixed-effect terms) included in the single-trial / trial-wise regression analyses we used.

      We have clarified this in the Methods/Analysis (page 20, line 602):

      “Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.”

      And we state the number of participants and trials in the start of the behavioural results (page 3, line 97):

      “We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT.”

      and EEG results section (page 7, line 193):

      “We used single-trial linear mixed-effects regression to see the effects of Incentive and THP on each ERP (20 participants, 16627 trials; Distractor was included too, along with all interactions, and a random intercept by participant).”

      Beyond this point, but still related to the sample size, in some cases I worry that results are driven by a single subject. In particular, the interaction effect observed in Figure 1e seems like it would be highly sensitive to the single subject who shows a reverse incentive effect in the drug condition.

      Repeating that analysis after removing the participant with the large increase in saccadic RT with incentives did not remove the incentive*THP interaction effect, although it did weaken slightly from (β = 0.0218, p = .0002) to (β = 0.0197, p = .0082). This is likely because, while that participant did have slower RTs for higher incentives on THP, they were also slower for higher incentives under placebo (and similarly for distractor present/absent), making them less of an outlier in terms of effects than in raw RT terms. Author response image 3 below shows the mean figure without that participant, and Author response image 4 shows that participant separately.

      Author response image 3.

      Author response image 4.

      There are not sufficient details on the cluster-based permutation testing to understand what the authors did or whether it is reasonable. What channels were included? What metric was computed per cluster? How was null distribution generated?

      We apologise for not giving sufficient details of this, and have updated the Methods/Analysis section to include these details, along with a brief description in the Results section.

      To clarify here, we adapted the DMGroppe Mass Univariate Testing toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed them against the behavioural variable, with the model ‘variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour; that is, it asks whether adding the voltage at this time/channel explains additional variance in the variable not captured by our main behavioural analyses. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution of cluster mass (across times/channels per iteration), and calculated the p-value as the proportion of this distribution further from zero than the absolute true t-statistics (two-tailed test).

      We have given greater detail for this in the Methods/Analysis section (page 20, line 614):

      “We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed them against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.”
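      The within-cell shuffling and cluster-mass statistic described above can be sketched for a single electrode, clustering only along time (the full analysis also clusters across neighbouring electrodes). This is not the DMGroppe toolbox code: it assumes ordinary least squares with per-participant intercept columns in place of the mixed model, a hypothetical cluster-forming threshold of |t| > 2, and a family-wise null built from the largest absolute cluster mass in each permutation (one common variant of the Maris & Oostenveld approach). All names are hypothetical.

```python
import numpy as np


def t_voltage(behaviour, voltage_t, X_nuis):
    """t-statistic of the voltage term in behaviour ~ nuisance columns + voltage (OLS)."""
    X = np.column_stack([X_nuis, voltage_t])
    beta, *_ = np.linalg.lstsq(X, behaviour, rcond=None)
    resid = behaviour - X @ beta
    sigma2 = resid @ resid / (len(behaviour) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se


def cluster_masses(tvals, thresh):
    """Sum of t within each run of consecutive samples with |t| > thresh and the same sign."""
    masses, current, sign = [], 0.0, 0
    for t in tvals:
        s = int(np.sign(t)) if abs(t) > thresh else 0
        if s != 0 and s == sign:
            current += t  # extend the current cluster
        else:
            if sign != 0:
                masses.append(current)  # close the previous cluster
            current, sign = t, s  # start a new cluster (or reset when s == 0)
    if sign != 0:
        masses.append(current)
    return masses


def cluster_perm_test(behaviour, voltage, X_nuis, cells, thresh=2.0, n_perm=500, seed=0):
    """voltage: trials x time samples. cells: one label per trial (participant x condition);
    voltages are shuffled across trials only within each cell."""
    rng = np.random.default_rng(seed)

    def masses_for(v):
        tvals = [t_voltage(behaviour, v[:, k], X_nuis) for k in range(v.shape[1])]
        return cluster_masses(tvals, thresh)

    observed = masses_for(voltage)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        order = np.arange(len(behaviour))
        for c in np.unique(cells):
            idx = np.flatnonzero(cells == c)
            order[idx] = rng.permutation(idx)
        null_max[i] = max((abs(m) for m in masses_for(voltage[order])), default=0.0)
    # Two-tailed, family-wise p-value for each observed cluster.
    pvals = [float(np.mean(null_max >= abs(m))) for m in observed]
    return observed, pvals


# Simulated example: a trial-wise signal drives behaviour and is present in samples 10-19.
rng = np.random.default_rng(1)
n_trials, n_times = 400, 30
participant = rng.integers(0, 10, n_trials)
condition = rng.integers(0, 2, n_trials)
source = rng.normal(size=n_trials)
voltage = rng.normal(size=(n_trials, n_times))
voltage[:, 10:20] += source[:, None]
behaviour = 0.5 * source + 0.3 * condition + rng.normal(size=n_trials)
X_nuis = np.column_stack(
    [(participant == p).astype(float) for p in np.unique(participant)] + [condition]
)
observed, pvals = cluster_perm_test(
    behaviour, voltage, X_nuis, participant * 2 + condition, n_perm=100
)
```

      Shuffling only within participant-by-condition cells keeps the nuisance structure intact, so the null distribution reflects only the broken trial-wise link between voltage and behaviour.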

      And we have added a brief explanation to the Results section also (page 9, line 246):

      “We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant. This analysis therefore asks whether trial-to-trial neural variability predicts behavioural variability. To assess significance, we used cluster-based permutation tests (DMGroppe Mass Univariate toolbox; Groppe, Urbach, & Kutas, 2011), shuffling the trials within each condition and person, and repeating it 2500 times, to build a null distribution of ‘cluster mass’ from the t-statistics (Bullmore et al., 1999; Maris & Oostenveld, 2007) which was used to calculate two-tailed p-values with a family-wise error rate (FWER) of .05 (see Methods/Analysis for details).”

      The authors report that "muscarinic antagonism strengthened the P3a" - but I was unable to see this in the data plots. Perhaps it is because the variability related to individual differences obscures the conditional differences in the plots. In this case, event-related difference signals could be helpful to clarify the results.

      We thank the reviewer for spotting this wording error, this should refer to the incentive effect weakening the P3a, as no other significant effects were found on the P3a, as stated correctly in the previous paragraph. We have corrected this in the manuscript (page 9, line 232):

      “This suggests that while incentives strengthened the incentive-cue response and the CNV and weakened the P3a, muscarinic antagonism strengthened the CNV,”

      The reviewer’s suggestion for difference plots is very valuable, and we have added these to Figure 3, as well as increasing the y-axis scale for figure 3c to show the incentives weakening the P3a more clearly, and adding the topographies suggested in an earlier comment. The difference waves for Incentive and THP effects show that both are decreasing voltage, albeit with slightly different onset times – Incentive starts earlier, thus weakening the positive P3a, while both strengthen the negative CNV. The Incentive effects within THP and Placebo separately illustrate the THP*Incentive interaction.

      We have amended the Results text and figure (page 7, line 200):

      “The subsequent CNV was strengthened (i.e. more negative; Figure 3d) by incentive (β = -.0928, p < .0001) and THP (β = -0.0502, p < .0001), with an interaction whereby THP decreased the incentive effect (β = 0.0172, p = .0213). Figure 3h shows the effects of Incentive and THP on the CNV separately, using difference waves, and Figure 3i shows the incentive effect grows more slowly in the THP condition than the Placebo condition.”

      For mediation analyses, it would be useful in the results section to have a much more detailed description of the regression results, rather than just reporting things in a binary did/did not mediate sort of way. Furthermore, the methods should also describe how mediation was tested statistically (ie. What is the null distribution that the difference in coefficients with/without moderator is tested against?).

      We have added a more detailed explanation of how we investigated mediation and mediated moderation, and now report the mediation effects for all tests run and the permutation-test p-values.

      We had been using the Baron & Kenny (1986) method, based on 4 tests outlined in the updated text below, which gives a single measure of change in absolute beta-coefficients when all the tests have been met, but without any indication of significance; any reduction found after meeting the other 3 tests indicates a partial mediation under this method. We now use permutation testing to generate a p-value for the likelihood of finding an equal or larger reduction in the absolute beta-coefficients if the CNV were not truly related to RT. This found that the CNV’s mediation of the Incentive effect on RT was highly significant, while the Mediated Moderation of CNV on THP*Incentive was weakly significant.

      During this re-analysis, we noticed that we had different trial-numbers in the different regression models, as EEG-artefactual trials were not excluded from the behavioural-only model (‘RT ~ 1 + Incentive’). However, this causes issues with the permutation testing as we are shuffling the ERPs and need the same trials included in all the mixed-effects models. Therefore, we have redone these mediation analyses, including only the trials with valid ERP measures (i.e. no artefactual trials) in all models. This has changed the beta-coefficients we report, but not the findings or conclusions of the mediation analyses. We have updated the figure to have these new statistics.

      We have updated the text to explain the methodology in the Results section (page 12, line 284):

      “We have found that neural preparatory activity can predict residual velocity and RT, and is also affected by incentives and THP. Finally, we ask whether the neural activity can explain the effects of incentives and THP, through mediation analyses. We used the Baron & Kenny (1986) method to assess mediation (see Methods/Analysis for full details). This tests whether the significant Incentive effect on behaviour could be partially reduced (i.e., explained) by including the CNV as a mediator in a mixed-effects single-trial regression. We measured mediation as the reduction in (absolute) beta-coefficient for the incentive effect on behaviour when the CNV was included as a mediator (i.e., RT ~ 1 + Incentive + CNV + Incentive*CNV + (1 | participant)). This is a directional hypothesis of a reduced effect, and to assess significance we ran a permutation-test, shuffling the CNV within participants, and measuring the change in absolute beta-coefficient for the Incentive effect on behaviour. This generates a distribution of mediation effects where there is no relationship between CNV and RT on a trial (i.e., a null distribution). We ran 2500 permutations, and calculated the proportion with an equal or more negative change in absolute beta-coefficient, equivalent to a one-tailed test. We ran this mediation analysis separately for the two behavioural variables of RT and residual velocity, but not for distractor pull as it was not affected by incentive, so failed the assumptions of mediation analyses (Baron & Kenny, 1986; Muller et al., 2005). We took the mean CNV amplitude from 1200:1500ms as our Mediator.

      Residual velocity passed all the assumption tests for Mediation analysis, but no significant mediation was found. That is, Incentive predicted velocity (β=0.1304, t(1,16476)=17.3280, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted velocity when included alongside Incentive (β=0.0015, t(1,16475)=1.9753, p=.0483). However, including CNV did not reduce the Incentive effect on velocity, and in fact strengthened it (β=0.1318, t(1,16475)=17.4380, p<.0001; change in absolute coefficient: Δβ=+0.0014). Since there was no mediation (reduction), we did not run permutation tests on this.

      However, RT did show a significant mediation of the Incentive effect by CNV: Incentive predicted RT (β=-0.0868, t(1,16476)=-14.9330, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted RT when included alongside Incentive (β=0.0127, t(1,16475)=21.3160, p<.0001). The CNV mediated the effect of Incentive on RT, reducing the absolute beta-coefficient (β=-0.0752, t(1,16475)=-13.0570, p<.0001; change in absolute coefficient: Δβ= -0.0116). We assessed the significance of this change via permutation testing, shuffling the CNV across trials (within participants) and calculating the change in absolute beta-coefficient for the Incentive effect on RT when the permuted CNV was included as a mediator. We repeated this 2500 times to build a null distribution of Δβ, and calculated the proportion with equal or stronger reductions for a one-tailed p-value, which was highly significant (p<.0001). This suggests that the Incentive effect on RT is partially mediated by the CNV’s amplitude during the preparation period, and this is not the case for residual velocity.

      We also investigated whether the CNV could explain the cholinergic reduction in motivation (THP*Incentive interaction) on RT, i.e., whether the CNV mediates the THP moderation. We measured Mediated Moderation as suggested by Muller et al. (2005; see Methods/Analysis for full explanation): Incentive*THP was associated with RT (β=0.0222, t(1,16474)=3.8272, p=.0001); Incentive*THP was associated with CNV (β=0.1619, t(1,16474)=2.1671, p=.0302); and CNV*THP was associated with RT (β=0.0014, t(1,16472)=2.4061, p=.0161). Mediated Moderation was measured by the change in absolute Incentive*THP effect when THP*CNV was included in the mixed-effects model (β=0.0214, t(1,16472)=3.7298, p=.0002; change in beta-coefficient: Δβ= -0.0008), and permutation-testing (permuting the CNV as above) found a significant effect (p=.0132). This indicates that cholinergic blockade changes how incentives affect preparatory negativity, and how this negativity reflects RT, which can explain some of the reduced invigoration of RT. However, this was not observed for saccade velocity.”
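      The Δβ mediation test with within-participant permutation can be sketched as follows. This is a minimal illustration under stated assumptions: it compares the standardised Incentive coefficient from RT ~ Incentive with that from RT ~ Incentive + CNV, uses ordinary least squares with one intercept per participant instead of the random-intercept mixed model, and runs on simulated data; `mediation_test` and `incentive_beta` are hypothetical names.

```python
import numpy as np


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def incentive_beta(rt, incentive, participant, mediator=None):
    """Standardised Incentive coefficient for RT ~ Incentive (+ mediator), one intercept per participant."""
    ids = np.unique(participant)
    cols = [(participant == i).astype(float) for i in ids]
    cols.append(zscore(incentive))
    if mediator is not None:
        cols.append(zscore(mediator))
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), zscore(rt), rcond=None)
    return beta[len(ids)]


def mediation_test(rt, incentive, mediator, participant, n_perm=500, seed=0):
    """Change in |beta_Incentive| when the mediator is added, with a one-tailed
    permutation p-value from shuffling the mediator within participants."""
    rng = np.random.default_rng(seed)
    base = abs(incentive_beta(rt, incentive, participant))
    delta = abs(incentive_beta(rt, incentive, participant, mediator)) - base
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.array(mediator, dtype=float)
        for p in np.unique(participant):
            idx = np.flatnonzero(participant == p)
            shuffled[idx] = rng.permutation(shuffled[idx])
        null[i] = abs(incentive_beta(rt, incentive, participant, shuffled)) - base
    p_value = float(np.mean(null <= delta))  # proportion with equal or stronger reductions
    return delta, p_value


rng = np.random.default_rng(2)
n = 3000
participant = rng.integers(0, 20, n)
incentive = rng.integers(0, 3, n).astype(float)
cnv = -0.8 * incentive + rng.normal(size=n)            # incentives make the CNV more negative
rt = 0.3 * cnv - 0.1 * incentive + rng.normal(size=n)  # a more negative CNV gives faster RTs
delta, p_value = mediation_test(rt, incentive, cnv, participant, n_perm=200)
```

      A negative Δβ means the mediator absorbed part of the Incentive effect; the one-tailed p-value asks how often a shuffled mediator, which cannot carry any trial-wise information, produces a reduction at least that large.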

      And we have updated the Methods/Analysis section with a more detailed explanation too (page 21, line 627):

      “For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires four tests to be met for the outcome (behavioural variable, e.g., RT), mediator (ERP, e.g., CNV) and treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or smaller than the true values (as Mediation is a one-tailed prediction).

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or smaller than the true change.”
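      The mediated-moderation test in steps (5) to (7) can be sketched in the same way. Assumptions: ordinary least squares with one intercept per participant stands in for the mixed model, interaction columns are products of z-scored factors, and the data are simulated so that THP weakens the Incentive effect on the ERP; `interaction_beta` and `mediated_moderation_test` are hypothetical names.

```python
import numpy as np


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def interaction_beta(rt, incentive, thp, participant, erp=None):
    """Standardised Incentive*THP coefficient for RT ~ Incentive + THP + Incentive*THP
    (model #5), or with ERP + ERP*THP added (model #7); one intercept per participant."""
    ids = np.unique(participant)
    inc, drug = zscore(incentive), zscore(thp)
    cols = [(participant == i).astype(float) for i in ids]
    cols += [inc, drug, inc * drug]
    if erp is not None:
        e = zscore(erp)
        cols += [e, e * drug]
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), zscore(rt), rcond=None)
    return beta[len(ids) + 2]  # columns after the dummies: inc, drug, inc*drug


def mediated_moderation_test(rt, incentive, thp, erp, participant, n_perm=500, seed=0):
    """Change in |beta_Incentive*THP| from model #5 to model #7, with a one-tailed
    permutation p-value from shuffling the ERP within participants."""
    rng = np.random.default_rng(seed)
    base = abs(interaction_beta(rt, incentive, thp, participant))
    delta = abs(interaction_beta(rt, incentive, thp, participant, erp)) - base
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.array(erp, dtype=float)
        for p in np.unique(participant):
            idx = np.flatnonzero(participant == p)
            shuffled[idx] = rng.permutation(shuffled[idx])
        null[i] = abs(interaction_beta(rt, incentive, thp, participant, shuffled)) - base
    return delta, float(np.mean(null <= delta))


rng = np.random.default_rng(3)
n = 3000
participant = rng.integers(0, 20, n)
incentive = rng.integers(0, 3, n).astype(float)
thp = rng.integers(0, 2, n).astype(float)
erp = -0.8 * incentive + 0.6 * incentive * thp + rng.normal(size=n)  # THP weakens the incentive effect on the ERP
rt = 0.6 * erp - 0.1 * incentive + 0.5 * rng.normal(size=n)
delta, p_value = mediated_moderation_test(rt, incentive, thp, erp, participant, n_perm=200)
```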

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) The analysis section could benefit from greater detail. For example, how exactly did they assess that the effects of the drug on peak velocity and RT were driven by non-distracting trials? Ideally, for every outcome, the analysis approach used should be detailed and justified.

      We apologise for the confusion from this. To clarify, we found a 2-way interaction (incentive*THP) on both residual velocity and saccadic RT, and this pattern was stronger in distractor-absent trials for residual velocity, and stronger in distractor-present trials for saccadic RT, as can be seen in Figure 1d&e. However, as there was no significant 3-way interaction (incentive*THP*distractor) for either metric, and the 2-way interaction effects were in the same direction in distractor present/absent trials for both metrics, we think these effects were relatively unaffected by distractor presence.

      We have updated the Results section to make this clearer (page 3, line 94):

      We measured vigour as the residual peak velocity of saccades within each drug session (see Figure 1c & Methods/Eye-tracking), which is each trial’s deviation of velocity from the main sequence. This removes any overall effects of the drug on saccade velocity, while still allowing incentives and distractors to have different effects within each drug condition. We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT. As predicted, residual peak velocity was increased by incentives (Figure 1d; β = 0.1266, p < .0001), while distractors slightly slowed residual velocity (β = -0.0158, p = .0294; see Figure 1 – Figure supplement 1 for full behavioural statistics). THP decreased the effect of incentives on velocity (incentive * THP: β = -0.0216, p = .0030), indicating that muscarinic blockade diminished motivation by incentives. Figure 1d shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was absent; the 3-way (distractor*incentive*THP) interaction was not significant (p > .05), suggesting that the distractor-present trials had the same effect but weaker (Figure 1d).

      Saccadic RT (time to initiation of saccade) was slower when participants were given THP (β = 0.0244, p < .0001), faster with incentives (Figure 1e; β = -0.0767, p < .0001), and slowed by distractors (β = 0.0358, p < .0001). Again, THP reduced the effects of incentives (incentive*THP: β = 0.0218, p = .0002). Figure 1e shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was present; as the 3-way (distractor*incentive*THP) interaction was not significant and the effects were in the same direction in both, this suggests the effect was similar in both conditions. Additionally, the THP*Incentive interactions were correlated between saccadic RT and residual velocity at the participant level (Figure 1 – Figure supplement 2).
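      The residual-velocity measure (each trial's deviation from the main sequence, computed within each drug session) can be sketched as below. The functional form of the main-sequence fit is given in Methods/Eye-tracking and is not reproduced here, so this sketch assumes a power law v = k·A^b fitted per session by linear regression in log-log space; `residual_velocity` and the simulated values are hypothetical.

```python
import numpy as np


def residual_velocity(peak_velocity, amplitude, session):
    """Deviation of each saccade's peak velocity from its own session's main sequence.

    Assumption: the main sequence is modelled as a power law v = k * A**b, fitted
    separately per session by linear regression in log-log space.
    """
    peak_velocity = np.asarray(peak_velocity, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    resid = np.empty_like(peak_velocity)
    for s in np.unique(session):
        m = session == s
        b, log_k = np.polyfit(np.log(amplitude[m]), np.log(peak_velocity[m]), 1)
        resid[m] = peak_velocity[m] - np.exp(log_k) * amplitude[m] ** b
    return resid


rng = np.random.default_rng(4)
amplitude = rng.uniform(2.0, 15.0, 200)
session = np.repeat(np.array(["placebo", "THP"]), 100)
k = np.where(session == "THP", 350.0, 400.0)  # hypothetical overall slowing on THP
peak_velocity = k * amplitude ** 0.6
resid = residual_velocity(peak_velocity, amplitude, session)  # overall drug shift is removed

# Hypothetical 5% speed-up on alternate (incentivised) trials shows up in the residuals.
incentive = np.tile([0, 1], 100)
resid_inc = residual_velocity(peak_velocity * (1 + 0.05 * incentive), amplitude, session)
```

      Because each session gets its own fit, an overall drug-related slowing is absorbed by the fit, while within-session effects such as incentives remain in the residuals.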

      We have given more details of the analyses performed in the Methods section and the results, as requested by you and the other reviewers (page 20, line 602):

      Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.

      We used single-trial linear mixed-effects models to analyse our data, including participant as a random effect of intercept, with the formula ‘~1 + incentive*distractor*THP + (1 | participant)’. We z-scored all factors to give standardised beta coefficients.
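      The fixed-effect part of this formula expands to eight columns (intercept, three main effects, three two-way and one three-way interaction), which matches the eight fixed-effect terms subtracted from the trial count in the reported degrees of freedom. A minimal sketch of that expansion follows; it assumes interaction columns are products of the z-scored factors (the text does not say whether products were re-standardised), and the per-participant random intercept is not part of this matrix but would be supplied by the mixed-model software. The function name is hypothetical.

```python
import itertools

import numpy as np


def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def fixed_effects_design(incentive, distractor, thp):
    """Fixed-effect columns for '~1 + incentive*distractor*THP' built from z-scored factors:
    intercept, 3 main effects, 3 two-way and 1 three-way interaction (8 columns)."""
    factors = {"incentive": zscore(incentive), "distractor": zscore(distractor), "THP": zscore(thp)}
    names = ["intercept"]
    cols = [np.ones(len(factors["incentive"]))]
    for order in (1, 2, 3):
        for combo in itertools.combinations(factors, order):
            names.append(":".join(combo))
            cols.append(np.prod([factors[f] for f in combo], axis=0))
    return names, np.column_stack(cols)


rng = np.random.default_rng(5)
n = 300
names, X = fixed_effects_design(
    rng.integers(0, 3, n), rng.integers(0, 2, n), rng.integers(0, 2, n)
)
print(names)
```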

      For the difference-wave cluster-based permutation tests (Figure 3 – Figure supplement 4), we used the DMGroppe Mass Univariate toolbox (Groppe et al., 2011), with 2500 permutations, to control the family-wise error rate at 0.05. This was used for looking at difference waves to test the effects of incentive, THP, and the incentive*THP interaction (using difference of difference-waves), across all EEG electrodes.
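      The difference waves entering these tests can be sketched as condition-mean subtractions. This assumes incentive is dichotomised into high and low (an illustrative simplification), uses toy condition averages, and only computes the waves; the cluster-based permutation step is not shown. `difference_waves` is a hypothetical name.

```python
import numpy as np


def difference_waves(erp, high_incentive, thp):
    """erp: trials x time samples; high_incentive, thp: boolean arrays per trial.
    Returns the incentive difference wave under placebo, under THP, and the
    difference of difference-waves (the incentive*THP interaction wave)."""
    def mean_wave(mask):
        return erp[mask].mean(axis=0)

    inc_placebo = mean_wave(high_incentive & ~thp) - mean_wave(~high_incentive & ~thp)
    inc_thp = mean_wave(high_incentive & thp) - mean_wave(~high_incentive & thp)
    return inc_placebo, inc_thp, inc_thp - inc_placebo


# Toy condition averages: incentives make the CNV more negative, less so under THP.
erp = np.array([
    [0.0, 0.0, 0.0],    # low incentive, placebo
    [0.0, -1.0, -2.0],  # high incentive, placebo
    [0.0, 0.0, 0.0],    # low incentive, THP
    [0.0, -0.5, -1.0],  # high incentive, THP
])
high_incentive = np.array([False, True, False, True])
thp = np.array([False, False, True, True])
inc_placebo, inc_thp, interaction = difference_waves(erp, high_incentive, thp)
```

      A positive interaction wave in this toy example corresponds to THP shrinking the (negative) incentive effect on the CNV.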

      We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed them against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.

      For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires four tests to be met for the outcome (behavioural variable, e.g., RT), mediator (ERP, e.g., CNV) and treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or more negative than the true value (as Mediation is a one-tailed prediction). For this mediation analysis, we only included trials with valid ERP measures, even for the models without the ERP included (e.g., model #1), to keep the trial-numbers and degrees of freedom the same.

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator with the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or more negative than the true change.
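      In symbols, and assuming the change is signed so that a reduction is negative (consistent with the one-tailed test on null values "equal or more negative" than the observed change), the mediated-moderation statistic and its permutation p-value can be written as:

```latex
\Delta_{\mathrm{MM}} = \left|\hat{\beta}^{(7)}_{\mathrm{Incentive}\times\mathrm{THP}}\right| - \left|\hat{\beta}^{(5)}_{\mathrm{Incentive}\times\mathrm{THP}}\right|,
\qquad
p = \frac{1}{2500}\sum_{k=1}^{2500} \mathbf{1}\!\left[\Delta^{(k)}_{\mathrm{MM}} \le \Delta_{\mathrm{MM}}\right]
```

      where Δ^(k)_MM is the same change recomputed after the k-th within-participant permutation of the ERP. The simple mediation statistic (model #3 vs model #1) has the same form, with the Incentive beta in place of the Incentive×THP interaction.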

      (2) Please explain why only men were included in this study. We are all hoping that men-only research is a practice of the past.

      We only included men to prevent any chance of administering the drug to a pregnant participant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it ‘should not be used during pregnancy unless clearly necessary’. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we have referenced this in the Methods/Participants section (page 18, line 501):

      “Our sample size calculations suggested 27 participants would detect a 0.5 effect size with α = .05 and .8 power. We recruited 27 male participants (see Drugs section above).”
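      The quoted sample size can be checked approximately. The sketch below assumes a one-sample (or paired) t-test, which the text does not state, and uses the normal approximation n ≈ ((z_{1-α} + z_{1-β}) / d)^2 plus a small-sample correction of z_{1-α}^2 / 2. Under these assumptions, a one-tailed test with d = 0.5, α = .05 and power = .8 gives 27, whereas a two-tailed test gives 34; exact noncentral-t power software may differ slightly.

```python
import math
from statistics import NormalDist


def approx_n_one_sample_t(d, alpha=0.05, power=0.8, tails=1):
    """Approximate n for a one-sample/paired t-test with effect size d:
    normal-approximation n plus a z_alpha**2 / 2 small-sample correction."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / tails)
    z_beta = z(power)
    n = ((z_alpha + z_beta) / d) ** 2 + z_alpha ** 2 / 2
    return math.ceil(n)


print(approx_n_one_sample_t(0.5, tails=1))  # 27
print(approx_n_one_sample_t(0.5, tails=2))  # 34
```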

      (3) Please explain acronyms (eg EEG) when first used.

      Thank you for pointing this out. We have now defined EEG at its first use in both the abstract and the main text, along with FWER, M1r, and ERP, which had also not been defined at first use.

      Reviewer #3 (Recommendations For The Authors):

      The authors say: "Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and increased the pull of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity." But I found this statement to be misleading since the primary effects of the drug seem to have been to decrease the frequency of distractor-repulsed saccades... so "decreased push" would probably be a better analogy than "increased pull".

      Thank you for noticing this, we agree, and have changed this to (page 5, line 165):

      “Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and decreased the repulsion of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity.”

      I don't see anything in EEG preprocessing about channel rejection and interpolation. Were these steps performed? There are very few results related to the full set of electrodes.

      We did not reject or interpolate any channels, as visual inspection found no obvious outliers in terms of noisiness, and no channel had a standard deviation (across time/trials) higher than our standard cutoff (of 80). Artefact rejection was applied across all EEG channels, so any trial with an absolute voltage over 200 µV in any channel was removed from the analysis. On average, 104/120 trials per condition per person were included (having passed this check, along with the eye-movement artefact checks). We have added the range of these, along with totals across conditions, to the Analysis section, together with a statement about channel rejection/interpolation (page 20, line 588):

      “Epochs were from -200:1500 ms around the preparation cue onset, and were baselined to the 100 ms before the preparation cue appeared. Visual inspection found no channels with outlying variance, so no channel rejection or interpolation was performed. We rejected trials from the EEG analyses where participants blinked or made saccades (according to EyeLink criteria above) during the epoch, or where EEG voltage in any channel was outside -200:200 μV (muscle activity). On average 104/120 trials per condition per person were included (SD = 21, range = 21-120), and 831/960 trials in total per person (SD = 160, range = 313-954). A repeated-measures ANOVA found there were no significant differences in number of trials excluded for any condition (p > .2).”
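      A minimal sketch of the channel check and epoch rejection described above, assuming epochs are stored as lists of per-channel voltage samples in µV with a matching list of sample times in ms. The baseline window (-100 to 0 ms) and the ±200 µV limit come from the text; applying the rejection after baseline correction, treating the SD cutoff of 80 as µV, and the function names are assumptions. Blink and saccade rejection (from EyeLink) is not modelled.

```python
from statistics import pstdev


def baseline_correct(epoch, times, start=-100, end=0):
    """Subtract each channel's mean over start <= t < end (ms)."""
    idx = [i for i, t in enumerate(times) if start <= t < end]
    corrected = []
    for channel in epoch:
        base = sum(channel[i] for i in idx) / len(idx)
        corrected.append([v - base for v in channel])
    return corrected


def keep_trial(epoch, limit=200.0):
    """False if any channel leaves the -limit..+limit range (µV) at any sample."""
    return all(-limit <= v <= limit for channel in epoch for v in channel)


def noisy_channels(epochs, cutoff=80.0):
    """Channels whose SD, pooled over all trials and samples, exceeds cutoff."""
    flagged = []
    for c in range(len(epochs[0])):
        values = [v for epoch in epochs for v in epoch[c]]
        if pstdev(values) > cutoff:
            flagged.append(c)
    return flagged


times = [-200, -100, 0, 100]            # ms relative to the preparation cue
trial = [[10.0, 10.0, 10.0, 50.0]]      # one channel, µV
clean = baseline_correct(trial, times)  # [[0.0, 0.0, 0.0, 40.0]]
print(keep_trial(clean))                # True
```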

    1. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. Stimuli of the same modality are unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g., how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to both the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.
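      The operational definitions used in this reply can be written as a small lookup, assuming each cell has already been labelled excited (+1), non-responsive (0) or inhibited (-1) for each US. Enumerating every combination also shows one way to arrive at eight responsive profiles (3 × 3 combinations minus the non-responsive/non-responsive case), which matches the count of distinct US response profiles reported later in this response. The "single-US" label is deliberately neutral, since whether such cells encode valence or sensory modality is the point under discussion; names are hypothetical.

```python
from itertools import product

EXCITED, NONE, INHIBITED = 1, 0, -1


def classify(food, shock):
    """Label a cell from its food-US and shock-US responses (+1, 0 or -1)."""
    if food == NONE and shock == NONE:
        return "non-responsive"
    if food == NONE or shock == NONE:
        return "single-US"  # responds to only one outcome
    return "salience" if food == shock else "valence"


responsive = [
    (f, s) for f, s in product((EXCITED, NONE, INHIBITED), repeat=2)
    if classify(f, s) != "non-responsive"
]
print(len(responsive))  # 8
```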

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units, and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses, which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors’ calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is lower than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single-cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question - whether central amygdala signal salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valence stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that the order of the Pavlovian paradigms (reward then fear vs. fear then reward) made no difference, which is an interesting result, and convincingly presented given their counterbalanced experimental design.
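      For readers unfamiliar with the circular-shift approach, the sketch below shows the general idea rather than the specific variant used in the manuscript: the calcium trace is rotated by a random offset, which preserves its autocorrelation while breaking its alignment with event times, and the observed event-locked mean is compared with the resulting null distribution. The response window, number of shifts, two-tailed 5% criterion, and function names are assumptions.

```python
import random


def event_mean(trace, events, window):
    """Mean of the trace over [e, e + window) for every event onset e (samples)."""
    values = [trace[e + k] for e in events for k in range(window) if e + k < len(trace)]
    return sum(values) / len(values)


def circular_shift_label(trace, events, window=10, n_shifts=1000, alpha=0.05, seed=0):
    """Label a cell excited/inhibited/non-responsive against a circular-shift null."""
    rng = random.Random(seed)
    n = len(trace)
    observed = event_mean(trace, events, window)
    null = []
    for _ in range(n_shifts):
        s = rng.randrange(1, n)            # random non-zero offset
        shifted = trace[-s:] + trace[:-s]  # rotate: autocorrelation is preserved
        null.append(event_mean(shifted, events, window))
    null.sort()
    lower = null[int(alpha / 2 * n_shifts)]
    upper = null[int((1 - alpha / 2) * n_shifts) - 1]
    if observed > upper:
        return "excited"
    if observed < lower:
        return "inhibited"
    return "non-responsive"


events = [50, 170, 410, 620, 880]
trace = [0.0] * 1000
for e in events:
    for k in range(10):
        trace[e + k] = 5.0                 # a transient after every event
print(circular_shift_label(trace, events))  # excited
```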

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors interpretation of their data, and take issue with the way they used the terms "salience" and "valence" (and would encourage them to check out Namburi et al., NPP, 2016) regarding the operational definitions of salience and valence which differ from my reading of the literature. To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and more balanced discussion on this topic would be nice to see, and I felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that two data panels including heatmaps appear to be duplicated (perhaps panel G of Figure 5-figure supplement 2 is a cut-and-paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus, and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022) would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). Per the suggestion of the reviewer, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience-encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analyses of valence and salience encoding for the different CSs are presented in Figure 5G, Figure 5 – Supplement 1C-D, and Figure 5 – Supplement 2N-O. CS responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, stimuli of the same modality are unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells responding to only one stimulus are difficult to classify as valence encoding rather than modality specific, and without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Much of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to that conducted by Yang et al., 2023, the major difference being the types of neurons sampled: Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food- or shock-responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they depend on consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about differences in kinetics. In our experimental design, we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we compared the last day of reward conditioning with the first and only day of fear conditioning. However, as stated above, we also compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and the fear conditioning day. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.
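      One simple form of the decoding analysis suggested here is a leave-one-out nearest-centroid classifier on trial-by-neuron response vectors, compared with the same classifier run on shuffled CS labels. This is an illustrative sketch only; the classifier choice, data layout and function names are assumptions, and a real analysis would also need to handle class balance and cells tracked across sessions.

```python
import random


def loo_nearest_centroid(X, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.
    X: one response vector (e.g. mean dF/F per neuron) per trial."""
    correct = 0
    for i, row in enumerate(X):
        centroids = {}
        for lab in sorted(set(labels)):
            members = [X[j] for j in range(len(X)) if j != i and labels[j] == lab]
            if members:
                centroids[lab] = [sum(col) / len(members) for col in zip(*members)]

        def sq_dist(lab):
            return sum((a - b) ** 2 for a, b in zip(row, centroids[lab]))

        correct += min(centroids, key=sq_dist) == labels[i]
    return correct / len(X)


def shuffled_accuracies(X, labels, n_shuffles=100, seed=0):
    """Null distribution: the same decoder applied to shuffled CS labels."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_shuffles):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        out.append(loo_nearest_centroid(X, shuffled))
    return out


X = [[1.0, 0.0], [1.1, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 1.1], [0.0, 0.9]]
labels = ["CS-food"] * 3 + ["CS-shock"] * 3
print(loo_nearest_centroid(X, labels))  # 1.0
```

      Accuracy well above the shuffled-label distribution would support a claim about decoding food- versus shock-predictive cues without committing to a valence interpretation.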

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). Interestingly, many of these studies did not vary the US intensity.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity, a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similarly to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.
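The classification scheme debated in this exchange can be written as a simple rule on the sign of each cell's response to the two USs. The sketch below is a hypothetical reading of that scheme and not the authors' code. It assumes responses have already been reduced to +1 (excited), -1 (inhibited) or 0 (no significant response). Under this rule, every responsive cell necessarily receives a valence or salience label, which is the point Reviewer #4 raises. The last lines simply check that the two reported subgroups add up to the "~29%" figure.

```python
def classify_us_responses(food, shock):
    """Hypothetical labelling of a cell from its response signs to the
    food US and the shock US (+1 excited, -1 inhibited, 0 no response)."""
    if food == 0 and shock == 0:
        return "non-responsive"
    if food == 0 or shock == 0:
        # Responds to only one US ("type I" valence in the review's terms).
        return "valence (type I, one US only)"
    if food == shock:
        return "salience (same direction)"
    return "valence (opposite directions)"

# Reported percentages of cells responding to both USs.
same_direction, opposite_direction = 15.0, 13.5
both_us = same_direction + opposite_direction  # 28.5, reported as ~29%
```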

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes their assertion that the CeA encodes some mixture of salience and valence essentially unfalsifiable. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to Reviewer 2, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and the shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author response:

      We agree with Reviewer #1 that the mGluR6b data should be removed. It is indeed a weakness, and the data are too preliminary. We will gladly remove them from the revised version.

      We will address the issue of the bulk responses (depicted in Figures 5 and 6) by showing the significance data. We will argue that, although we cannot prove that prey detection is increased for lower intensities, the bulk effect is significant, so prey detection is effectively stronger.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to the ambiguous stimulus replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account the direction selectivity of neurons in the population was able to replicate the change in perceptual bias.

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

      While we have not used the term "direction aware", we think the reviewer refers generally to the capability of our model to use a cell's direction selectivity in the decoding. In accordance with the reviewer's interpretation, we did indeed mean that the decoder assumes that a neuron does not only have a preferred frequency, but also a preferred direction of change in frequency (ascending/descending), which is what we use to demonstrate that the decoding in this way aligns with the human percept. We have adapted the text in several places to clarify this, in particular expanding the description in the Methods substantially.

      Reviewer #2 (Public Review):

      The authors aim to better understand the neural responses to Shepard tones in auditory cortex. This is an interesting question, as Shepard tones can evoke an ambiguous pitch that is manipulated by a preceding adapting stimulus; it therefore nicely disentangles pitch perception from simple stimulus acoustics.

      The authors use a combination of computational modelling, ferret A1 recordings of single neurons, and human MEG measurements.

      Their results provide new insights into neural correlates of these stimuli. However, the manuscript submitted is poorly organized, to the point where it is near impossible to review. We have provided Major Concerns below. We will only be able to understand and critique the manuscript fully after these issues have been addressed to improve the readability of the manuscript. Therefore, we have not yet reviewed the Discussion section.

      Major concerns

      Organization/presentation

      The manuscript is disorganized and therefore difficult to follow. The biggest issue is that in many figures, the figure subpanels often do not correspond to the legend, the main body, or both. Subpanels described in the text are missing in several cases.

      We have gone linearly through the text and checked that all figure subpanels are referred to in the text and the legend. As far as we can tell, this was already the case for all panels, with the exception of two subpanels of Fig. 5.

      Many figure axes are unlabelled.

      We have carefully checked the axes of all panels and all but two (Fig. 5D) were labeled. As is customary, certain panels inherit the axis label from a neighboring panel, if the label is the same, e.g. subpanels in Fig. 6F or Fig. 5E, which helps to declutter the figure. We hope that with this clarification, the reviewer can understand the labels of each panel.

      There is an inconsistent style of in-text citation between figures and the main text. The manuscript contains typos and grammatical errors. My suggestions for edits below therefore should not be taken as an exhaustive list. I ask the authors to consider the following only a "first pass" review, and I will hopefully be able to think more deeply about the science in the second round of revisions after the manuscript is better organized.

      While we are puzzled by the severity of the issues that R2 indicates (see above; R3 describes the manuscript as "well written", and R1 does not comment negatively on the writing), we have carefully gone through all specific issues mentioned by R2 and the other reviewers. We hope that the revised version of the paper, with all corrections and clarifications made, will resolve any remaining issues.

      Frequency and pitch

      The terms "frequency" and "pitch" seem to be used interchangeably at times, which can lead to major misconceptions in a manuscript on Shepard tones. It is possible that the authors confuse these concepts themselves at times (e.g. Fig 5), although this would be surprising given their expertise in this field. Please check through every use of "frequency" and "pitch" in this manuscript and make sure you are using the right term in the right place. In many places, "frequency" should actually be "fundamental frequency" to avoid misunderstanding.

      Thanks for pointing this out. We have checked every occurrence and modified where necessary.

      Insufficient detail or lack of clarity in descriptions

      There seems to be insufficient information provided to evaluate parts of these analyses, most critically the final pitch-direction decoder (Fig 6), which is a major finding. Please clarify.

      Thanks for pointing this out. We have extended the description of the pitch-direction decoder and highlighted its role for interpreting the results.

      Reviewer #3 (Public Review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      Following this excellent suggestion, we have added more brief method sketches to the Results section, hopefully addressing this concern.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Thanks for your encouraging evaluation of the work.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

      Thanks for your encouraging evaluation of our work. The suggestion of an interventional study has also been on our minds; however, this appears rather difficult, as it would require a specific subset of cells to be inhibited. The most suitable approach would likely be two-photon (2p) imaging combined with holographic inhibition (using ArchT, for example) of a subset of cells that prefer one direction of pitch change, which should then bias the percept/behavior in the opposite direction.

      Reviewer #1 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) What is the timescale used to compute direction selectivity in neural tuning? How does it compare to the timing of the Shepard tones? The basic idea of up versus down pitch is clear, but the intuition for the role of direction tuning and its relation to stimulus dynamics could be laid out more clearly. Are the authors proposing that there are two "special" populations of A1 neurons that are treated differently to produce the biased percept? Or is there something specific about the dynamics of the Shepard stimuli and how direction selective neurons respond to them specifically? It would help if the authors could clarify if this result links to broader concepts of dynamic pitch coding in general or if the example reported here is specific (or idiosyncratic) to Shepard tones.

      We propose that the findings here are not specific to Shepard tones. To the contrary, only basic properties of auditory cortex neurons suffice: frequency preference, frequency-direction (i.e. ascending or descending) preference, and local adaptation in the tuning curve. Each of these properties has been demonstrated many times before, and we only verified this in the lead-up to the results in Fig. 6. While the same effects should be observable with pure tones, the lack of ambiguity in the perceived direction of a frequency step for pure-tone pairs would make them less noticeable there. Regarding the timescale of the directional selectivity, we relied on the sequencing of tones in our paradigm, i.e. 150 ms spacing. The SSTRFs were discretized at 50 ms and include only the bins during the stimulus, not during the pause. The directional tuning, i.e. differences in the SSTRF above and below the preferred pitchclass for stimuli preceding the last stimulus, typically extended only one stimulus back in time. We have now clarified this in more detail, in particular in the added Methods section on the directional decoder.
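To make the idea of direction-weighted decoding concrete, here is a minimal sketch under stated assumptions. It is not the decoder described in the authors' Methods. The SSTRF is reduced to one row per stimulus: row 0 is the current Shepard tone and row 1 the preceding one, each assumed to be summed over its within-stimulus 50 ms bins. Pitch class is binned into `n_pc` circular bins. A cell that is driven more strongly when the preceding tone lay below its preferred pitch class is taken to prefer ascending steps. The functions `directionality_index` and `decode_direction` are hypothetical names.

```python
import numpy as np

def directionality_index(sstrf):
    """Hypothetical directionality index from a reduced SSTRF.

    sstrf: array (n_lags, n_pc); row 0 = current stimulus, row 1 = preceding
    stimulus. Returns a value in [-1, 1]; > 0 means the cell is driven more
    when the preceding tone was below its preferred pitch class (ascending).
    """
    sstrf = np.asarray(sstrf, dtype=float)
    n_pc = sstrf.shape[1]
    pref = int(np.argmax(sstrf[0]))            # preferred pitch class
    offsets = np.arange(1, n_pc // 2)          # exclude the ambiguous half-octave step
    prev = np.clip(sstrf[1], 0.0, None)        # excitatory weights only (assumption)
    above = prev[(pref + offsets) % n_pc].sum()
    below = prev[(pref - offsets) % n_pc].sum()
    total = above + below
    return 0.0 if total == 0 else float((below - above) / total)

def decode_direction(responses, dis):
    """Each cell votes with its response weighted by its directionality index."""
    score = float(np.dot(responses, dis))
    if score > 0:
        return "ascending"
    if score < 0:
        return "descending"
    return "ambiguous"
```

The half-octave offset is left out because a step of exactly half an octave between Shepard tones has no defined direction.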

      (2) (p. 9) "weighted by each cell's directionality index ... (see Methods for details)" The direction-selective decoder is interesting and appears critical to the study. However, the details of its implementation are difficult to locate. Maybe Fig. 6A contains the key concepts? It would help greatly if the authors could describe it in parallel with the other decoders in the Methods.

      We have expanded the description of the decoder in the Methods as the reviewer suggests.

      LESSER CONCERNS

      p. 1. (L 24) "distances between the pitch representations...." It's not obvious what "distances" means without reading the main paper. Can some other term or extra context be provided?

      We have added a brief description here.

      p. 2. (L 26) "Shepard tones" Can the authors provide a citation when they first introduce this class of stimuli?

      Citation has been added.

      p. 3 (L 4) "direction selective cells" Please define or provide context for what has a direction. Selective to pitch changes in time?

      Yes, selective to pitch changes in time is what is meant. We have further clarified this in the text.

      p. 4 (L 9-19). This paragraph seems like it belongs in the Introduction?

      Given the concerns raised by R2 about the organization of the manuscript we prefer to keep this 'road-map' in the manuscript, as a guidance for the reader.

      p. 4 (L 32) "majority of cells" One might imagine that the overlap of the bias band and the frequency tuning curve of individual neurons might vary substantially. Was there some criterion about the degree of overlap for including single units in the analysis? Does overlap matter?

      We are not certain which analysis the reviewer is referring to. Generally, cells were not excluded based on the overlap between a particular Bias band and their (Shepard) tuning curve. There are several reasons for this. The Bias was located in 4 different, overlapping Shepard tone regions, and all sounds were Shepard tones; therefore, the (Shepard) tuning curve of every cell overlapped with one or more of the Biases. For the decoding analysis, all cells were included, as both the presence and the absence of a response contribute to the decoding. If the reviewer is referring only to the analysis of whether a cell adapts, then the same argument applies: this was an average over all Bias sequences, so every responding cell was driven by the Bias, which made it possible to assess whether its response adapted across positions within the Bias. We acknowledge that the limited randomness of the Bias sequences, in combination with the specific tuning of the cells, could in a few cases create response patterns over time that are not indicative of the actual behavior under repeated stimulation. However, since the results are rather clear, with 91% of cells adapting, we do not think this would significantly change the conclusions.

      p. 5 (L 17) "desynchronization ... behaving conditions" The logic here is not clear. Is less desynchronization expected during behavior? Typically, increased attention is associated with greater desynchronization.

      Yes, we reformulated the sentence to: While this difference could be partly explained by desynchronization which is typically associated with active behavior or attention [30], general response adaptation to repeated stimuli is also typical in behaving humans [31].

      p. 7 (L 5) "separation" is this a separation in time?

      Yes, added.

      p. 7 (L 33) "local adaptation" The idea of feedforward adaptation biasing encoding has been proposed before, and it might be worth citing previous work. This includes work from Nelken specifically related to SSA. Also, this model seems similar to the one described in Lopez Espejo et al (PLoS CB 2019).

      Thanks for pointing this out. We think, however, that neither of these publications suggested this very narrow way of biasing, which we consider biologically implausible. We have therefore not added either of these citations.

      p. 11 (L. 17) The cartoon in Fig. 6G may provide some intuition, but it is quite difficult to interpret. Is there a way to indicate which neuron "votes" for which percept?

      This is an excellent idea, and we have now added the purported perceptual relation of each cell to the diagram.

      p. 12 (L. 8). "classically assumed" This statement could benefit from a citation. Or maybe "classically" is not the right word?

      We have changed 'classically' to 'typically', and now cite classical works from Deutsch and Repp. We think this description makes sense, as the whole concept of bistable percepts has been interpreted as being equidistant (in added or subtracted semitone steps) from the first tone, see e.g. Repp 1997, Fig.2.

      p. 12 (L. 12) "...previous studies" of Shepard tone percepts? Of physiology?

      We have modified it to "Relation to previous studies of Shepard tone percepts and their underlying physiology", since this section deals with both.

      p. 12 (L. 25) "compatible with cellular mechanisms..." This paragraph seems key to the study and to Major Concern 1, above. What are the dynamics of the task stimuli? How do they compare with the dynamics of neural FM tuning and previously reported studies of bias? And can the authors be more explicit in their interpretation - should direction selective neurons respond preferentially to the Shepard tone stimuli themselves? And/or is there a conceptual framework where the same neurons inform downstream percepts of both FM sweeps and both normal (unbiased) and biased Shepard tones?

      The reviewer raises a number of different questions, which we address below:

      - Dynamics of the task stimuli in relation to previously reported cellular biasing: The timescales tested in the studies mentioned are similar to what we used in our bias, e.g. Ye et al 2010 used FM sweeps that lasted for up to 200ms, which is quite comparable to our SOA of 150ms.

      - Preferred responses to Shepard tones: no, we do not think that there should be preferred responses to Shepard tones, but rather that responses to Shepard tones can be thought of as the combined responses to the constituent tones.

      - Conceptual framework where the same neurons inform about FM sweeps and both normal (unbiased) and biased Shepard tones: Our perspective on this question is as follows. To our knowledge, the classical approach to population decoding in the auditory system, i.e. weighting by preferred frequency, has not been directly demonstrated to be read out inside the brain. It has certainly not been demonstrated to be read out in only this way in all areas of the brain that receive input from the auditory cortex. Rather, it has achieved its credibility by being linked directly with animal performance or by matching the presented stimuli. However, these approaches were usually geared towards a representation that can be estimated based on constituent frequencies. Additional response properties of neurons, such as directional selectivity, have been documented and analyzed before, but have not been used to explain the percept. We agree that our use of this cellular response preference in the decoding implicitly assumes that the brain could utilize it as well. However, this seems just as likely or unlikely as the use of the preferred frequency of a neuron, so we do not think that this decoding is any more speculative than the classical decoding. In both cases, downstream neurons would have to implicitly 'know' the preference of the input neuron and weight its input accordingly.

      We have added all the above considerations to the discussion in an abbreviated form.

      p. 15 (L. 15). Is there a citation for the drive system?

      There is no publication, but an old repository, where the files are available, which we cite now: https://code.google.com/archive/p/edds-array-drive/

      p. 16 (L. 24) "position in an octave" It is implied but not explicitly stated that the Shepard tones don't contain the fundamental frequency. Can the authors clarify the relationship between the neural tuning band and the bands of the stimulus. Did a single stimulus band typically fall in a neuron's frequency tuning curve? If not 1, how many?

      Yes, it is correct that the concept of fundamental frequency does not cleanly apply to Shepard tones. They are composed of octave-spaced pure tones, with the lowest constituent placed outside the hearing range of the animal and outside the amplitude envelope (across frequencies). Therefore, one or more constituent tones of a Shepard tone can fall into the tuning curve of a neuron and contribute to driving the neuron, or to inhibiting it if they fall within an inhibitory region of the tuning curve. The number of constituent tones that fall within the tuning curve depends on the tuning width of the neuron. The distribution of tuning widths to Shepard tones is shown in Fig. S1E. Many neurons had rather narrow tuning (close to the center), but many were also tuned widely, indicating that they would be stimulated by multiple constituent tones of the Shepard tone. The tuning bandwidth (Q30: 30 dB above threshold) of most cortical neurons in the ferret auditory cortex is below 1 (see e.g. Bizley et al., Cerebral Cortex, 2005, Fig. 12), which means that typically not more than 1 tone fell into the tuning curve of a neuron. However, we also observed multimodal tuning curves with respect to Shepard tones. This suggests that some neurons were stimulated by two or more constituent tones, again consistent with the existence of more broadly tuned neurons (see the same citation). We have added part of this information to the caption of Fig. S1E in the manuscript.
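The argument about how many constituents reach a neuron can be checked with a short sketch. The constituents of a Shepard tone sit exactly one octave apart. Any tuning curve narrower than one octave on a log-frequency axis can therefore contain at most one of them, while wider tuning can capture two or more. All parameter values below are hypothetical and were not taken from the study: the synthesis range `f_low`/`f_high`, the Gaussian envelope centre and width, and the use of a bandwidth in octaves to stand in for the "below 1" criterion.

```python
import numpy as np

def shepard_components(pitch_class, f_low=20.0, f_high=20000.0):
    """Frequencies (Hz) of the octave-spaced constituents of a Shepard tone.

    pitch_class in [0, 1) is the position within the octave; f_low and
    f_high are assumed bounds of the synthesized range.
    """
    f0 = f_low * 2.0 ** pitch_class
    n = int(np.floor(np.log2(f_high / f0))) + 1
    return f0 * 2.0 ** np.arange(n)

def envelope(freqs, center=960.0, sigma_oct=1.5):
    """Gaussian amplitude envelope on a log-frequency axis (assumed parameters)."""
    x = np.log2(np.asarray(freqs, dtype=float) / center)
    return np.exp(-0.5 * (x / sigma_oct) ** 2)

def n_components_in_band(freqs, cf, bw_oct):
    """Count constituents within +/- bw_oct / 2 octaves of a neuron's CF."""
    d = np.abs(np.log2(np.asarray(freqs, dtype=float) / cf))
    return int(np.sum(d <= bw_oct / 2))
```

With `bw_oct` below 1 the count never exceeds one for any CF. At `bw_oct=2.5` and a CF of 1 kHz, the 640 Hz and 1280 Hz constituents both fall inside the band.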

      p. 17 (L. 32). "Fig 4" Correct figure ref? This figure appears to be a schematic rather than one displaying data.

      Thanks for pointing this out, changed to Fig. 5.

      p. 18 (L. 25). "assign a pitchclass" Can the authors refer to a figure illustrating this process?

      Added.

      p. 19 (L. 17). Is mu the correct symbol?

      Thanks. We changed it to phi_i, as in the formula above.

      p. 19 (L 19). "convolution" in time? Frequency?

      Thanks for pointing this out, the term convolution was incorrect in this context. We have replaced it by "weighted average" and also adapted and simplified the formula.
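The "weighted average" mentioned here can be sketched as a population-vector decoder. Each cell i contributes its response r_i at its preferred pitch class phi_i. Pitch class is circular, because the octave wraps around. For that reason this sketch averages on the circle rather than taking a plain arithmetic mean, which would, for example, place cells preferring 0.95 and 0.05 of an octave at 0.5 instead of near 0. Whether the authors' simplified formula does the same is not stated in this response, so treat `decode_pitchclass` as a hypothetical illustration.

```python
import numpy as np

def decode_pitchclass(rates, phi):
    """Population-vector estimate of pitch class (hypothetical sketch).

    rates: non-negative responses r_i; phi: preferred pitch classes phi_i,
    expressed as fractions of an octave in [0, 1). Returns a value in [0, 1).
    """
    angles = 2 * np.pi * np.asarray(phi, dtype=float)
    # Sum of unit vectors at each preferred pitch class, weighted by response.
    z = np.sum(np.asarray(rates, dtype=float) * np.exp(1j * angles))
    return float((np.angle(z) / (2 * np.pi)) % 1.0)
```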

      p. 19 (L 25) "SSTRF" this term is introduced before it is defined. Also it appears that "SSTRF" and "STRF" are sometimes interchanged.

      Apologies, we have added the definition, and also checked its usage in each location.

      p. 23 (Fig 2) There is a mismatch between panel labels in the figure and in the legend. Bottom right panel (B3), what does time refer to here?

      Thanks for pointing these out, both fixed.

      p. 24 (L 23) "shifts them away" away from what?

      We have expanded the sentence to: "After the bias, the decoded pitchclass is shifted from their actual pitchclass away from the biased pitchclass range ... "

      p. 25 (L 7) "individual properties" properties of individual subjects?

      Thanks for pointing this out, the corresponding sentence has been clarified and citations added.

      p. 26 (L 20) What is plotted in panel D? The average for all cells? What is n?

      Yes, this is an average over cells, the number of cells has now been added to each panel.

      p. 28 (L 3) How to apply the terms "right" "right" "middle" to the panel is not clear. Generally, this figure is quite dense and difficult to interpret.

      We have changed the caption of Panel A and replaced the location terms with the symbols, which makes it easier to relate them directly to the figure. We considered several approaches to adding or removing content to make the figure less dense, but none of them seemed to help. Lacking better options, we have left it in its current form.

      MINOR/TYPOS

      p. 3 (L 1) "Stimulus Specific Adaptation" Capitalization seems unnecessary

      Changed.

      p. 4 (L 14) "Siple"

      Corrected.

      p. 9 (L 10) "an quantitatively"

      Corrected.

      p. 9 (L 20) "directional ... direction ... directly ... directional" This is a bit confusing, as 'direct' seems to mean several different things in its different usages.

      We have gone through these sentences, and we think the terms are now more clearly used, especially since the term 'direction' occurs in several different forms, as it relates to different aspects (cells/percept/hypothesis). Unfortunately, some repetition is necessary to maintain clarity.

      Reviewer #2 (Recommendations For The Authors):

      Detailed critique

      Stimuli

      It would be very useful if the authors could provide demos of their stimuli on a website. Many readers will not be familiar with Shepard tones and the perceptual result of the acoustical descriptions are not intuitive. I ended up coding the stimuli myself to get some intuition for them.

      We have created some sample tones and sequences and uploaded them with the revision as supplementary documents.

      Abstract

      P1 L27 'pitch and...selective cells' - The authors haven't provided sufficient controls to demonstrate that these are "pitch cells" or "selective" to pitch direction. They have only shown that they are sensitive to these properties in their stimuli. Controls would need to be included to ensure that the cells aren't simply responding to one frequency component in the complex sound, for example. This is not really critical to the overall findings, but the claim about pitch "selectivity" is not accurate.

      Fair point. We have removed the word 'selective' in both occurrences.

      Introduction

      P2 L14-17: I do not follow the phonetic example provided. The authors state that the second syllables of /alga/ and /arda/ are physically identical, but how can ga = da? The acoustics are clearly different. More explanation is needed, or a correction.

      Apologies for the slightly misleading description, it has now been corrected to be in line with the original reference.

      P2, L26-27: Should the two uses of "frequency" be "F0" and "pitch" here? The tones are not separated in frequency by half an octave, but "separated in [F0]" by half an octave, correct? Their frequency ranges are largely overlapping. And the second 'frequency', which refers to the percept, should presumably be "pitch".

      Indeed. This is now corrected.

      P3 L2-6: Unclear at this point in the manuscript what is the difference between the 3 percepts mentioned: perceived pitch-change direction, Shepard tone pitches, and "their respective differences". (It becomes clear later, but clarification is needed here).

      We have tried a few reformulations, however, it tends to overload the introduction with details. We believe it is preferable to present the gist of the results here, and present the complete details later in the MS.

      P3 L6-7 What does it mean that the MEG and single unit results "align in direction and dynamics"? These are very different signals, so clarification is needed.

      We have phrased the corresponding sentence more clearly.

      Results

      Throughout: Choose one of 'pitch class', 'pitchclass', or 'pitch-class' and use it consistently.

      Done.

      P4L12 - would be helpful at this point to define 'repulsive effect'

      We have added another sentence to clarify this term.

      P4, L14 "simple"

      Done

      P4, L12 - not clear here what "repulsive influence" means

      See above.

      P4, L17 - alternative to which explanation? Please clarify. In general, this paragraph is difficult to interpret because we do not yet have the details needed to understand the terms used and the results described. In my opinion, it would be better to omit this summary of the results at the very beginning, and instead reveal the findings as they come, when they can be fully explained to the Reader.

      We agree, but we also believe that a rather general description here is useful for providing a roadmap to the results. However, we have added a half-sentence to clarify what is meant by alternative.

      P4 L30 - text says that cells adapt in their onset, sustained and offset responses, but only data for onset responses are shown (I think - clarification needed for fig 2A2). Supp figure shows only 1 example cell of sustained and offset, and in fact there is no effect of adaptation in the sustained response shown there.

      Regarding the effect of adaptation and whether it can be discerned from the supplementary figure: the shown responses are for 10 repetitions of one particular Bias sequence. Since the response of the cell will depend on its tuning and the specific sequence of the Shepard tones in this Bias, it is not possible to assess adaptation for a given cell from this example alone. We assess the level of adaptation by averaging the responses over all Biases (similar to what is shown in Fig. 2A2) for each cell, and then fitting an exponential to this average, separately for each response type. The step direction of the exponential relative to the spontaneous rate is then used to assess the kind of adaptation. The vast majority of cells show adaptation. We have added this information to the Methods of the manuscript.
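      As an illustration of the per-cell fit described above, the following is a minimal Python sketch, not the authors' code. It assumes a single exponential r(n) = r_inf + A*exp(-n/tau) over the position n in the Bias sequence, fits tau by a grid search (solving r_inf and A by least squares for each candidate tau), and reads the "step direction relative to the spontaneous rate" as: a cell counts as adapting if its fitted response moves towards the spontaneous rate. The helper names, the tau grid and that reading are assumptions.

```python
import numpy as np

def fit_adaptation(mean_response, taus=np.linspace(0.5, 50.0, 200)):
    """Fit r(n) = r_inf + A * exp(-n / tau) to a Bias-averaged response
    (hypothetical helper). For a fixed tau the model is linear in
    (r_inf, A), so those are solved by least squares; the tau with the
    smallest squared error is kept."""
    n = np.arange(len(mean_response), dtype=float)
    best = None
    for tau in taus:
        design = np.column_stack([np.ones_like(n), np.exp(-n / tau)])
        coef, *_ = np.linalg.lstsq(design, mean_response, rcond=None)
        sse = np.sum((design @ coef - mean_response) ** 2)
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], tau)
    _, r_inf, amp, tau = best
    return r_inf, amp, tau

def is_adapting(r_inf, amp, spontaneous_rate):
    """True if the fitted curve starts away from the spontaneous rate and
    moves towards it (assumed reading of 'step direction')."""
    initial = r_inf + amp  # fitted response at the first Bias position
    return bool((initial - spontaneous_rate) * amp > 0)

# Usage with a synthetic, noise-free decaying response (40 positions assumed)
n = np.arange(40)
r_inf, amp, tau = fit_adaptation(5 + 10 * np.exp(-n / 8))
print(round(float(tau), 1), is_adapting(r_inf, amp, spontaneous_rate=2.0))
```

      Because the model is linear in r_inf and A once tau is fixed, the grid search needs no nonlinear optimizer; a routine such as scipy.optimize.curve_fit would be a natural alternative.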

      P4, L32 - please state the statistical test and criterion (alpha) used to determine that 91% of cells decreased their responses throughout the Bias sequence. Was this specifically for onset responses?

      Thanks for pointing this out, test and p-value added. Adaptation was observed for onset, sustained and offset responses, in all cases with the vast majority showing an adapting behavior, although the onset responses were adapting the most.

      P4 L36 - "response strength is reduced locally". What does "locally" mean here? Nearby frequencies?

      We have added a sentence here to clarify this question.

      Figure 1 - this appears to be the wrong version of the figure, as it doesn't match the caption or results text. It's not possible to assess this figure until these things are fixed. Figure 1A schematic of definition of f(diff) does not correspond to legend definition.

      As far as we can tell, it is all correct, only the resolution of the figure appears to be rather low. This has been improved now.

      Fig 2 A2 - is this also onset responses only?

      Yes, added to the caption.

      Fig 2 A3 - add y-axis label. The authors are comparing a very wide octave band (5.5 octaves) to a much narrower band (0.5 octaves). Could this matter? Is there something special about the cut-off of 2.5 octaves in the 2 bands, or was this an arbitrary choice?

      Interesting question. Essentially, our stimulus design left us with only this choice, i.e. comparing the internal region of the Bias with its boundary region, where the test tones are located. The internal region corresponds to the Bias itself, which is 5 st wide, so the range is given here as 2.5 st relative to its center, while the test tones lie at the boundary, 3 st from the center. The axis for the Bias was mislabelled and has now been corrected. The y-axis label matches that of the panel to the left, but it has now been added to avoid any confusion.

      Fig 2A4 - does not refer to ferret single unit data, as stated in the text (p5L8). Nor does supp Fig2, as stated. Also, the figure caption does not match the figure.

      Apologies, this was an error in the code that led to this mislabelling. We have corrected the labels, which also added back the recovery from the Bias sequence in the new Panel A4.

      P5 l9 - Figure 3 is not understandable at this point in the text, and should not be referred to here. There is a lot going on in Fig 3, and it isn't clear what you are referring to.

      Removed.

      P5 L12 - by Fig 2 B1, I assume you mean A4? Also, F2B1 shows only 1 subject, not 2.

      Yes, mislabeled by mistake, and corrected now.

      Fig2B2 -What is the y-axis?

      Same as in the panel to its left, added for clarity.

      Stimuli: why are tones presented at a faster rate to ferrets than to humans?

      The main reason is that the response analysis in MEG requires more spacing in time than the neuronal analysis in the ferret brain.

      P5 L6 - there is no Fig 5 D2? I don't think it is a good idea to get the reader to skip so far ahead in the figures at this stage anyway, even if such a figure existed. It is confusing to jump around the manuscript.

      Changed to 'see below'

      P5 L8 - There is no Figure 2A4, so I don't know whether this time constant is accurate.

      This was in reference to a panel that had been removed before, but we have added it back now.

      P5 L16: "in humans appears to be more substantial (40%) than for the average single units under awake conditions". One cannot directly compare magnitude of effects in MEG and single unit signals in this way and assume it is due to behavioural state. You are comparing different measures of neural activity, averaged over vastly different numbers of neurons, and recorded from different species listening to different stimuli (presentation rates).

      Yes, that's why the next sentence is: "However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.", and all statements in the preceding sentences are phrased as 'appears' and 'may'. We think we have formulated this comparison with an appropriate level of uncertainty. Further, the main message here is that adaptation is taking place in both active and passive conditions.

      P5 L25 -I do not see any evidence regarding tuning widths in Fig s2, as stated in the text.

      Corrected to Fig. S1.

      P5 l26 - Do not skip ahead to Fig 5 here. We aren't ready to process that yet.

      OK, reference removed.

      P5 l27 - Do you mean because it could be tuning to pitch chroma, not height?

      Yes, that is a possible interpretation, although it could also arise from a combination of excitatory and inhibitory contributions across multiple octaves.

      P5 l33 - remove speculation about active vs passive for reasons given above.

      Removed.

      P6L2-6 'In the present...5 semitone step' - This is an incorrect interpretation of the minimal distance hypothesis in the context of the Shepard tone ambiguity. The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low. Each constituent frequency of a single tone can therefore be perceived either as a harmonic of some lower fundamental frequency or as an independent tone. The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high). The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect. The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept.

      The reviewer here refers to a “minimal distance hypothesis”, which, without a literature reference, is hard for us to fully interpret. However, some responses are given below:

      - "The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low." This statement appears to be based on a misconception: due to the octave spacing (rather than multiples/harmonics of a lowest frequency), Shepard tones cannot be interpreted as usual harmonic tones would be. It is correct that the lowest constituent of a Shepard tone is not audible, due to the envelope and the fact that its frequency could in principle be arbitrarily low; hence, speaking about an F0 is not well-defined in the case of a Shepard tone. The closest one could get would be to refer to the lowest constituent tone that is both in the audible range and within the non-zero part of the envelope. But again, since the envelope fades out the highest and lowest constituent tones, it is not straightforward to refer to the lowest one as F0 (as it might be much quieter than the next higher constituent).

      - "The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high)." This may relate to some known psychophysics, but we are unable to interpret it with certainty.

      - "The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect." We are unsure how the reviewer reaches this conclusion.

      - "The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept." Again, in the absence of a reference to the MDH, we are unsure of the implied rationale. We agree that this is a possible interpretation of distance, however, we believe that our interpretation of distance (i.e. distances between constituent tones) is also a possible interpretation.

      Fig 4: Given that it comes before Figure 3 in the results text, these should be switched in order in the paper.

      Switched.

      PCA decoder: The methods (p18) state that the PCA uses the first 3 dimensions, and that pitch classes are calculated from the closest 4 stimuli. The results (P6), however, state that the first 2 principal components are used, and classes are computed from the average of 10 adjacent points. Which is correct, or am I missing something?

      Thanks for pointing this out; we have made this more concrete in the Methods: "The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle." Regarding the number of stimuli that were averaged: this might be a slight misunderstanding. Each Shepard tone was decoded/projected without averaging. However, to then assign an estimated pitch class, we first had to establish an axis (here going around the circle), where each position along the axis was associated with a pitch class. This was done by stepping in 0.5 semitone steps, and finding the location in decoded space that corresponded to the median of the Shepard tones within +/- 0.25 st. To increase the resolution, this circular 'axis' of 24 points was then linearly interpolated to a resolution of 0.05 st. We have updated the text in the Methods accordingly. The mention of 10 points for averaging in the Results was correct, as there were 240 tones in all Bias stimuli, and 24 bins in the pitch circle. The mention of an average over 4 tones in the Methods was a typo.
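      The axis construction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: decoded tones are given as 2D coordinates in the plane spanning the pitch circle, the "median" is taken coordinate-wise, each bin is a half-open window of +/- 0.25 st around a 0.5 st anchor, and a new tone is assigned the pitch class of the nearest point on the interpolated axis. The function names are hypothetical.

```python
import numpy as np

def build_pitch_axis(points_2d, pitch_classes, step=0.5, half_width=0.25, fine=0.05):
    """Build a circular pitch-class axis in decoded space (hypothetical helper).

    points_2d: (n_tones, 2) decoded coordinates; pitch_classes: (n_tones,) in st.
    Returns (fine_pcs, axis): pitch classes at `fine` resolution and the
    corresponding 2D axis points, linearly interpolated around the circle.
    """
    anchor_pcs, anchors = [], []
    for c in np.arange(0.0, 12.0, step):               # 24 anchors for step = 0.5
        d = (pitch_classes - c + 6.0) % 12.0 - 6.0      # signed circular distance
        sel = (d >= -half_width) & (d < half_width)
        if sel.any():                                   # skip empty bins
            anchor_pcs.append(c)
            anchors.append(np.median(points_2d[sel], axis=0))
    anchor_pcs, anchors = np.asarray(anchor_pcs), np.asarray(anchors)
    fine_pcs = np.linspace(0.0, 12.0, int(round(12.0 / fine)), endpoint=False)
    # period=12 makes the interpolation wrap around the pitch circle
    axis = np.column_stack(
        [np.interp(fine_pcs, anchor_pcs, anchors[:, k], period=12.0) for k in range(2)]
    )
    return fine_pcs, axis

def decode_pitch_class(point, fine_pcs, axis):
    """Pitch class of the axis point closest to a decoded tone."""
    return fine_pcs[np.argmin(np.sum((axis - point) ** 2, axis=1))]
```

      With 240 tones spread over 24 bins, each anchor is the median of about 10 tones, which matches the count given in the Results.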

      Fig 3A: axes of pink plane should be PC not PCA

      Done.

      Fig 3B: the circularity in the distribution of these points is indeed interesting! But what do the authors make of the gap in the circle between semitones 6-7? Is this showing an inherent bias in the way the ambiguous tone is represented?

      While we cannot be certain, we think that this represents an inhomogeneous sampling from the overall set of neural tuning preferences, and that if we had recorded more/all neurons, the circle would be complete and uniformly sampled (which it already nearly is, see Fig.4C, which used to be Fig. 3C).

      Fig 3B (lesser note): It'd be preferable to replace the tint (bright vs. dark) differentiation of the triangles to be filled vs. unfilled because such a subtle change in tint is not easily differentiable from a change in hue (indicating a different variable in this plot) with this particular colour palette

      We have experimented with this suggestion, and it didn't seem to improve the clarity. However, we have changed the outline of the test-pair triangles to white, which now visually separates them better.

      P6 l32 - Please indicate if cross-validation was used in this decoder, and if so, what sort. Ideally, the authors would test on a held-out data set, or at least take a leave-one-out approach. Otherwise, the classifier may be overfit to the data, and overfitting would explain the exceptional performance (r=.995) of the classifier.

      Cross-validation was not used, as the purpose of the decoder is here to create a standard against which to compare the biased responses in the ambiguous pair, which were not used for training of the decoder. We agree that if we instead used a cross-validated decoder (which would only apply to the local average to establish the pitch class circle) the correlation would be somewhat lower, however, this is less relevant for the main question, i.e. the influence of the Bias sequence on the neural representation of the ambiguous pair. We have added this information to the corresponding section.

      Fig 3D: I understood that these pitch classifications shown by the triangles were carried out on the final ambiguous pair of stimuli. I thought these were always presented at the edges of the range of other stimuli, so I do not follow how they have so many different pitchclass values on the x-axis here.

      There were 4 Biases, centered at 0,3,6 or 9 semitones, and covering [-2.5,2.5]st relative to this center. Therefore the edges of the bias ranges (3st away from their centers) happen to be the same as the centers, e.g. for the Bias centered at 3, the ambiguous pair would be a 0-6 or 6-0 step. Therefore there are 4 locations for the ambiguous tones on the x-axis of Fig. 4D (previously 3D).
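      The geometry of the four Biases can be checked with a few lines of arithmetic. This Python sketch uses only the values stated above (centers at 0, 3, 6 and 9 st, a Bias range of +/- 2.5 st, test tones 3 st from the center, pitch class taken modulo 12); the helper names are hypothetical.

```python
BIAS_CENTERS = (0, 3, 6, 9)   # Bias centers in semitones
BIAS_HALF_RANGE = 2.5         # each Bias covers center +/- 2.5 st
TEST_OFFSET = 3               # ambiguous tones sit 3 st below/above the center

def ambiguous_pair(center):
    """Pitch classes (mod 12) of the two tones flanking a Bias."""
    return ((center - TEST_OFFSET) % 12, (center + TEST_OFFSET) % 12)

def circular_distance(a, b):
    """Shortest distance between two pitch classes on the 12 st circle."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

for c in BIAS_CENTERS:
    print(c, ambiguous_pair(c))
# 0 (9, 3)
# 3 (0, 6)
# 6 (3, 9)
# 9 (6, 0)
```

      The two flanking tones are always a tritone (6 st) apart and fall outside the +/- 2.5 st Bias range, so across the four Biases they occupy exactly the four center positions, hence the four x-axis locations.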

      Figure 4: This demonstration of the ambiguity of Shepard pairs may be misleading. The actual musical interval is never ambiguous, as this figure suggests. Only the ascending vs descending percept is ambiguous. Therefore the predictions of the ferret A1 decoding (Fig 3D) and the model in Fig 5 are inconsistent with perception in two ways. One (which the authors mention) is the direction of the bias shift (up vs down). Another (not mentioned here) is that one never experiences a shift in the shepard tone at a fraction of a semitone - the musical note stays the same, and changes only in pitch height, not pitch chroma.

      We are unsure of the reviewer’s direction with this question. In particular, the second point is not clear to us: "...one (who?) never (in this experiment? in real life?) experiences a bias shift in the Shepard tone at a fraction of a semitone" (why is this relevant in the current experiment?). Pitch chroma would actually be a possible replacement for pitch class, but the previous Shepard tone literature has referred to it as pitch class.

      P7 l12 - omit one 'consequently'

      Changed to 'Therefore'.

      P7 l24 - I encourage the authors to not use "local" and "global" without making it clear what space they refer to. One tends to automatically think of frequency space in the auditory system, but I think here they mean f0 space? What is a "cell close to the location of the bias"? Cells reside in the brain. The bias is in f0 space. The use of "local" and "global" throughout the manuscript is too vague.

      Agreed, the reference here was actually to the cell's preferred pitch class, not its physical location (which one might arguably be able to disambiguate, given the context). We have changed the wording, and also checked the use of global/local throughout the manuscript. The main use of 'global/local' is now in reference to the range of adaptation, and is properly introduced on first mention.

      P7 L26 -there is no Fig 5D1. Do you mean the left panel of 5D?

      Thanks. Changed.

      FigS3 is referred to a lot on p7-8. Should this be moved to the main text?

      The main reason why we kept it in the supplement is that it is based on a more static model, which is intended to illustrate the consequences of different encoding schemes. In order to not confuse the reader about these two models, we prefer to keep it in the supplement, which - for an online journal - makes little difference since the reader can just jump ahead to this figure in the same way as any other figure.

      Fig 5C, D - label x-axis.

      Added.

      Fig 5E - axis labels needed. I don't know what is plotted on x and y, and cannot see red and green lines in left plot

      Thanks for noticing this, colors corrected, axes labeled.

      Page 8 L3-15 - If I follow this correctly, I think the authors are confusing pitch and frequency here in a way that is fundamental to their model. They seem to equate tonotopic frequency tuning to pitch tuning, leading to confused implications of frequency adaptation on the F0 representation of complex sounds like Shepard tones. To my knowledge, the authors do not examine pure tone frequency tuning in their neurons in this study. Please clarify how you propose that frequency tuning like that shown in Fig 5A relates to representation of the F0 of Shepard tones. Or...are the authors suggesting these neural effects have little to do with pitch processing and instead are just the result of frequency tuning for a single harmonic of the Shepard tones?

      We agree that it is not trivial to describe this well, while keeping the text uncluttered, in particular, because often tuning properties to stimulus frequency contribute to tuning properties of the same neuron for pitch class, although this can be more or less straightforward: specifically, for some narrowly tuned cells, the Shepard tuning is simply a reflection of their tuning to a single octave range of the constituent tones (see Fig. S1). For more broadly tuned cells, multiple constituent tones will contribute to the overall Shepard tuning, which can be additive, subtractive, or more complex. The assumption in our approach is that we can directly estimate the Shepard tuning to evaluate the consequence for the percept. While this may seem artificial, as Shepard tones do not typically occur in nature, the same argument could be made against pure tones, on which classical tuning curves and associated decodings are often based. Relating the Shepard tuning to the classical tuning would be an interesting study in itself, although arguably relating the tuning of one artificial stimulus to another. Regarding the terminology of pitch, pitch class and frequency: The term pitch class is commonly used in the field of Shepard tones, and - as we indicated in the beginning of the results: "the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study". We agree that the term pitch, which describes the perceptual convergence/construction of a tone-height from a range of possible physical stimuli, needs to be separated from frequency as one contributor/basis for the perception of a pitch. However, we think that the term pitch can - despite its perceptual origin - also be associated with neuron/neural responses, in order to investigate the neural origin of the pitch percept. 
At the same time, the present study is not targeted to study pitch encoding per se, as this would require the use of a variety of stimuli leading to consistent pitch percepts. Therefore, pitch (class) is here mainly used as a term to describe the neural responses to Shepard tones, based on the previous literature, and the fact that Shepard tones are composite stimuli that lead to a pitch percept. The last sentence has been added to the manuscript for clarity.

      P7-9: I wasn't left with a clear idea of how the model works from this text. I assume you have layers of neurons tuned to frequency or f0 (based on the real data?), which are connected in some way to produce some sort of output when you input a sound? More detail is needed here. How is the dynamic adaptation implemented?

      The detailed description of the model can be found in the Methods section. We have gone through the corresponding paragraph and have tried to clarify the description of the model by introducing a high-level description and the reference to the corresponding Figure (Fig. 5A) in the Results.

      Fig6A: Figure caption can't be correct. In any case, these equations cannot be understood unless you define the terms in them.

      We have clarified the description in the caption.

      Fig 6/directionality analysis: Assuming that the "F" in the STRFs here is Shepard tone f0, and not simple frequency?

      We have changed the formula in the caption and the axis labels now.

      Fig 6C - y-axis values

      In the submission, these values were left out on purpose, as the result has an arbitrary scale, but only whether it is larger or smaller than 0 counts for the evaluation of the decoded directionality (at the current level of granularity). An interesting refinement would be to relate the decoded values to animal performance. We have now scaled the values arbitrarily to fit within [-1,1], but we would like to emphasize that only their relative scale matters here, not their absolute scale.

      Fig 6E - can't both be abscissa (caption). I might be missing something here, but I don't see the "two stripes" in the data that are described in the caption.

      Thank you. The typo is fixed. The stripes are most clearly visible in the right panel of Fig. 6E, red and blue, diagonally from top left to bottom right.

      Fig 6G -I have no idea what this figure is illustrating.

      This panel is described in the text as follows: "The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same frequency location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone."
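      To make the Fig. 6G argument concrete, here is a deliberately simplified Python toy, not the model used in the manuscript. It assumes an up- and a down-selective cell co-centered at the test tone, each with a "recent" SSTRF lobe shifted up or down by a fixed offset and modelled as a Gaussian over pitch; adaptation reduces each cell's gain in proportion to how much of that lobe falls inside the Bias range, and the readout is the up-minus-down gain difference. All parameter values and names are hypothetical.

```python
import math

def gaussian_mass_in(lo, hi, mu, width):
    """Fraction of a Gaussian (mean mu, s.d. width) lying in [lo, hi]."""
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf((hi - mu) / width) - cdf((lo - mu) / width)

def directional_readout(tone, bias_center=None, bias_half_range=2.5,
                        lobe_offset=1.5, lobe_width=1.0, adapt_strength=0.8):
    """Up-minus-down gain for two co-centered directional cells (toy).

    Negative values predict a descending step, positive an ascending one.
    """
    gains = {}
    for name, sign in (("up", +1), ("down", -1)):
        recent_lobe = tone + sign * lobe_offset  # up-cell's recent lobe lies higher
        if bias_center is None:
            overlap = 0.0                        # unadapted population
        else:
            overlap = gaussian_mass_in(bias_center - bias_half_range,
                                       bias_center + bias_half_range,
                                       recent_lobe, lobe_width)
        gains[name] = 1.0 - adapt_strength * overlap
    return gains["up"] - gains["down"]

print(directional_readout(3.0), directional_readout(3.0, 6.0) < 0,
      directional_readout(9.0, 6.0) > 0)
```

      With no prior Bias both gains are equal and the readout is zero (balanced activity); a tone below the Bias yields a negative (descending) readout and a tone above it a positive (ascending) one, mirroring the description above.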

      I might be just confused or losing steam at this point, but I do not follow what has been done or the results in Fig 6 and the accompanying text very well at all. Can this be explained more clearly? Perhaps the authors could show spike rate responses of an example up-direction and down-direction neuron? Explain how the decoder works, not just the results of it.

      We agree that we are presenting something new here. However, it is conceptually not very different from decoding based on preferred frequencies. We have attempted to provide two illustrations of how the decoder works (Fig. 6A) and how it then leads to the percept using prototypical examples of cellular SSTRFs (Fig. 6G). We have added a complete, but accessible description to the Methods section. Showing firing rates of neurons would unfortunately not be very telling, given the usual variability in neural response and the fact that our paradigm did not have a lot of repetitions (but instead a lot of conditions), which would be able to average out the variability on a single neuron level.

      Discussion - I do not feel I can adequately critique the author's interpretation of the results until I understand their results and methods better. I will therefore save my critique of the discussion section for the next round of revisions after they have addressed the above issues of disorganization and clarity in the manuscript.

      We hope that the updated version of the manuscript provides the reviewer now with this possibility.

      Methods

      P15L7 - gender of human subjects? Age distribution? Age of ferrets?

      We have added this information.

      P16L21 - What is the justification for randomizing the phase of the constituent frequencies?

      The purpose of the randomization was to prevent idiosyncratic phase relationships for particular Shepard tones, which would depend in an orderly fashion on the included base-frequencies if non-randomized, and could have contributed to shaping the percept for each Shepard tone in a way that was only partly determined by the pitch class of the Shepard tone. Added to the section.
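      For readers less familiar with Shepard-tone synthesis, a minimal Python sketch of the phase randomization: octave-spaced components, a Gaussian amplitude envelope over log-frequency, and an independent uniform random phase per component. The sampling rate, duration, reference frequency, envelope center and width below are placeholder values, not those of the study, and the function name is hypothetical.

```python
import numpy as np

def shepard_tone(pitch_class, fs=44100, dur=0.1, f_ref=27.5, n_oct=10,
                 center_hz=960.0, sigma_oct=1.5, rng=None):
    """Shepard tone with randomized component phases (placeholder parameters).

    Returns (waveform normalized to peak 1, component frequencies in Hz).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(round(fs * dur))) / fs
    # Octave-spaced components sharing the same pitch class
    freqs = f_ref * 2.0 ** (pitch_class / 12.0 + np.arange(n_oct))
    freqs = freqs[freqs < fs / 2]                       # stay below Nyquist
    # Gaussian envelope over log2 frequency fades out the extreme components
    amps = np.exp(-0.5 * (np.log2(freqs / center_hz) / sigma_oct) ** 2)
    # Independent random phase per component avoids fixed phase relationships
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    tone = (amps[:, None] * np.sin(2 * np.pi * freqs[:, None] * t[None, :]
                                   + phases[:, None])).sum(axis=0)
    return tone / np.max(np.abs(tone)), freqs
```

      Two calls with different random generators produce waveforms with identical component frequencies and amplitudes but different fine structure, which is the point of the randomization.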

      P17L6 - what are the 2 randomizations? What is being randomized?

      Pitch classes and position in the Bias sequence. Added to the section.

      P16 Shepard Tuning section - What were the durations of the tones and the time between tones within a trial?

      Thanks, added!

      Equations - several undefined terms in the equations throughout the manuscript.

      Thanks. We have gone through the manuscript and all equations and have introduced additional definitions where they had been missing.

      Reviewer #3 (Recommendations For The Authors):

      P3L10: "passive" and "active" conditions come totally out of the blue. Need introducing first. (Or cut. If adaptation is always seen, why mention the two conditions if the difference is not relevant here?)

      We have added an additional sentence in the preceding paragraph, that should clarify this. The reason for mentioning it is that otherwise a possible counter-argument could be made that adaptation does not occur in the active condition, which was not tested in ferrets (but presents an interesting avenue for future research).

      P3L14 "siple" typo

      Corrected.

      P4L1 "behaving humans" you should elaborate just a little here on what sort of behavior the participants engaged in.

      Thanks for pointing this out. We have clarified this by adding an additional sentence directly thereafter.

      P4 adaptation: I wonder whether it would be useful to describe the Bias condition a bit more here before going into the observations. The reader cannot know what to expect unless they jump ahead to get a sense of what the Bias looks like in the sense of how many stimuli are in it, and how similar they are to each other. Observations such as "the average response strength decreases as a function of the position in the Bias sequence" are entirely expected if the Bias is made up of highly repetitive material, but less expected if it is not. I appreciate that it can be awkward to have Methods after Results, but with a format like that, the broad brushstroke Methods should really be incorporated into the Results and only the tedious details should be reserved for the Methods to avoid readers having to jump back and forth.

      Agreed, we have inserted a corresponding description before going into the details of the results.

      Related to this (perhaps): Bottom of P4, top of P5: "significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias" ... I am at a loss as to what the red and blue symbols in Fig 2 A3 really show, and I wonder whether the "at the edges" to "within the Bias" comparison were to make sense if at this stage I had been told more about the composition of the Bias sequence. Do the ambiguous ('target') tones also occur within the Bias? As I am unclear about what is compared against what I am also not sure how sound that comparison is.

      We have added an extended description of the Bias to the beginning of this section of the manuscript. For your reference: the Shepard tones that made up the ambiguous tones were not part of the Bias sequence, as they are located at 3st distance from the center of the Bias (above and below), while the Bias has a range of only +/- 2.5st.

      Fig 2: A4 B1 B2 labels should be B1 B2 B3

      Corrected.

      Fig 2 A2, A3: consider adjusting y-axis range to have less empty space above the data. In A3 in particular, the "interesting bit" is quite compressed.

      Done, while still matching the axes of A2 and A3 for better comparability.

      I am under the strong impression that the human data only made it into Fig 2 and that the data from Fig 3 onwards are animal data only. That is of course fine (MEG may not give responses that are differentiated enough to perform the sort of analyses shown in the later figures). But I do think that somewhere this should be explicitly stated.

      Yes, the reviewer's observation is correct. The decoding analyses could not be conducted on the human MEG data and were therefore not pursued further. Their inclusion in the paper has the purpose of demonstrating that even in humans and active conditions, the local adaptation is present, which is a key contributor to the two decoding models. We now state this explicitly when starting the decoding analysis.

      P5L2 "bias" not capitalized. Be consistent.

      All changed to capitalized.

      P5L8 reference to Fig 2 A4: something is amiss here. From legend of Fig 2 it seems clear that panel A4 label is mislabeled B1. Maybe some panels are missing to show recovery rates?

      Apologies for this residual text from a previous version of the manuscript. We have gone through all references and corrected them.

      P6L7 comma after "decoding".

      Changed.

      Fig 3, I like this analysis. What would be useful / needed here though is a little bit more information about how the data were preprocessed and pooled over animals. Did you do the PCA separately for each animal, then combine, or pool all units into a big matrix that went into the PCA? What about repeat presentations? Was every trial a row in the matrix, or was there some averaging over repeats? (In fact, were there repeats??)

      Thanks for bringing up these relevant aspects, which were partly insufficiently detailed in the manuscript. Briefly, cells were pooled across animals and we only used cells that could meaningfully contribute to the decoding analysis, i.e. had auditory responses and different responses to different Shepard tones. Regarding the responses, as stated in the Methods, "Each stimulus was repeated 10 times", and we computed average responses across these repetitions. Single trials were not analyzed separately. We have added this information in the Methods, and refer to it in the Results.
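      The pooling and averaging described above can be summarized in a short Python sketch, assuming responses are stored as an array of units (pooled across animals) by stimuli by repetitions. It averages over the repetitions, treats each stimulus as an observation, centers each unit and projects onto the first three principal components via an SVD. Whether units were additionally z-scored is not stated here, so only centering is assumed; the function name is hypothetical.

```python
import numpy as np

def pca_project(responses, n_components=3):
    """Project stimuli onto the leading principal components of the
    pooled population response (hypothetical helper).

    responses: array of shape (n_units, n_stimuli, n_repeats), with units
    pooled across animals.
    Returns (coords, explained): coords has shape (n_stimuli, n_components);
    explained is the fraction of variance per component.
    """
    mean_resp = responses.mean(axis=2)             # average over repetitions
    X = mean_resp.T                                # rows: stimuli, columns: units
    X = X - X.mean(axis=0, keepdims=True)          # center each unit (no z-scoring assumed)
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:n_components].T               # scores on the first PCs
    explained = S**2 / np.sum(S**2)
    return coords, explained
```

      Pooling all units into one matrix, rather than running a PCA per animal, gives a single shared space in which every stimulus has one set of coordinates.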

      Also, there doesn't appear to be a preselection of units. We would not necessarily expect all cortical neurons to have a meaningful "best pitch" as they may be coding for things other than pitch. Intuitively I suspect that, perhaps, the PCA may take care of that by simply not assigning much weight to units that don't contribute much to explained variance? In any event I think it should be possible, and would be of some interest, to pull out of this dataset some descriptive statistics on what proportion of units actually "care about pitch" in that they have a lot (or at least significantly more than zero) of response variance explained by pitch. Would it make sense to show a distribution of %VE by pitch? Would it make sense to only perform the analysis in Fig 3 on units that meet some criterion? Doing so is unlikely to change the conclusion, but I think it may be useful for other scientists who may want to build on this work to get a sense of how much VE_pitch to expect.

      We fully agree with the reviewer, which is why this information is already presented in Supplementary Fig. 1, which details the tuning properties of the recorded neurons. Overall, we recorded from 1467 neurons across all ferrets, of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and on whether they showed a differential response to different Shepard tones. The thresholds for auditory response and tuning to Shepard tones were not very critical: setting the thresholds low led to quantitatively the same result, albeit with more noise. Setting the thresholds very high reduced the set of cells included in the analysis, which eventually made the results less stable, as the remaining cells did not cover the entire range of preferences to Shepard tones. We agree that the PCA-based preprocessing would also automatically down-weight many of the cells that were already excluded by the more concrete criteria beforehand. We have added further information on this issue to the Methods section under the heading 'Unit selection'.
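      For readers who want to reproduce the general shape of this pipeline, the sketch below follows the steps described in the two answers above: average each unit's response over the 10 repeats, keep units that are both sound-driven and differentially responsive to Shepard tones, pool the surviving units across animals into one matrix, and run PCA on it. It is a minimal illustration under stated assumptions, not the authors' code: the array layout, the helper names, the baseline-subtraction rule, the range-based tuning criterion, both threshold values, and the choice of stimuli as rows are all hypothetical, and the data are simulated.

```python
import numpy as np

def trial_average(responses):
    """Average over repeats: (n_units, n_stimuli, n_repeats) -> (n_units, n_stimuli)."""
    return responses.mean(axis=2)

def select_units(mean_resp, baseline, min_driven=1.0, min_tuning=0.5):
    """Hypothetical criteria: driven above baseline AND responses differ across stimuli."""
    driven = mean_resp.mean(axis=1) - baseline >= min_driven
    tuned = np.ptp(mean_resp, axis=1) >= min_tuning  # range across stimuli
    return driven & tuned

def pca(matrix, n_components):
    """PCA via SVD of a (n_observations, n_features) matrix."""
    centered = matrix - matrix.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    explained = var / var.sum()  # fraction of variance per component
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]

# Simulated data: two animals, 8 Shepard tones, 10 repeats per stimulus.
rng = np.random.default_rng(0)
animals = [rng.poisson(5.0, size=(n, 8, 10)).astype(float) for n in (30, 20)]
baselines = [np.full(a.shape[0], 3.0) for a in animals]

pooled = []
for resp, base in zip(animals, baselines):
    mean_resp = trial_average(resp)
    pooled.append(mean_resp[select_units(mean_resp, base)])
units_by_stim = np.concatenate(pooled, axis=0)  # units pooled across animals

# Rows = stimuli (observations), columns = pooled units (features).
scores, explained = pca(units_by_stim.T, n_components=3)
```

      Pooling before the PCA, rather than running one PCA per animal, keeps all units in a single shared component space; a per-animal decomposition would instead require aligning components across animals afterwards.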

      P9 "tones This" missing period.

      Changed.

      P10L17 comma after "analysis"

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Some critical comments are provided below:

      (1) The data quality still needs to be improved. There are many outliers in the experimental data shown in some figures, e.g. Figure 2D-G. The presence of these outliers makes the results unreliable. The authors should thoroughly review the data analysis in the manuscript. In addition, a couple of western blot bands, such as IL-1β in Figure 3C, are not clear enough; please provide clearer western blot results to support the conclusion.

      Following our comparative analysis, we have determined that these data do not affect our conclusions. Moreover, our experimental design included a total of six mice per group, with all mouse samples being subjected to testing.

      (2) As shown in Figure 1G-I, foot thickness and IL-1β content in foot tissues of the Aged+Abx group were significantly reduced, but there was no difference in serum uric acid level. In addition, the Abx-untreated group should be included at all ages.

      Thank you for your comment. We have included this data in Supplemental Material 4.

      (3) Since FMT (Figure 4) and butyrate supplementation (Figure 8) have different effects on uric acid synthesis enzyme and excretion, different mechanisms may lie behind these two interventions. Transplantation with significantly enriched single strains from young mice, such as Bifidobacterium and Akkermansia, is the more reliable approach to reveal the underlying mechanism between gut microbiota and gout.

      Thank you for your comment. Due to the involvement of multiple bacterial genera in gout and hyperuricemia, and the practical challenge of testing all strains, our focus shifted to the functional implications and metabolism of the microbiota. Experimental validation confirmed that butyrate exerts a dual-therapeutic effect in mitigating gout and hyperuricemia.

      (4) In Figure 2F, the results showed the IL-1β, IL-6, and TNF-α content in serum, which was inconsistent with the authors' manuscript description (Line 171).

      Thank you for your comment. The modifications to the results have been implemented.

      (5) Figures 2F-H duplicate Supplementary Figures S1B-D. The authors should prepare the article more carefully to avoid such mistakes.

      Thank you for your comment. We have corrected it in the manuscript.

      (6) In lines 202-206, the authors state that serum uric acid levels were elevated in the Young+Old or Young+Aged groups, but the results shown in Figure 4A show no difference.

      Thank you for your comment. We have corrected it in the manuscript.

      (7) Please visualize the results in Table 2 in a more intuitive manner.

      We have presented the results of Table 2 in a more intuitive visual format. The detailed information is provided in Supplement 4.

      (8) The heatmap in Figure 7A cannot strongly support the conclusion "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group". The authors should re-present the visual results and provide a reasonable explanation. In addition, please provide the ordinate unit of Supplementary Figure 7A-H.

      Thank you for your comment. Figure 7A and Supplementary Figure 7A-H together illustrate "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group", and the specific units of short-chain fatty acids have been annotated in the manuscript.

      (9) Uncropped original full-length western blot should be provided.

      Thank you for your comment. We have made relevant notes in the paper.

      Reviewer #1 (Recommendations For The Authors):

      Gout, a prevalent form of arthritis among the elderly, exhibits an intricate relationship with age and gut microbiota. The authors found that gut microbiota plays a crucial role in determining susceptibility to age-related gout. They observed that age-related gut microbiota regulated the activation of the NLRP3 inflammasome pathway and modulated uric acid metabolism. "Younger" microbiota has a positive impact on the gut microbiota structure of old or aged mice, enhancing butanoate metabolism and butyric acid content. Finally, they found that butyric acid exerts a dual effect, inhibiting inflammation in acute gout and reducing serum uric acid levels. This work's insights emphasize the potential of a "young" gut microbiome in mitigating senile gout. The whole study was interesting, but there were some minor errors in the overall writing of the paper. The authors should carefully check the spelling of the words in the text and the case consistency of the group names.

      Questions:

      (1) Lines 118, 142, and elsewhere: write "24 months" in the same format as before.

      Thank you for your comment. We have corrected it in the manuscript.

      (2) Line 123: "Old and Aged group" should be plural.

      Thank you for your suggestion. We have corrected it in the manuscript.

      (3) Why does line 133 mention the use of ABX? Please add a brief explanation.

      Thank you for your suggestion. The aim of using ABX is to establish the link between gut microbiota, age, and gout.

      (4) Lines 172-175: the description of TNF does not match the result figure, possibly because of a figure placement error; please correct this.

      Thank you for your careful review. The error has been corrected and the accurate result has been inserted into the original manuscript.

      (5) Lines 183-185 and 193-195: Pro-Caspase-1 and Pro-IL activate excess write.

      Thank you for your careful review. We have corrected the error at the original location.

      (6) Line 400: the text should not say "increased".

      Thank you for your careful review. We have corrected the error at the original location.

      (7) "ns" needs to be added in the legend to indicate that there is no significant difference.

      Thank you for your careful review. We have corrected the error at the original location.

      (8) Lines 1080-1084: in "Old or Aged control group and the old or aged group", the group names should be capitalized consistently.

      Thank you for your suggestion. We have made the correct modification to the group names.

      (9) Lines 1072-1073, "Representative western blot images of foot tissue NLRP3 pathways proteins" add band density.

      Thank you for your suggestion. We have corrected the error on lines 1072-1073 of the article.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) In Figures 1G-H, the Aged+PBS group with antibiotic treatment shows a significant reduction in foot swelling and IL-1β compared to the Young+PBS and Old+PBS groups. The authors state that age-related changes in the gut microbiota exacerbate gout. However, why does only the Aged+PBS group improve with antibiotic treatment? It seems that butyrate alone cannot explain this phenomenon.

      We used antibiotic treatment to establish the relationship between gut microbiota, age, and gout. Each age group was treated directly with antibiotics. We found that after the gut microbiota was cleared and the mice were then stimulated with MSU, the age-dependent trend in inflammatory factors disappeared.

      (2) In Figure 2, the fecal transplantation from young mice improved the infiltration of inflammatory cells and inflammatory cytokines in the Old and Aged groups. However, in Supplementary Figure 1A, there is no improvement observed in the percentage of foot swelling. Is it appropriate to conclude that inflammation was improved even though foot swelling was not suppressed?

      Although we did not observe changes in the swelling of the mice's feet, there were changes in inflammatory cell infiltration and inflammatory factors in the tissue sections. We rely on a comprehensive assessment of various indicators to determine whether the inflammatory condition has improved or worsened.

      (3) In line #249, the authors state that "the fecal microbiota from mice in the young group promotes uric acid elimination, inhibits reabsorption, and may contribute to the integrity of the intestinal barrier structure." However, Supplementary Figure 3F-H shows no significant alterations in Occludin and ZO-1 mRNA expression levels among all groups. Therefore, it is difficult to conclude that the fecal microbiota from the young group promotes the integrity of the intestinal barrier structure. A functional barrier assay, such as oral administration of FITC-dextran, would be necessary to verify the authors' conclusion.

      In Supplementary Figure 3F-H, we observed that the mRNA expression of Occludin and ZO-1 increased, but not significantly. However, after the elderly mice were transplanted with the intestinal microbiota of young mice, the mRNA expression of JAMA showed a significant upward trend. Because old mice are scarce, we were unable to perform the oral FITC-dextran assay. Instead, we added immunohistochemical sections of ZO-1 and Occludin to support our conclusion.

      (4) In Figure 4, when comparing the young+PBS group with the old+PBS or aged+PBS groups, there are hardly any differences in the proteins involved in uric acid synthesis (ADA, GDA, XOD) or the genes involved in uric acid transport (URAT1, GLUT9, OAT1, OAT3, ABCG2). Since no changes in uric acid synthesis or transport pathways are observed with aging, it is questionable to conclude that fecal transplantation from young mice improves these pathways and lowers blood uric acid levels.

      In our calculations, we used the control group of each age as the reference, rather than directly using young mice. We then compared the data of mice of different ages; the results are in Supplementary Material 4.

      (5) In line 276, the authors describe "the Young +Old and Young+Aged groups tended to be closer to the Old+PBS and Aged+PBS groups, and the Old+Young and Aged+young groups tended to be closer to the Young+PBS group (Figure 5D)". Please conduct a statistical analysis.

      (6) In line 298, the authors hypothesize that butyrate might be the key molecule responsible for controlling gout, as Bifidobacterium and Akkermansia were abundant in the Young group, and the butyrate pathway was prominent. However, neither Bifidobacterium nor Akkermansia are butyrate-producing bacteria. Thus, the conclusion appears to be biased toward butyrate, raising questions about this interpretation.

      Upon comparison, we identified other bacterial genera that produce butyrate, such as Lachnoclostridium. Additionally, published reports (PMID: 38126785, 26420851) have indicated that Bifidobacteria combined with other genera can enhance the production of butyrate. Meanwhile, Akkermansia, particularly the species Akkermansia muciniphila, has been found to confer several beneficial traits in preclinical studies. These traits include promoting the growth of butyrate-producing bacteria through the production of acetate, which leads to a decrease in the loss of the colonic bilayer and a subsequent reduction in inflammation (PMID: 35468952). Based on the predicted microbiome functions, we observed that Butanoate_metabolism was enhanced in the microbiota of young mice and of elderly mice that received young mouse microbiota. Considering that Lachnoclostridium can produce butyrate, and that Bifidobacteria and Akkermansia can promote butyrate production by the intestinal microbiota, we hypothesized that butyrate might play a role in gout and hyperuricemia.

      (7) In Supplementary Figure 7, acetic acid and propionic acid also show the same behavior as butyric acid. It is possible that these metabolites may also affect the development of gout.

      Thank you for your suggestion. Indeed, Figure 7 does show a similar trend for acetic and propionic acids as for butyric acid. However, considering the predictive data of microbial function and the non-targeted metabolomic data, there is an enhancement of Butanoate_metabolism in both young mice and elderly mice receiving young mouse intestinal microbiota transplants. Therefore, we prioritized butyrate as the subject of our study. Due to the scarcity of elderly mice, we are unable to conduct subsequent experiments with acetic and propionic acids, which is one of the limitations of this study. This work will be addressed in our follow-up research.

      (8) In Figure 6, the secondary bile acid biosynthesis pathway was also changed. However, there is little mention of secondary bile acid in the discussion section. Please carefully discuss other possibilities besides butyrate.

      Thank you for your suggestion. We have incorporated a discussion about secondary bile acids into the relevant section of our manuscript.

      (9) In line #330, the authors state, 'the metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway (Figure 6A-D).' However, there does not appear to be much difference in the butanoate metabolism pathway. Specifically, in Figure 6C, the butanoate metabolism pathway in the Old group does not differ from that in the Young group. Please explain in more detail whether the butanoate metabolism pathway is relevant in the Old group.

      The metabolites showing differential abundance between the groups were enriched in the butanoate (butyrate) metabolism pathway; however, non-targeted metabolomics did not reveal the extent of this enrichment.

      (10) In Figure 7, the authors measured the levels of short-chain fatty acids in the Young and Aged groups. They found butyrate in the feces of mice in the Young group was higher than that in the Aged group. However, I wonder whether the Old group also had low levels of butyrate or not.

      In the experiment, we selected three representative groups to verify the hypothesis that butyrate may play a significant role in gout and hyperuricemia. Subsequently, we found that supplementing 18-month-old and 24-month-old mice with butyrate indeed reduced blood uric acid levels and alleviated gout symptoms. Since 18-month-old mice are difficult to obtain, we only conducted microbiome sequencing and non-targeted metabolomic analysis.

      Minor issues:

      (11) In line 74, what does MSU stand for? Please describe the abbreviation.

      In line 74, MSU refers to Monosodium urate crystals.

      (12) In line 136, please insert a space between "IL-1β" and "and".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (13) In line 570, please describe the method of butyrate administration and also correct the grammatical errors.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (14) Change the title of x axis in Figure 2F-H, "Serum ~" to "Peritoneal fluid ~", according to the legend.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (15) In line 302, "succinates" should be "butyric acid or butyrate".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed the results of IL-1β levels in foot tissues in Figure 1C and Figure 1H, and serum IL-1β, IL-6, and TNF-α levels in Figure 2F-H. Could the authors also provide the results of IL-6 and TNF-α in foot tissue in Figure 1?

      Thank you for your suggestion. We have added the results of IL-6 and TNF-α in foot tissue to Supplementary Material 4.

      (2) There are some errors in the reference citation format, such as missing page numbers.

      Thank you for your careful review. We have revised the references in our manuscript.

      (3) There are too many writing errors in the manuscript, which greatly affect the understanding of the text. The manuscript must be carefully revised to improve its readability. It's recommended that a professional English writer or native speaker proofread the paper before submission. Some errors, but not limited to these errors, are listed below.

      a. Line 107: The abbreviation for "short-chain fatty acid" should be SCFA, not SFCA.

      Thank you for your careful review. We have corrected this error in the manuscript.

      b. Line 136: There is a missing space between "IL-1β" and "and".

      Thank you for your careful review. We have corrected this error in the manuscript.

      c. Line 145, the phrase "on gout on gout", and line 471, "that transplantation" are repeated.

      Thank you for your careful review. We have corrected this error in the manuscript.

      d. Line 152: "Age+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected this error in the manuscript.

      e. In Figure 1e, "Aded+PBS" should be "Aged+PBS".

      Thank you for your careful review.  We have corrected the error in Figure 1e.

      f. Line 152: The phrase "by via" is repeated.

      Thank you for your suggestion. We have deleted the phrase "by via" in line 152.

      g. "16S rDNA" in line 92 is inconsistent with the "16S rRNA" in line 652.

      Thank you for your suggestion. We have revised the error in the manuscript to maintain consistency in professional terminology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum freely circulate within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut, where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This muscle contraction-related blocking happens early after infection (15 mins). On the other hand, the clearance of bacteria is performed by the IMD pathway, possibly through antimicrobial peptide production, while this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior bacterial sequestration in larval defense.

      This important work substantially advances our understanding of the process of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is solid and would benefit from more rigorous experiments.

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We have linked the adult phenotype to the larval model to explore the ROS/TrpA1/Dh31 axis in both contexts.  As highlighted in the discussion, however, there are key behavioral differences between larvae and adult flies. Unlike larvae, which remain in the food environment, adult flies have the ability to move away. This difference could impact the relevance of gut muscle contraction and bacterial clearance mechanisms between the two stages. Specifically, in larvae, the rapid ejection of gut contents due to muscle contraction poses a unique risk: larvae may inadvertently re-ingest the expelled material within minutes, which could influence their immune defenses. We have clarified this distinction and our hypothesis in the final section of the discussion, as it emphasizes the adaptive nature of this mechanism in larvae.

      (2) The authors performed their experiments and proposed their models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria.

      To address this, we have provided new data (Movie 5), in which larvae were fed a lower dose of Bt-GFP at 1.3 × 10^10 CFU/mL. In this video, we observe that when larvae ingest fewer bacteria, no blockage occurs, and the bacteria are able to reach the posterior midgut. As the bacterial load is lower, the fluorescence signal is weaker, but the movie clearly shows the excretion of bacteria. Importantly, under these conditions, no larval death was observed. These findings suggest that below a certain bacterial threshold, the pathogenicity is insufficient to: (1) trigger the blockage response, and (2) kill the larvae. In such cases, bacteria are likely eliminated through normal peristaltic movements rather than through the blockage mechanism described in our study.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      As mentioned in our previous response, we hypothesize that the larvae’s ability to resist low concentrations of pathogenic bacteria is likely due to being below the threshold of virulence. At lower bacterial doses, the pathogenic load is insufficient to trigger the blockage mechanism or cause larval death. In these cases, it is probable that classical peristaltic movements of the gut efficiently eliminate the bacteria, preventing them from colonizing the posterior midgut or causing significant harm. Thus, the larvae rely on standard gut motility and immune mechanisms, rather than the blockage response, to clear lower doses of bacteria.

      Why is this model only applied to high-dose infections? 

      The reason this model primarily applies to high-dose infections is that lower concentrations of pathogenic bacteria do not trigger the blockage mechanism. As we mentioned in the manuscript, for low bacterial concentrations, where the GFP signal remains detectable, wild-type larvae are still able to resist live bacteria in the posterior part of the intestine.

      Regarding the bacterial doses used in our experiments, it's important to clarify that we calculate the bacterial load based on colony-forming units (CFU). In our setup, there are approximately 5 × 10^4 CFU per midgut. For each experiment, we prepare 500 µl of contaminated medium containing 4 × 10^10 CFU. Fifty larvae are placed into this 500 µl of medium, meaning each larva ingests around 5 × 10^4 CFU within one hour of feeding.

      This leads us to two key points:

      (1) Continuous feeding might trigger the blockage response even at lower doses, as extended exposure to bacteria could lead to higher accumulation within the gut.

      (2) Other defense mechanisms, such as the production of reactive oxygen species (ROS) or classical peristaltic movements, could be sufficient to eliminate lower bacterial doses (around 10^3 CFU or below).
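      As a quick arithmetic check of the dosing figures given above, the snippet below computes how much of the inoculum a batch of 50 larvae takes up in one hour. It assumes that "4 × 10^10 CFU" refers to the total in the 500 µl of medium and, as the text does, treats the ~5 × 10^4 CFU measured per midgut as the dose each larva ingests; the helper name is illustrative only.

```python
# Figures quoted in the response above.
VOLUME_ML = 0.5        # 500 µl of contaminated medium
TOTAL_CFU = 4e10       # assumed: total CFU contained in that volume
N_LARVAE = 50
CFU_PER_LARVA = 5e4    # ~CFU per midgut, treated here as the ingested dose

def fraction_ingested(total_cfu, n_larvae, cfu_per_larva):
    """Fraction of the inoculum taken up by the whole batch of larvae."""
    return n_larvae * cfu_per_larva / total_cfu

conc_per_ml = TOTAL_CFU / VOLUME_ML   # 8e10 CFU/mL under this reading
frac = fraction_ingested(TOTAL_CFU, N_LARVAE, CFU_PER_LARVA)
print(f"{conc_per_ml:.1e} CFU/mL, {frac:.2e} of the inoculum ingested")
# prints "8.0e+10 CFU/mL, 6.25e-05 of the inoculum ingested"
```

      Even under the alternative reading (4 × 10^10 CFU/mL, i.e. 2 × 10^10 CFU in 500 µl), the batch would take up only about 1.25 × 10^-4 of the inoculum, which suggests that the bacterial concentration in the medium stays essentially constant over the one-hour feeding period.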

      We also refer to the newly provided Movie 5, where larvae fed with Bt-GFP at 1.3 × 10^10 CFU/mL show no blockage at low ingestion levels and successfully eliminate the bacteria.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the number of bacteria decreases only after 4 to 6 hours. We have corrected this in the text.

      What happened during this period? 

      During the 4 to 6-hour period, several defense mechanisms are activated. ROS play a bacteriostatic and bacteriolytic role, helping to control bacterial growth. Concurrently, the IMD pathway is activated, leading to the transcription, translation, and secretion of antimicrobial peptides. These AMPs exert both bacteriostatic and bacteriolytic effects, contributing to the eventual clearance of the pathogenic bacteria.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We have provided new data (Supplementary Figure 6) that includes RT-qPCR analysis of the whole larval gut in wt, TrpA1- and Dh31- genetic background after feeding with Lp, Ecc15, Bt, or yeast only. We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences between the genotypes tested.

      Additionally, we included new imaging data (Supplementary Figure 11) from AMP reporter larvae (Dpt-Cherry) fed with fluorescent Lp or Bt. In larvae infected with Bt, which is blocked in the anterior part of the gut, the dpt gene is predominantly induced in this region, indicating strong IMD pathway activity in response to Bt infection. Conversely, in larvae fed with Lp-GFP, the Dpt-Cherry reporter shows weak expression in the anterior midgut, and is barely detectable in the posterior midgut where Lp-GFP establishes itself. This aligns with previous findings by Bosco-Drayon et al. (2012), which demonstrated low AMP expression in the posterior midgut due to the presence of negative regulators of the IMD pathway, such as amidases and Pirk.

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      Based on our new data (Supplementary Figure 11), we observe that Dpt-RFP expression is primarily localized in the anterior midgut, and likely at the beginning of the acidic region, in larvae infected with Bt, Ecc and Lp.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      We observe that bacteria are not evenly distributed along the gut in wild-type larvae either, with Lp. This suggests that the transit time in the anterior part of the gut may be relatively short due to active peristalsis, which would make this region function as a "checkpoint" for bacteria that are not supposed to be blocked. Indeed, we confirmed that peristalsis is active during our intoxication experiments, which could explain the rapid movement of bacteria through the anterior midgut.

      In contrast, bacteria tend to remain longer in the posterior midgut, which corresponds to the absorptive functions of intestinal cells in this region. This would explain why we observe more bacteria in the posterior midgut for Lp in control larvae and for Ecc15 and Bt in the TrpA1- and Dh31- mutants. Although a few bacteria are still found in the anterior midgut, they are consistently in much lower numbers compared to the posterior, as shown in Figures 1A and 3A of our manuscript.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We investigated whether the ROS/TrpA1/Dh31 axis influences AMP expression by performing RT-qPCR on the whole gut of larvae in wild-type, TrpA1-, and Dh31- genetic backgrounds. Larvae were fed with Lp, Ecc, Bt, or yeast (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the different genotypes.

      Additionally, we provide imaging data from AMP reporter larvae (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These results further confirm that the ROS/TrpA1/Dh31 axis does not significantly affect AMP expression in our experimental conditions.

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key driving force for the blocking phenotype and killing phenotype?

      We agree that the TARM structures are a fascinating aspect of this study and acknowledge the interest in their potential role in the blocking and killing phenotypes. While we are keen to explore the specific contributions of these structures during bacterial intoxication, the current genetic tools available for manipulating TARMs target both TARM T1 and T2 simultaneously, as demonstrated by Bataillé et al., 2020 (Fig. 2). Of note, these muscles are essential for proper gut positioning in larvae, and their absence leads to significant defects in food intake and transit, which would confound the results of our intoxication experiments (see Fig. 6 from Bataillé et al., 2020).

      Therefore, while TARMs are likely involved in these processes, the current limitations in selectively targeting them prevent us from definitively testing their role in bacterial blocking and killing at this stage. We hope to address this in future studies as more refined genetic tools become available.

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      To determine whether the ROS/TrpA1/Dh31 axis is required for the formation of TARM structures, we examined larval guts from control, TrpA1-, and Dh31- mutant backgrounds. Our new data (Supplementary Figure 8) show that the TARM T2 structures are still present in the mutants, indicating that the formation of these structures does not depend on the ROS/TrpA1/Dh31 axis.

      Reviewer #2 (Public Review):

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter, and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. The use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

      Based on our findings, we hypothesize that ROS play a crucial role in this discrimination process, with uracil release by pathogenic or opportunistic bacteria potentially serving as a key signal.

      To test whether uracil could trigger this discrimination, we conducted experiments where Lp was supplemented with uracil. However, our results show that uracil supplementation alone was not sufficient to induce the blockage response (new data: Supplementary Figure 5). This suggests that while uracil may be a factor in bacterial discrimination, it is likely not the sole trigger, and additional bacterial factors or signals may be required to activate the blockage mechanism. 

      The use of only two pathogens and one symbiotic species may not be sufficient to draw a conclusion on the difference in treatment between pathogenic and symbiotic species.

      To address this concern, we performed additional intoxication experiments using Escherichia coli OP50, a bacterium considered innocuous and commonly used as a standard food source for C. elegans in laboratory settings. The results, presented in our updated data (new data: Fig 1B), show that E. coli OP50, despite being, like Ecc, a Gram-negative member of the Enterobacterales, does not trigger the blockage response. This further supports our conclusion that the gut's discriminatory mechanism is specific to pathogenic bacteria and is not merely based on bacterial taxonomy.

      We can also wonder how the process of sphincter contraction is affected by the procedure used in this study, where larvae are starved. Does the sphincter contraction occur in continuous feeding conditions? Since larvae are continuously feeding, is this process physiologically relevant?

      In our intoxication protocol, the larvae are exposed to contaminated food for 1 hour, during which the blockage ratio is quantified. Since this period involves continuous feeding with the contaminated food, we do not consider the larvae starved during the quantification process. Our observations show differences in the blockage response depending on the bacterial contaminant and the genetic background of the host. Additionally, we were able to trigger the blocking phenomenon using exogenous hCGRP.

      Regarding the experimental setup for movie observations, it is true that larvae are immobilized on tape in a humid chamber, which is not a fully physiological context. However, in the new movie we provide (Movie 3), co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green) shows that both are initially blocked, followed by the posterior release of Dextran once the bacterial clearance begins.

      Furthermore, to address the question of continuous exposure, we extended the exposure period to 20 hours instead of 1 hour. Even after prolonged exposure, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This supports the physiological relevance of the sphincter contraction and its ability to function under continuous feeding conditions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We used the adult phenotype as the starting point for our candidate approach targeting the ROS/TrpA1/Dh31 axis in larvae. As we mention in the Discussion, larvae live within their food, whereas adult flies can move away from it. If larvae expelled their gut content, they would likely re-ingest it within minutes. We have clarified this idea in the last part of the Discussion.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      We provide a movie with Bt-GFP at 1.3 × 10^10 CFU/mL (new data: Movie 5). When larvae eat less, there is no blockage and bacteria can reach the posterior midgut. Note that the fluorescence is weak due to the low amount of bacteria ingested. The movie shows excretion of the bacteria, and the larvae do not die. Together, these results suggest that below a given threshold, the virulence of the bacteria is too weak to (i) trigger a blockage and (ii) kill the larva. The bacteria are likely eliminated through classical peristalsis.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      These doses may be below the virulence threshold; see our response just above.

      Why is this model only applied to high-dose infections? 

      As mentioned in the manuscript, lower concentrations do not trigger the blockage, and at lower concentrations for which the GFP signal is still detectable, wild-type animals withstand the presence of live bacteria within the posterior part of the intestine.

      Regarding doses, the number of ingested CFU should be considered. Indeed, there are around 5 × 10^4 CFU per midgut. In our experimental procedure, we calculate the amount of bacteria for 500 µl of contaminated medium (i.e. 4 × 10^10 CFU/500 µl of medium). Around 50 larvae are then deposited in the 500 µl of contaminated medium. In this condition, one larva ingests 5 × 10^4 CFU. Moreover, larvae are only fed for 1 h.

      Thus, (1) continuous feeding may also trigger blockage even at lower doses, and (2) other defense mechanisms (such as ROS) or peristalsis may be sufficient to eliminate lower doses (i.e. 10^3 CFU or below). See the new Movie 5, which we provide with Bt-GFP at 1.3 × 10^10 CFU/mL.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the bacterial load decreases after 4 to 6 hours. We have corrected this in the text.

      What happened during this period? 

      During this period, several processes take place: ROS activity (bacteriostatic and bacteriolytic), IMD pathway activation, and AMP transcription, translation and secretion, followed by AMP bacteriostatic and bacteriolytic activity.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We provide new larval whole-gut RT-qPCR data in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored 3 different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (Dpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11) showing that, with Bt blocked in the anterior part of the intestine, the dpt gene is mainly induced in this area. Note that in the larva infected with Lp-GFP, the Dpt-Cherry reporter is weakly expressed in the anterior midgut. In the posterior midgut, where Lp-GFP is established, Dpt-Cherry is barely detectable. This is consistent with the previous observation by Bosco-Drayon et al. (2012) demonstrating a low level of AMP expression in the posterior midgut due to the expression of IMD negative regulators such as amidases and pirk. In the larva infected with Bt-GFP, note the obvious expression of Dpt-Cherry in the anterior midgut colocalizing with the bacteria (new data: SUPP11).

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      In control animals fed Bt, Ecc or Lp, we see Dpt-RFP expression in the anterior midgut and likely at the beginning of the acidic region. See the new SUPP11 images provided in response to the previous comment.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      The same is true for Lp in wild-type animals, which is not evenly distributed either. It appears that transit through the anterior part is very short owing to peristalsis, which would fit with a checkpoint area for bacteria that are not supposed to be blocked; indeed, peristalsis is active during our intoxications. Bacteria then remain longer in the posterior part, consistent with the absorptive function of the intestinal cells in this area. With Lp in control animals, or with Ecc and Bt in TrpA1- and Dh31- mutants, a few bacteria are always present in the anterior midgut, but far fewer than in the posterior midgut. See our Figures 1A and 3A.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We provide larval whole-gut RT-qPCR data in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored 3 different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11).

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key-driven force for the blocking phenotype and killing phenotype? 

      Indeed, we would like to explore the roles of these structures and their putative requirement upon bacterial intoxication using driver lines developed by the team that studied these muscles in vivo. However, the genetic tools currently available target TARMs T1 and T2 at the same time (see Fig. 2 from Bataillé et al., 2020). Moreover, these TARMs are primarily crucial for the correct positioning of the gut within the larva, and their absence leads to a global food intake and transit defect that would bias the outcomes of our intoxication protocol (see Fig. 6 from Bataillé et al., 2020).

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      We provide images of larval guts from control animals and TrpA1 and Dh31 mutants demonstrating the presence of the TARM T2 structures despite the mutations (new data: SUPP8). In addition, we provide representative movies of peristalsis in the intestines of Dh31 mutants fed or not with Ecc to illustrate that muscular activity is not abolished (new data: Movie 9 and Movie 10).

      Minor points:

      (1) Why not use the Pros-Gal4/UAS-Dh31 strain in Figure 3B in addition to hCGRP?

      We opted for exogenous hCGRP addition because it allowed us precise timing control over Dh31 activation. Overexpression of Dh31 from embryogenesis or early larval stages could have significant and unintended effects on intestinal physiology, potentially confounding the results. While temporal control using TubG80ts could be an alternative, our focus was on identifying the specific cells responsible for the phenomenon.

      To achieve this, we perturbed Dh31 production via RNAi, specifically targeting a limited number of enteroendocrine cells (EECs) using the DJ752-Gal4 driver, as described by Lajeunesse et al., 2010. Our new data (Supplementary Figure 4) demonstrate that Dh31 expression in this subset of cells is indeed necessary for the blockage phenomenon.

      (2) Section title (line 287) refers to mortality, but no mortality data is in the figure.

      We agree that the title referenced mortality, whereas no mortality data was presented in this section. We have updated the title to better reflect the data discussed in this part of the manuscript.

      (3) It may be better to combine ROS-related contents in the same figure.

      While it is technically feasible to consolidate the ROS-related content into one figure, doing so would require splitting essential data, such as the Gal4 controls for the RNAi assays and parts of the survival phenotype data. We believe that the current structure of the study, which first explores the molecular aspects of the phenomenon and then demonstrates its relevance to the animal’s survival, provides a clearer and more logical flow. For these reasons, we prefer to maintain the current figure layout.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendation

      (1) Other wild-type backgrounds should be added (including the w Drosdel background of the AMP14 deficient flies) to check the robustness of the phenotype.

      To address the concern regarding the robustness of the phenotype across different wild-type backgrounds, we have tested additional genetic backgrounds, including w1, the isogenized w1118 and Oregon animals.

      The results (new data: Figure 1C) demonstrate that Lp is able to transit freely to the posterior part of the intestine in all backgrounds, while Ecc and Bt are blocked in the anterior part. These findings confirm the robustness of the phenotype across different wild-type strains.

      (2) Although we recognize that this may be limited by the number of GFP-expressing species, other commensal and pathogenic bacteria should be tested in this assay (e.g. E. faecalis and Acetobacter).

      We performed new intoxication experiments using Escherichia coli OP50, a well-established innocuous bacterial strain. The data, presented in Figure 1B (new data), show that E. coli OP50, despite being, like Ecc, a Gram-negative member of the Enterobacterales, does not trigger the blockage response. This further supports our hypothesis that the blockage phenomenon is specific to pathogenic bacteria and not simply related to bacterial taxonomy.

      (3) It is important to test whether sphincter closure also occurs in continuous feeding conditions. This does not mean repeating all the experiments but just shows that this mechanism can take place in conditions where larvae are kept in a vial with food.

      While the movies we provide involve larvae immobilized on tape in a humid chamber, which is not a fully physiological context, we now provide new data (Movie 3) showing that, after co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green), both substances are initially blocked in the anterior midgut. Later, the dextran is released posteriorly once bacterial clearance has begun.

      Additionally, we extended the feeding period in our experiments from 1 hour to 20 hours to simulate more continuous exposure to contaminated food. Even under these prolonged conditions, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This confirms that the sphincter mechanism can function in continuous feeding conditions as well.

      (4) What are the molecular determinants discriminating innocuous from pathogenic bacteria? Addressing this point will increase the impact of the article. The fact that Relish mutants have normal valve constriction suggests that peptidoglycan recognition is not involved. Is there a sensing of pathogen virulence factors? 

      Our data suggest that uracil could be a key molecular determinant in discriminating between innocuous and pathogenic bacteria, as previously described by the W-J Lee team in several studies on adult Drosophila. However, in our experiments, exogenous uracil addition using the blue dye protocol (Keita et al., 2017) did not induce any significant changes in the larvae. Similarly, uracil supplementation in adult flies failed to trigger the Ecc expulsion and gut contraction phenotype, as reported by Benguettat et al., 2018. 

      To further investigate this, we tested the addition of uracil during Lp-GFP intoxication. In these experiments, we did not observe any blockage of Lp (new data: Supplementary Figure 5). These results suggest that uracil might not be the sole trigger for the blockage response, or we may not be providing uracil exogenously in the most effective way. Alternatively, there could be other pathogen-specific virulence factors that contribute to this discrimination mechanism.

      To address this question, the authors should infect larvae with Ecc15 evf- mutants or Ecc15 lacking uracil production. 

      Thank you for your suggestion to use Ecc15 evf- mutants or Ecc15 lacking uracil production to explore the role of uracil in bacterial discrimination. While we have provided some data using uracil supplementation (new data: Supplementary Figure 5), we agree that testing mutants like PyrE would be an important next step. Unfortunately, we currently lack access to fluorescent PyrE or Ecc15 evf- mutants.

      We are planning to address this by developing a new protocol involving fluorescent beads alongside bacteria. This approach will allow us to test several bacterial strains in parallel and better define the size threshold of the valve. We do not yet have the relevant data, but this will be a key focus of our future work.

      Similarly, does feeding heat-killed Ecc15 or Bt induce sequestration in the anterior midgut (larvae may be fed dextran-FITC at the same time to track bacteria)?

      Unfortunately, in our attempts to test heat-killed or ethanol-killed fluorescent Ecc15 for these experiments, we encountered an issue: while we were able to efficiently kill the bacteria, we lost the GFP signal required to track their position in the gut. This made it challenging to assess whether sequestration in the anterior midgut occurs with non-viable bacteria.

      Is uracil or Bt toxin feeding sufficient to induce valve closure? 

      As previously mentioned, uracil is a strong candidate for bacterial discrimination, and we have tested its role by adding exogenous uracil during Lp-GFP intoxication. However, in these experiments, Lp was not blocked (new data: Supplementary Figure 5). This suggests that uracil alone may not be sufficient to induce valve closure, or it may not be the only factor involved. It is also possible that our method of exogenous uracil supplementation may not be effectively mimicking the endogenous conditions.

      Regarding Bt, we used vegetative cells without Cry toxins in our experiments. Cry toxins are only produced during sporulation and accumulate as parasporal crystals. The Bt strain we used, 4D22, lacks the plasmids encoding Cry toxins. As a result, there were no Cry toxins present in the Bt-GFP vegetative cells used in our assays. This has been clarified in the Materials and Methods section of the manuscript.

      Would Bleomycin induce the same phenotype? 

      Indeed, Bleomycin, as well as paraquat, has been shown to damage the gut and trigger intestinal cell proliferation in adult Drosophila through mechanisms involving TrpA1. Testing whether Bleomycin induces a similar phenotype in larvae would indeed be interesting.

      However, one challenge we face in our intoxication protocol is that larvae tend to stop feeding when chemicals are added to their food mixture. We encountered similar difficulties in our DTT experiments, which were challenging to set up for this reason. Consequently, we aim to avoid approaches that might impair the general feeding activity of the larvae, as it can significantly affect the outcomes of our experiments.

      Could this process of sphincter closure be more related to food poisoning?

      If gut damage were the primary trigger for sphincter closure, we would indeed expect the blockage phenomenon to occur later following bacterial exposure. However, in our experiments, we observe the blockage occurring early after bacterial contact, suggesting that damage may not be the main trigger for this response.

      That said, we have not yet tested bacterial mutants lacking toxins, nor have we tested a direct damaging agent such as Bleomycin, as proposed. These would be valuable future experiments to explore the potential role of gut damage more thoroughly in this process.

      (5) Is Imd activation normal in trpA1 and DH31 mutants? The authors could use a diptericin reporter gene to check if Diptericin is affected by a lack of valve closure in trpA1.

      To address this, we performed RT-qPCR on whole larval guts from wt, TrpA1^1 and Dh31^KG09001 genetic backgrounds. Larvae were fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the genotypes.

      Additionally, we provide imaging data from AMP reporter animals (pDpt-Cherry) in a wild-type background, fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These images also support the conclusion that Diptericin expression is not significantly affected by a lack of valve closure in trpA1 and Dh31 mutants.

      (6) Are the 2-6 DH31 positive cells the same cells described by Zaidman et al., Developmental and Comparative Immunology 36 (2012) 638-647.

      The cells identified as hemocytes in the midgut junctions by Zaidman et al. are likely the same cells we describe in our study, as they are located in the same region and are Dh31 positive. We have added a reference to this paper and included lines in the manuscript acknowledging this connection.

      Although confirming whether these cells are Hml+, Dh31+, and TrpA1+ would clarify their exact identity, this falls outside the scope of our current study. However, the possibility that these cells play a role in physical barrier immunity and also possess a hemocyte identity is indeed intriguing, and we hope future research will explore this further.

      Minor points

      (1) The mutations should be appropriately labelled with the allele name.

      This has been fixed in the main text, in Fig Legends, and in figures. 

      (2) Line 230-231: the sentence is unclear to me.

      We simplified the sentence and do not refer to the expulsion in larvae.

      (3) Discussion: although the discussion is already a bit long, it would be interesting to see if this process is likely to happen/has been described in other insects (mosquito, Bactrocera, ...).

      We reviewed the available literature but were unable to find specific examples describing the blockage phenomenon in other insects. Most studies we found focused on symbiotic bacteria rather than pathogenic or opportunistic bacteria. However, as mentioned in our manuscript, the anterior localization of opportunistic or pathogenic bacteria has been observed in Drosophila by independent research groups.

      (4) Line 546: add the Caudal Won-Jae Lee paper to state the posterior midgut is less microbicidal.

      We added the reference at the right place, mentioning as well that it concerns adults. 

      (5) Figure 6: indicate what the cells shown by the arrow are.

      The sentence 'the arrows point to TARMs' is present in the legend of Fig. 6.

      (6) Does the sphincter closure depend on hemocytes?

      As mentioned above, the cells we identify as TrpA1+ in the midgut junction may be the same cells described by Zaidman et al., 2012, and earlier by Lajeunesse et al., 2010. Inactivating hemocytes using the Hml-Gal4 driver may also affect these Dh31+ cells, as they share similarities with hemocytes, as pointed out by Zaidman et al. However, distinguishing between hemocytes and Dh31+/TrpA1+ cells would require a genetic intersectional approach, which is beyond the scope of our current study.

      Nevertheless, the possibility that these cells play a dual role in immunity (through blockage) and share characteristics with hemocytes while functioning as enteroendocrine cells (EECs) is quite intriguing and deserves further exploration in future studies.

    1. Author response:

      Reviewer #1 (Public review):

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicate that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence to read: “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx”.

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We will also add the following text about other neuron types to the revised manuscript: “Additionally, there is considerable diversity among glutamatergic, glycinergic, and peptidergic neurons in planarians. Many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system.”

      The authors use their technique to image the neural network of the CNS using antibodies raised vs. Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We segmented neurons at 120 nm resolution along all three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we will include a discussion of the limitations of TLSM in reconstructing neural networks, particularly when it comes to resolving fibers within densely packed regions of the nerve tracts.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations. 

      We have removed the statement "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." We replaced it with: “These results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is not likely from the octopaminergic, GABAergic, dopaminergic and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, we would like to add the possibility that the non-linear dynamics may be from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs. 

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution. 

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution, and we will add this information to the revised manuscript.

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated. 

      A few major issues with the claims: 

      (4) Line 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015). 

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We will revise the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We will also clarify that the primary objective of our study was not to distinguish all muscle fiber types but rather to demonstrate the application of our 3D tissue reconstruction method in addressing traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow for a more thorough investigation of muscle fiber diversity. We will ensure all citations are properly revised and updated in our next version.

      (5) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution. 

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We will revise the relevant sections of the manuscript to clarify this dynamic process more accurately.

      (6) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching. 

      We will remove this statement from the revised version. Instead, we will focus on describing our observations of the connections between glial cells and muscle fibers.

      (7) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons. 

      We agree that our current data are insufficient to support this conclusion, and we will revise the manuscript to more clearly state the limitations of our data. We will describe our observations as preliminary and suggest that further experiments are required.

      (8) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented. 

      We used two setups for imaging cells and neuronal projections. For cellular-resolution imaging, we used a 1× air objective (OLYMPUS MV PLAPO) with a numerical aperture (NA) of 0.25 and a working distance of 60 mm, at a voxel size of 0.8 × 0.8 × 2.5 µm. This configuration gives an optical resolution of 2 × 2 × 5 µm, corresponding to an effective resolution of 0.5 × 0.5 × 1.25 µm in the original sample after 4× isotropic expansion. For sub-cellular imaging, we used a 10×0.6 SV MP water-immersion objective (OLYMPUS) with 0.8 NA and a working distance of 8 mm, at a voxel size of 0.26 × 0.26 × 0.8 µm. This configuration gives an optical resolution of 0.5 × 0.5 × 1.6 µm and an effective resolution of approximately 0.12 × 0.12 × 0.4 µm after 4.5× isotropic expansion. The higher resolution of the sub-cellular setup allows us to observe finer structures and trace neural connections.
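      The effective resolution figures above follow from dividing the optical resolution along each axis by the linear expansion factor. The sketch below illustrates that arithmetic under the assumption of perfectly uniform, isotropic expansion with no distortion; the function name `effective_resolution` is hypothetical and is not part of the authors' pipeline.

```python
def effective_resolution(optical_res_um, expansion_factor):
    """Refer an optical resolution (per axis, in um) back to the unexpanded sample.

    Assumes uniform isotropic expansion, so every axis is divided by the
    same linear factor (a simplification; real gels can distort locally).
    """
    if expansion_factor <= 0:
        raise ValueError("expansion_factor must be positive")
    return tuple(r / expansion_factor for r in optical_res_um)


# Cellular-resolution setup: ~2 x 2 x 5 um optical resolution, 4x expansion.
cellular = effective_resolution((2.0, 2.0, 5.0), 4.0)
# Sub-cellular setup: ~0.5 x 0.5 x 1.6 um optical resolution, 4.5x expansion.
subcellular = effective_resolution((0.5, 0.5, 1.6), 4.5)

print("cellular:", tuple(round(x, 2) for x in cellular))
print("sub-cellular:", tuple(round(x, 2) for x in subcellular))
```

      For the cellular setup this reproduces 0.5 × 0.5 × 1.25 µm exactly; for the sub-cellular setup, dividing by 4.5 gives roughly 0.11 × 0.11 × 0.36 µm, close to the rounded values quoted above.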

      Regarding your question about cell boundaries, we will revise the manuscript to specify that the boundaries we identified are those of each nucleus, rather than entire cells. This distinction will be made clear in the revised version.

      Reviewer #3 (Public review): 

      Weaknesses: 

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing. 

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677), and instead cite a paper (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein). 

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling. We will also add the content about other neuron types in our revised version.

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planarian tail (the tail is proportionately longer in large vs small planaria) can explain this proportional decrease as animals grow. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript.

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this. 

      We acknowledge that some of our conclusions were overclaimed given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this phrasing was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We will revise this section to better describe the dynamic changes observed during regeneration.

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image. 

      Thank you for raising this important point. We will include a ground truth comparison of our automated muscle fiber counting with manual counts in the supplementary figures. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap. We are revisiting the segmentation parameters to improve the accuracy of detecting circular fibers, and we will provide an updated version of Figure 4I in the revised manuscript.

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria. 

      Thank you for bringing this to our attention. What we intended to convey was the increase in neuron number during homeostasis. We will correct the abstract to remove the reference to neuron proliferation and instead describe the increase in neuron number resulting from progenitor cell differentiation during homeostasis.

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us? 

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy. 

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. This combination offers several key advantages over traditional confocal microscopy. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes. We believe this distinction is significant and represents an advance over previous methods. We will clarify this point in the manuscript to better distinguish our approach from standard techniques.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this article the authors described mouse models presenting with Becker muscular dystrophy; they created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Because dystrophin-deficient skeletal muscle in mdx mice undergoes hypertrophy with growth, we had not examined factors associated with muscle atrophy in BMD mice. As the reviewer suggested, examining the association between type IIa fiber reduction and muscle atrophy is important, and the results should help clarify the cause of type IIa fiber reduction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the cross-sectional areas (CSA) of muscles and compare them with the changes in the proportion of type IIa fibers.

      (2) Evaluate the expression levels of Murf1 and Atrogin1 as markers of muscle atrophy using RT-PCR.

      Reviewer #2 (Public review):

      Summary

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary change around muscle fibers in BMD mice. From the results of the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with that of type-IIb fibers. However, this delayed recovery of type IIa and IIx could not explain the cause of the selective muscle fiber reduction limited to type IIa fibers in BMD mice. Therefore, we considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and the capillary change around muscle fibers in BMD mice remains unclear due to the lack of information about capillaries around type IIx and IIb fibers. The reviewer pointed out this insufficient evaluation of capillaries around other muscle fibers (except for type IIa fibers), and this suggestion is very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the changes in capillary formation around muscle fibers other than type IIa fibers (i.e., type IIx and IIb fibers).

      (2) Evaluate the endothelial area around muscle fibers other than type IIa fibers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies, as is often done in research on rare tumors.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it is labeled in Fig. 4a in a misleading manner). In the revised manuscript, we will correct the labeling in Fig. 4a and clarify that SiNET6 is not assigned to any subtype. We will further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.

      (2) Results:

      Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we will note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We will clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We agree with this comment and will add the need for additional validation for this finding in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      As you can see from the assessment (which is unchanged from before) and the reviews included below, the reviewers felt that the revisions did not yet address all of the major concerns. There was agreement that the strength of evidence would be upgraded to "solid" by addressing, at minimum, the following: 

      (1) Which of the results are significant for individual monkeys; and 

      (2) How trials from different target contrasts were analyzed 

      In this revision, we have addressed the two primary editorial recommendations:

      (1) We apologize if this information was not clear in the previous version. We have updated Table 1 to clearly highlight the significant results for individual monkeys. Six of our key results – pupil diameter (Fig 2B), microsaccades (Fig 2D), decoding performance for narrow-spiking units (Fig 3A), decoding performance for broad-spiking units (Fig 3B), target-evoked firing rate for all units (Fig 3E) and target-evoked firing rate for broad-spiking units (Fig 3F) – are significant for individual animals, which gives us high confidence in our results. Please also note that we present all results for individual animals in the Supplementary figures accompanying each main figure.

      (2) We have updated the manuscript and methods to explain how trials of each contrast were included in each analysis, and how contrast normalization was performed for the analysis in Figure 3. In addition, we discuss this point in the Discussion section, which we quote below:

      “Non-target stimulus contrasts were slightly different between hits and misses (mean: 33.1% in hits, 34.0% in misses, permutation test, p = 0.02), but the contrast of the target was higher in hits compared to misses (mean: 38.7% in hits, 27.7% in misses, permutation test, p = 1.6 × 10⁻³¹). To control for potential effects of stimulus contrast, firing rates were first normalized by contrast before performing the analyses reported in Figure 3. For all other results, we considered only non-target stimuli, which had very minor differences in contrast (<1%) across hits and misses. In fact, this minor difference was in the opposite direction of our results with mean contrast being slightly higher for misses. While we cannot completely rule out any other effects of stimulus contrast, the normalization in Figure 3 and minor differences for non-target stimuli should minimize them.”
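      The excerpt above compares mean stimulus contrast between hit and miss trials with permutation tests. The authors' exact procedure is not described here, so the following is only a minimal, generic sketch of a two-sided permutation test for a difference in means; the function name `permutation_test_mean_diff` and the synthetic contrast values are hypothetical and are not the study's code or data.

```python
import numpy as np


def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffles the pooled values n_perm times, splits them back into groups of
    the original sizes, and counts how often the shuffled |mean difference|
    is at least as large as the observed one. The +1 correction keeps the
    p-value strictly positive. Returns (observed difference, p-value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    n_a = a.size
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic, hypothetical contrast values (as fractions), not the study's data.
    rng = np.random.default_rng(1)
    hits = rng.normal(0.39, 0.05, size=200)
    misses = rng.normal(0.28, 0.05, size=200)
    diff, p = permutation_test_mean_diff(hits, misses, n_perm=2000)
    print(f"mean difference = {diff:.3f}, p = {p:.4g}")
```

      Because of the +1 correction, the smallest p-value this Monte Carlo version can return is 1/(n_perm + 1); exact enumeration or an asymptotic approximation is needed to resolve smaller values.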

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Nandy and colleagues examine neural, physiological and behavioral correlates of perceptual variability in monkeys performing a visual change detection task. They used a laminar probe to record from area V4 while two macaque monkeys detected a small change in stimulus orientation that occurred at a random time in one of two locations, focusing their analysis on stimulus conditions where the animal was equally likely to detect (hit) or not-detect (miss) a briefly presented orientation change (target). They discovered two behavioral and physiological measures that are significantly different between hit and miss trials - pupil size tends to be slightly larger on hits vs. misses, and monkeys are more likely to miss the target on trials in which they made a microsaccade shortly before target onset. They also examined multiple measures of neural activity across the cortical layers and found some measures that are significantly different between hits and misses. 

      Strengths: 

      Overall the study is well executed and the analyses are appropriate (though several issues still need to be addressed as discussed in Specific Comments). 

      Thank you.

      Weaknesses: 

      My main concern with this study is that, with the exception of the pre-target microsaccades, the correlates of perceptual variability (differences between hits and misses) appear to be weak, potentially unreliable and disconnected. The GLM analysis of predictive power of trial outcome based on the behavioral and neural measures is only discussed at the end of the paper. This analysis shows that some of the measures have no significant predictive power, while others cannot be examined using the GLM analysis because these measures cannot be estimated in single trials. Given these weak and disconnected effects, my overall sense is that the current results provide limited advance to our understanding of the neural basis of perceptual variability. 

      Please see our response above to item #1 of the editorial recommendation. Six of our key results are individually significant in both animals, giving us high confidence in the reliability and strength of our results. 

      Regarding the reviewer’s comment about the GLM, we note (as also stated in the manuscript) that among the measures we could estimate reliably on a single-trial basis, two of these, pre-target microsaccades and input-layer firing rates, were reliable signatures of stimulus perception at threshold. This analysis does not imply that the other measures (Fano factor, PPC, inter-laminar population correlations, SSC), which are all standard tools in modern systems neuroscience and which cannot be estimated on a single-trial basis, are irrelevant. Our intent in including the GLM analyses was to complement the results reported from these across-trial measures (Figs 4-7) with the predictive power of single-trial measures.

      While no study is entirely complete in itself, we have attempted to synthesize our results into a conceptual model as depicted in Fig 8.

      Reviewer #2 (Public Review): 

      Strengths: 

      The experiments were well-designed and executed with meticulous control. The analyses of both behavioural and electrophysiological data align with the standards in the field. 

      Thank you.

      Weaknesses: 

      Many of the findings appear to be subtle differences and incremental compared to previous literature, including the authors' own work. While incremental findings are not necessarily a problem, the manuscript lacks clear statements about the extent to which the dataset, analysis, and findings overlap with the authors' prior research. For example, one of the main findings, which suggests that V4 neurons exhibit larger visual responses in hit trials (as shown in Fig. 3), appears to have been previously reported in their 2017 paper. 

      We respectfully disagree with the assessment that the findings reported here are incremental over the results reported in our prior study (Nandy et al., 2017). In the previous study, we compared the laminar profile of neural modulation due to the deployment of attention, i.e., the main comparison points were the attend-in and attend-away conditions while controlling for visual stimulation. In this study, we go one step further and home in on the attend-in condition, investigating the differences in the laminar profile of neural activity (and two additional physiological measures: pupil and microsaccades) when the animal either correctly reports or fails to report a stimulus with equal probability. We thus control for both the visual stimulation and the cued attention state of the animal. While there are parallels to our previous results (as the reviewer correctly noted), the results reported here cannot be trivially predicted from our previous results. Please also note that we discuss our new results in the context of prior results, from both our group and others, in the manuscript (lines 310-332).

      Furthermore, the manuscript does not explore potentially interesting aspects of the dataset. For instance, the authors could have investigated instances where monkeys made 'false' reports, such as executing saccades towards visual stimuli when no orientation change occurred, which would allow a broader analysis that considers the perceptual component of neural activity over pure sensory responses. Overall, the manuscript lacks broad interest in its current form.

      We appreciate the reviewer’s feedback on analyzing false alarm trials. Our focus for this study was to investigate the behavioral and neural correlates accompanying a correct or incorrect perception of a target stimulus presented at perceptual threshold. False alarm trials, by definition, do not include a target presentation. Moreover, false alarm rates rapidly decline with duration into a trial, with high rates during the first non-target presentation and rates close to zero by the time of the eighth presentation (see figure). Investigating false alarms will thus involve a completely different form of analysis than we have undertaken here. We therefore feel that while analyzing false alarm trials will be an interesting avenue to pursue in the future, it is outside the scope of the present study.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      New Experiments

      (1) Activation-dependent dynamics of PKA with the RIα regulatory subunit, adding to the answer to Reviewers 1 and 2. To determine the dynamics of all PKA isoforms, we have added experiments that used PKA-RIα as the regulatory subunit. We found differential translocation between PKA-C (co-expressed with PKA-RIα) and PKA-RIα (Figure 1–figure supplement 3), similar to the results when PKA-RIIα or PKA-RIβ was used.

      (2) PKA-C dynamics elicited by a low concentration of norepinephrine, addressing Reviewer 3’s comment. We found that PKA-C (co-expressed with RIIα) exhibited similar translocation into dendritic spines in the presence of a 5-fold lower concentration (2 μM) of norepinephrine, suggesting that the translocation occurs over a wide range of stimulus strengths (Figure 1-figure supplement 2).

      Reviewer #1 (Public Review):

      Summary:

      This is a short self-contained study with a straightforward and interesting message. The paper focuses on settling whether PKA activation requires dissociation of the catalytic and regulatory subunits. This debate has been ongoing for ~30 years, with renewed interest in the question following a publication in Science, 2017 (Smith et al.). Here, Xiong et al. demonstrate that fusing the R and C subunits together (in the same way as Smith et al.) prevents the proper function of PKA in neurons. This provides further support for the dissociative activation model. It is imperative that researchers have clarity on this topic, since it is so fundamental to building accurate models of localised cAMP signalling in all cell types. Furthermore, their experiments highlight that C subunit dissociation into spines is essential for structural LTP, which is an interesting finding in itself. They also show that preventing C subunit dissociation reduces basal AMPA receptor currents to the same extent as knocking down the C subunit. Overall, the paper will interest both cAMP researchers and scientists interested in fundamental mechanisms of synaptic regulation.

      Strengths:

      The experiments are technically challenging and well executed. Good use of control conditions, e.g., untransfected controls in Figure 4.

      We thank the reviewer for their accurate summarization of the position of the study in the field and for the positive evaluation of our study.

      Weaknesses:

      The novelty is lessened given the same team has shown dissociation of the C subunit into dendritic spines from RIIbeta subunits localised to dendritic shafts before (Tillo et al., 2017). Nevertheless, the experiments with RII-C fusion proteins are novel and an important addition.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ is localized to axons. The primary PKA subtypes in the soma and dendrites are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding whether PKA dissociates, it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey of all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as it is also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as the reviewer points out, the second part of our study is a novel addition to the literature.

      Reviewer #2 (Public Review):

      Summary:

      PKA is a major signaling protein that has been long studied and is vital for synaptic plasticity. Here, the authors examine the mechanism of PKA activity and specifically focus on addressing the question of PKA dissociation as a major mode of its activation in dendritic spines. This would potentially allow us to determine the precise mechanisms of PKA activation and address how it maintains spatial and temporal signaling specificity.

      Strengths:

      The results convincingly show that PKA activity is governed by the subcellular localization in dendrites and spines and is mediated via subunit dissociation. The authors make use of organotypic hippocampal slice cultures, where they use pharmacology, glutamate uncaging, and electrophysiological recordings.

      Overall, the experiments and data presented are well executed. The experiments all show that at least in the case of synaptic activity, the distribution of PKA-C to dendritic spines is necessary and sufficient for PKA-mediated functional and structural plasticity.

      The authors were able to persuasively support their claim that PKA subunit dissociation is necessary for its function and localization in dendritic spines. This conclusion is important to better understand the mechanisms of PKA activity and its role in synaptic plasticity.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      While the experiments are indeed convincing and well executed, the data presented are similar to previously published work from the Zhong lab (Tillo et al., 2017, Zhong et al., 2009). This reduces the novelty of the findings in terms of the redistribution of PKA subunits, which was already established. A few alternative approaches for addressing this question, such as targeting the localization of endogenous PKA, addressing its synaptic distribution, or even impairing it within intact neuronal circuits, would greatly strengthen their findings. This would allow us to further substantiate the synaptic localization and redistribution mechanism of PKA as a critical regulator of synaptic structure, function, and plasticity.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ is localized to axons. The primary PKA subtypes in the soma and dendrites are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding whether PKA dissociates, it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey of all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as it is also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as Reviewer 1 points out, the second part of our study is a novel addition to the literature.

      We also thank the reviewer for suggesting experiments to examine PKA’s synaptic localization and dynamics as a key mechanism underlying synaptic structure and function. We agree that this is a very interesting topic. At the same time, we feel that this mechanistic direction is open-ended at this time and goes beyond what we aim to conclude in this manuscript: that preventing PKA dissociation in neurons affects synaptic function. Therefore, we will save the suggested direction for future studies. We hope the reviewer understands.

      Reviewer #3 (Public Review):

      Summary:

      Xiong et al. investigated the debated mechanism of PKA activation using hippocampal CA1 neurons under pharmacological and synaptic stimulation. Examining the two major PKA isoforms in these neurons, they found that a portion of PKA-C dissociates from PKA-R and translocates into dendritic spines following bath application of norepinephrine. Additionally, their use of a non-dissociable form of PKA demonstrates its essential role in structural long-term potentiation (LTP) induced by two-photon glutamate uncaging, as well as in maintaining normal synaptic transmission, as verified by electrophysiology. This study presents a valuable finding on the activation-dependent redistribution of PKA catalytic subunits in CA1 neurons, a process vital for synaptic function. The robust evidence provided by the authors makes this work particularly relevant for biologists seeking to understand PKA activation and its downstream effects essential for synaptic plasticity.

      Strengths:

      The study is methodologically robust, particularly in the application of two-photon imaging and electrophysiology. The experiments are well-designed with effective controls and a comprehensive analysis. The credibility of the data is further enhanced by the research team's previous works in related experiments. The conclusions of this paper are mostly well supported by data. The research fills a significant gap in our understanding of PKA activation mechanisms in synaptic functioning, presenting valuable insights backed by empirical evidence.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      The physiological relevance of the findings regarding PKA dissociation is somewhat weakened by the use of norepinephrine (10 µM) in bath applications, which might not accurately reflect physiological conditions. Furthermore, the study does not address the impact of glutamate uncaging, a well-characterized physiologically relevant stimulation, on the redistribution of PKA catalytic subunits, leaving some questions unanswered.

      We agree with the Reviewer that testing under physiological conditions is critical, especially given the current debate in the literature. That is why we tested PKA dynamics induced by the physiological stimulant norepinephrine. It has been suggested that, near the release site, local norepinephrine concentrations can be as high as tens of micromolar (Courtney and Ford, 2014). Based on this study, we chose a mid-range concentration (10 μM). At the same time, in light of the Reviewer’s suggestion, we have now also tested PKA-RIIα dissociation at a 5-fold lower concentration of norepinephrine (2 μM; New Experiment #2). Activation and translocation of PKA-C are also readily detectable under this condition, to a degree comparable to that observed with 10 μM norepinephrine.

      Regarding the suggested glutamate uncaging experiment, it is extremely challenging because of the limited signal-to-noise ratio in our experiments. From our past studies, we know that activated PKA-C can diffuse three-dimensionally, with a fraction present as membrane-associated proteins and the rest as cytosolic proteins. Although we have evidence that its membrane affinity allows it to become enriched in dendritic spines, it is not known (and is unlikely) that activated PKA-C is selectively targeted to a particular spine. Glutamate uncaging at a single spine would presumably activate only a small number of PKA-C molecules locally. It would be very difficult to trace the 3D diffusion of this small number of molecules against the background of surrounding resting-state PKA-C molecules. Finally, we hope the reviewer agrees that, regardless of the result of the glutamate uncaging experiment, the new experiment described above (New Experiment #2) already indicates that certain physiologically relevant stimuli can drive PKA-C dissociation from PKA-R and translocation to spines, supporting our conclusion.

      Reviewer #2 (Recommendations For The Authors):

      It was a pleasure reading your paper, and the results are well-executed and well-presented.

      My main and only recommendations are two ways to further expand the scope of the findings.

      First, I believe addressing the endogenous localization of PKA-C subunit before and after PKA activation would be highly important to validate these claims. Overexpression of tagged proteins often shows vastly different subcellular distribution than their endogenous counterparts. Recent technological advances with CRISPR/Cas9 gene editing (Suzuki et al Nature 2016 and Gao et al Neuron 2019 for example) which the Zhong lab recently contributed to (Zhong et al 2021 eLife) allow us to tag endogenous proteins and image them in fixed or live neurons. Any experiments targeting endogenous PKA subunits that support dissociation and synaptic localization following activation would be very informative and greatly increase the novelty and impact of their findings.

      We agree that addressing endogenous PKA dynamics is important. However, despite recent progress, endogenous labeling using CRISPR-based methods remains challenging and requires extensive optimization. This is especially true for signaling proteins, whose endogenous abundance is often low. We have tried to label PKA catalytic and regulatory subunits using both the homologous recombination-based method SLENDR and our own non-homologous end joining-based method CRISPIE. We did not succeed, in part because it is very difficult to see any signal under wide-field fluorescence conditions, which makes it difficult to screen different constructs and optimize parameters. It is also possible that, at endogenous abundance, the label is simply not bright enough to be seen. Nevertheless, for both PKA type Iβ and type IIα studied in this manuscript, we have correlated the measured parameter (specifically, the Spine Enrichment Index, or SEI) with the overexpression level (Figure 1-figure supplement 1). We found that it is not strongly correlated with expression level under our conditions. Extrapolating this relationship to non-overexpression conditions suggests that our conclusion remains valid.

      To overcome the inability to label endogenous PKA subunits using CRISPR-based methods, we have also tried ENABLED, a conditional knock-in method that we previously developed, to label PKA-Cα. In preliminary results, we found that endogenously labeled PKA was very dim. However, in a subset of cells that were bright enough to be quantified, the PKA catalytic subunit indeed translocated to dendritic spines upon stimulation (see Author response image 1), corroborating our results using overexpression. These results, however, are not ready to be published because characterization of the mouse line takes time and, at this moment, the signal-to-noise ratio remains low. We hope that the reviewer understands.

      Author response image 1.

      Endogenous PKA-Cα translocates to dendritic spines upon activation.

      Second, experiments which would advance and validate these findings in vivo would be highly valuable. This could be achieved in a number of ways - one would be overexpression of tagged PKA versions and examining sub-cellular distribution before and after physiological activation in vivo. Another possibility is in vivo perturbation - one would speculate that disruption or tethering of PKA subunits to the dendrite would lead to cell-specific functional and structural impairments. This could be achieved in a similar manner to the in vitro experiments, with a PKA KO and replacement strategy of the tethered C-R plasmid, followed by structural or functional examination of neurons.

      I would like to state that these experiments are not essential in my opinion, but any improvements in one of these directions would greatly improve and extend the impact and findings of this paper.

      We thank the reviewer for the suggestion and the understanding. The suggested in vivo experiments are fascinating. However, in vivo imaging of dendritic spine morphology is already in itself challenging. The difficulty greatly increases when trying to detect partial, likely transient translocation of a signaling protein. It is also very difficult to knock down endogenous PKA while simultaneously expressing the R-C construct in a large number of cells to achieve detectable circuit or behavioral effect (and hope that compensation does not happen over weeks). We hope the reviewer agrees that these experiments would be their own project and go beyond the time and scope of the current study.

      Reviewer #3 (Recommendations For The Authors):

      Please elaborate on the methods used to visualize PKA-RIIα and PKA-RIβ subunits.

      As suggested, we have now included additional details on visualizing PKA-Rs in the text. Specifically, we write (pg. 5): “…, as visualized using expressed PKA-R-mEGFP in separate experiments (Figs. 1A-1C).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete. 

      Strengths: 

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic. 

      Weaknesses: 

      (1) The simulations performed are not sufficiently long (Figure 2A) to accurately comment on phase separation behavior. The simulations do not appear to have converged well, indicating that the system has not reached a steady state, rendering the analysis of the trajectories unreliable.

      We have extended the simulations for an additional 500 ns, to 1500 ns. The last 500 ns show reasonably good convergence (see Figure 2A).

      (2) The majority of the data presented shows no significant alteration with changes in salt concentration. However, the authors have based conclusions and made significant comments regarding salt activities. The absence of error bars in the data representation raises questions about its reliability. Additionally, the manuscript lacks sufficient scientific details of the calculations.  

      We have now included error bars. With the error bars, the salt dependences of all the calculated properties (except for Rg) show a clear trend. Additionally, we have expanded the descriptions of our calculations (pp. 15-16).

      (3) In Figures 2B and 2C, the changes in the radius of gyration and the number of contacts do not display significant variations with changes in salt concentration. The change in the radius of gyration with salt concentration is less than 1 Å, and the number of contacts does not change by at least 1. The authors' conclusions based on these minor changes seem unfounded. 

      The variation of ~ 1 Å for the calculated Rg is similar to the counterpart for the experimental Rg. As for the number of contacts, note that this property is presented on a per-residue basis, so a value of 1 means that each residue picks up one additional contact, or each protein chain gains a total of 131 contacts, when the salt concentration is increased from 50 to 1000 mM.
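      The per-chain figure quoted above follows from a single multiplication. In this worked version, N_res = 131 is the number of residues per A1-LCD chain implied by the response, and Δn_res = 1 is the per-residue gain in contacts between 50 and 1000 mM NaCl. Both symbols are introduced here only for illustration.

```latex
\Delta N_{\mathrm{chain}} = \Delta n_{\mathrm{res}} \times N_{\mathrm{res}} = 1 \times 131 = 131 \ \text{contacts per chain}
```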

      Reviewer #2 (Public Review): 

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: 

      (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context.

      (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. 

      (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.). Accordingly, this manuscript should be revised to address the following. In particular, the discussion in the manuscript should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      (1) The ability to use atomistic models to address the questions at hand is a strength of the present work. However, presumably because of the computational cost of such models, the "phase-separated" "condensates" in this manuscript are extremely small (only 8 chains). An inspection of Fig. 1 indicates that while the high-salt configuration (snapshot, bottom right) is more compact and droplet-like than the low-salt configuration (top right), it is not clear whether the 50 mM NaCl configuration corresponds to a dilute or homogeneous phase (without phase separation) or just a condensate with a lower protein concentration, because the chains are still highly associated. One may argue that they form two droplets touching each other (the chains are not fully dispersed throughout the simulation box, unlike in typical coarse-grained simulations of biomolecular phase separation). While it may not be unfair to argue from this observation that the condensed phase is less stable at low salt, this raises critical questions about the adequacy of the approach as a stand-alone source of theoretical information. Accordingly, an informative discussion of the limitations of the authors' approach and comparisons with results from complementary approaches such as analytical theories and coarse-grained molecular dynamics will be instructive, even imperative, especially since such results exist in the literature (please see below). 

      We now discuss the limitations of our all-atom simulations and also other approaches (p. 13; see below).

      (2) The aforementioned limitation is reflected by the authors' choice of using Dmax as a sort of phase separation order parameter. However, no evidence was shown to indicate that Dmax exhibits a two-state-like distribution expected of phase separation. It is also not clear whether a Dmax value corresponding to the linear dimension of the simulation box was ever encountered in the authors' simulated trajectories such that the chains can be reliably considered to be essentially fully dispersed as would be expected for the dilute phase. Moreover, as the authors have noted in the second paragraph of the Results, the variation of Dmax with simulation time does not show a monotonic rank order with salt concentration. The authors' explanation is equivalent to stipulating that the simulation system has not fully equilibrated, inevitably casting doubt on at least some of the conclusions drawn from the simulation data. 

      First, with the extended simulations, the Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). Second, as we now state (p. 13), our low-salt simulations mimic a homogeneous solution whereas our high-salt simulations mimic the dense phase of a phase-separated system. The intermediate-salt simulations also mimic the dense phase but at a somewhat lower concentration (hence the intermediate Dmax value).

      (3) With these limitations, is it realistic to estimate possible differences in salt concentration between the dilute and condensed phases in the present work? These features, including tie-lines, were shown to be amenable to analytical theory and coarse-grained molecular dynamics simulation (please see below).  

      The differences in salt effects that we report do not represent those between two phases. Rather, as explained in the preceding reply, they represent differences between a homogeneous solution at low salt and the dense phase at higher salt. We also acknowledge salt effects calculated by analytical theory and coarse-grained simulations (p. 13).

      (4) In the comparison in Fig. 2B between experimental and simulated radius of gyration as a function of [NaCl], there is an outlier among the simulated radii of gyration at [NaCl] ~ 250 mM. An explanation should be offered.  

      After extending the simulations and analyzing the last 500 ns, the Rg data no longer show an outlier though still have some fluctuations from one salt concentration to another.

      (5) The phenomenon of no phase separation at zero and low salt and phase separation at higher salt has been observed for the IDP Caprin1 and several of its mutants [Wong et al., J Am Chem Soc 142, 2471–2489 (2020) [https://pubs.acs.org/doi/full/10.1021/jacs.9b12208], see especially Fig. 9 of this reference]. This work should be included in the discussion and added to Table 1. 

      We now have added Caprin1 to Table 1 (new ref 26) and discuss this paper (p. 13).

      (6) The authors stated in the Introduction that "A unifying understanding of how salt affects the phase separation of IDPs is still lacking". While it is definitely true that much remains to be learned about salt effects on IDP phase separation, the advances that have already been made regarding salt effects on IDP phase separation are more abundant than this narrative conveys. For instance, an analytical theory termed rG-RPA was put forth in 2020 to provide a uniform (unified) treatment of salt, pH, and sequence-charge-pattern effects on polyampholytes and polyelectrolytes (corresponding to the authors' low net charge and high net charge cases). This theory offers a means to predict salt-IDP tie-lines and a comprehensive account of salt effects on polyelectrolytes, resulting in a lack of phase separation at extremely low salt and subsequent salt-enhanced phase separation (similar to the case the authors studied here) and in some cases re-entrant phase separation or dissolution [Lin et al., J Chem Phys 152, 045102 (2020) [https://doi.org/10.1063/1.5139661]]. This work is highly relevant and it already provided a conceptual framework for the authors' atomistic results and subsequent discussion. As such, it should definitely be a part of the authors' discussion. 

      We now cite this paper (new ref 34) in Introduction (p. 4). We also discuss its results for Caprin1 (new ref 18; p. 13).

      (7) Bridging interactions by small ions resulting in effective attractive interactions among polyelectrolytes leading to their phase separation have been demonstrated computationally by Orkoulas et al., Phys Rev Lett 90, 048303 (2003) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.90.048303]. This result should also be included in the discussion. 

      We now cite this paper (new ref 41; p. 11).

      (8) More recently, the salt-dependent phase separations of Caprin1, its RtoK variants and phosphorylated variant (see item #5 above) were modeled (and rationalized) quite comprehensively using rG-RPA, field-theoretic simulation, and coarse-grained molecular dynamics [Lin et al., arXiv:2401.04873 [https://arxiv.org/abs/2401.04873]], providing additional data supporting a conceptual perspective put forth in Lin et al. J Chem Phys 2020 (e.g., salt-IDP tie-lines, bridging interactions, reentrance behaviors etc.) as well as in the authors' current manuscript. It will be very helpful to the readers of eLife to include this preprint in the authors' discussion, perhaps at the authors' discretion and in the same manner in which other preprints are referenced and discussed in the current version of the manuscript.

      We now cite this paper (new ref 18) and discuss it along with new ref 26 in Discussion (p. 13).

      Reviewer #3 (Public Review): 

      Summary: 

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation. 

      Strengths: 

      Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior. 

      Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions. 

      Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation. 

      Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience. 

      Weaknesses: 

      Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. A comparative analysis would strengthen the manuscript's conclusions and broaden its impact.

      Perhaps we did not impress on the reviewer how expensive the all-atom MD simulations on A1-LCD were: the systems each contained half a million atoms and the simulations took many months to complete. That said, we agree with the reviewer that, ideally, a comparative study on a protein showing the typical screening class of salt dependence would have made our work more complete. However, we are confident of the conclusions for several reasons. First, the three salt effects revealed by the all-atom simulations (charge neutralization, bridging, and strengthening of pi-type interactions) are physically sound and well supported by other studies. Second, these effects led us to develop a unified picture for the salt dependence of homotypic phase separation, in the form of a predictor for the classes of salt dependence based on amino-acid composition. This predictor works well for nearly 30 proteins. Third, recent studies using analytical theory and coarse-grained simulations (new ref 18) also strongly support our conclusions.

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figure 1, the color scheme should be updated and the figure remade, as the current set of color choices makes it very difficult to distinguish the magenta spheres.  

      We have increased the sizes of ions in Figure 1 to make them distinguishable.

      (2) Within the framework of atomistic simulations, the influence of salt concentration alteration on protein conformational plasticity is worth investigating. This could be correlated (with proper details) with the effect of salt-concentration-modulated protein aggregation behavior. 

      We now use RMSF to measure conformational plasticity, which shows a clear salt-dependent trend with a 27% reduction in fluctuations from 50 mM to 1000 mM NaCl (new Fig. S1).
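      RMSF is the root-mean-square fluctuation of each atom about its time-averaged position, and it is the measure cited for new Fig. S1. The sketch below shows one common way to compute it with NumPy, along with the percent-reduction arithmetic behind a statement such as "a 27% reduction from 50 mM to 1000 mM". It assumes the frames have already been superposed on a reference. The function names are hypothetical and are not taken from the authors' methods.

```python
import numpy as np

def rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-atom RMSF from coordinates of shape (n_frames, n_atoms, 3).

    Assumes frames were already superposed onto a common reference,
    so overall translation and rotation have been removed.
    """
    mean_pos = coords.mean(axis=0)                   # (n_atoms, 3) average position
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)  # (n_frames, n_atoms) squared deviations
    return np.sqrt(sq_dev.mean(axis=0))              # time-averaged, then square root

def percent_reduction(reference: float, value: float) -> float:
    """Relative decrease of value with respect to reference, in percent."""
    return 100.0 * (reference - value) / reference
```

      For example, percent_reduction(1.0, 0.73) returns approximately 27. This corresponds to a mean RMSF at high salt that is 73% of the low-salt value.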

      (3) The authors should mention the protein concentrations employed in the simulations and whether these are consistent with experimentally used concentrations.  

      We have mentioned the initial concentration (3.5 mM). We now further state that this concentration is maintained in the low-salt simulations, indicating absence of phase separation, but is increased to 23 mM in the high-salt simulations, indicating phase separation. The latter value is consistent with the measured concentrations in the dense phase (last two paragraphs of p. 5).
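      The sketch below makes the two concentrations concrete for the eight-chain system by converting between chain count, volume, and molar concentration. It assumes concentration is simply the number of chains divided by Avogadro's number times the volume they occupy. The volumes it produces are back-calculated for illustration only and are not values reported in the manuscript.

```python
AVOGADRO = 6.02214076e23  # mol^-1
NM3_TO_L = 1e-24          # 1 nm^3 = 1e-24 L

def chains_to_mM(n_chains: int, volume_nm3: float) -> float:
    """Molar concentration (mM) of n_chains in a volume given in nm^3."""
    return 1e3 * n_chains / (AVOGADRO * volume_nm3 * NM3_TO_L)

def volume_for_mM(n_chains: int, conc_mM: float) -> float:
    """Volume (nm^3) that n_chains must occupy to reach conc_mM."""
    return 1e3 * n_chains / (AVOGADRO * conc_mM * NM3_TO_L)

# Eight A1-LCD chains at the initial (3.5 mM) and dense-phase (23 mM) concentrations
v_box = volume_for_mM(8, 3.5)     # ~3.8e3 nm^3
v_dense = volume_for_mM(8, 23.0)  # ~5.8e2 nm^3
```

      Under these assumptions, 3.5 mM for eight chains corresponds to roughly 3.8 × 10³ nm³, or a cube about 15.6 nm on a side. 23 mM corresponds to roughly 5.8 × 10² nm³, so in the dense phase the chains occupy about 15% of the initial volume.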

      (4) It would be useful to test the salt effect for at least two extreme salt concentrations at various protein concentrations, consistent with experimental protein concentration ranges.  

      In simulation studies of short peptides (ref 37), we have shown that the initial concentration does not affect the final concentration in the dense phase, as expected for phase-separation systems. We expect that the same will be true for the A1-LCD system at intermediate and high salt where phase separation occurs. Though this expectation could be tested by simulations at a different initial protein concentration, such simulations would be expensive and unlikely to yield new physical insight.

      (5) Importantly, the simulations do not appear to have converged well enough (Figure 2A). The authors should extend the simulation trajectories to ensure the system has reached a steady state.  

      We extended the simulations by an additional 500 ns, after which they appear to have converged. In Figure 2A, the Dmax values now converge to a tiered ranking, decreasing successively from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM).

      (6) The authors mention "phase separation" in the title, but with only a 1 μs simulation trajectory, it is not possible to simulate a phenomenon like phase separation accurately. Since atomistic simulations cannot realistically capture phase separation on this timescale, a coarse-grained approach is more suitable. To properly explore salt effects in the context of phase separation, long timescale simulation trajectories should be considered. Otherwise, the data remain unreliable. 

      Our all-atom simulations revealed rich salt effects that might have been missed in coarse-grained simulations. It is true that coarse-grained models allow simulation of the phase separation process, but as we have recently demonstrated (refs 36 and 37), all-atom simulations on the μs timescale are also able to capture the spontaneous phase separation of peptides and small IDPs. A1-LCD is much larger than those systems, so we had to use a relatively small number of chains (8 chains here vs 64 used in ref 37 and 16 used in ref 37). Still, we observe condensation into a dense phase at high salt. We discuss the pros and cons of all-atom vs. coarse-grained simulations on p. 13.

      (7) In Figure 5E, the plot does not show that g(r) has reached 1. If it does, the authors should show the full curve. The same issue remains with supplementary figures 1, 2, 3, etc.  

      We now show the approach to 1 in the insets of Figs. S2, S3, S4, and 5E.

      (8) None of the data is represented with error bars. The authors should include error bars in their data representations. 

      We have now included error bars in all graphs that report average values.

      (9) The authors state that "the net charge of the system reduces to only +8 at 1000 mM NaCl (Figure 3C)" but do not explain how this was calculated. 

      We now add this explanation in methods (p. 16).

      (10) The authors mention "similar to the role played by ATP molecules in driving phase separation of positively charged IDPs." However, ATP can inhibit aggregation, and its induction of phase separation is concentration-dependent. Given ATP's large aromatic moiety, its comparison to ions is not straightforward and is more complex. This comparison is best avoided.

      In this context, we are comparing the ability of ATP molecules to bridge chains and drive phase separation of positively charged IDPs (ref 36) with the bridging ability of the ions here. Ref 36 shows ATP forming bridging interactions between protein chains, similar to what we show here with ions.

      (11) Many calculations are vaguely represented. The process for calculating the number of bridging ions, for example, is not well documented. The authors should provide sufficient details to allow for the reproducibility of the data. 

      We have now expanded the methods section to include more detailed information on calculations done.
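      One common criterion for a bridging ion is that, in a given frame, it lies within a coordination cutoff of atoms belonging to at least two different protein chains. The NumPy sketch below implements that criterion for a single frame. The following are all assumptions made for illustration: the cutoff value, the choice of coordinating atoms (for example, Arg side-chain nitrogens for Cl- or backbone carbonyl oxygens for Na+), and the orthorhombic minimum-image treatment. None of them is the definition given in the authors' expanded methods.

```python
import numpy as np

def count_bridging_ions(ion_xyz, atom_xyz, atom_chain, cutoff, box=None):
    """Count ions within `cutoff` of coordinating atoms from >= 2 distinct chains.

    ion_xyz:    (n_ions, 3) ion positions for one frame
    atom_xyz:   (n_atoms, 3) positions of candidate coordinating atoms
    atom_chain: (n_atoms,) integer chain index of each atom
    cutoff:     distance cutoff, in the same units as the coordinates
    box:        optional (3,) orthorhombic box lengths for minimum-image distances
    """
    ion_xyz = np.asarray(ion_xyz, dtype=float)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_chain = np.asarray(atom_chain)
    delta = ion_xyz[:, None, :] - atom_xyz[None, :, :]  # (n_ions, n_atoms, 3)
    if box is not None:
        box = np.asarray(box, dtype=float)
        delta -= box * np.round(delta / box)            # minimum-image convention
    close = np.linalg.norm(delta, axis=2) < cutoff      # (n_ions, n_atoms) contact map
    n_bridging = 0
    for row in close:
        # An ion bridges if its contacts span at least two different chains
        if np.unique(atom_chain[row]).size >= 2:
            n_bridging += 1
    return n_bridging
```

      Averaging this count over frames and replicates would give the kind of per-condition number that is plotted with error bars.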

      Reviewer #3 (Recommendations For The Authors): 

      Include error bars or standard deviations for all results averaged over four replicates, particularly for the number of ions and contacts per residue. This would provide a clearer picture of the data's reliability and variability. 

      We have now included error bars in all graphs that report averaged values.

      Strengthen the support for the conclusion that "each Arg sidechain often coordinates two Cl- ions, multiple backbone carbonyls often coordinate a single Na+ ion." While Fig. 3A clearly demonstrates Arg-Cl- coordination, the Na+ coordination claim for a 131-residue protein requires further clarification. Consider including the integration profile of radial distribution functions for Na+ ions to bolster this assertion.

      We now report the number of Na+ ions that coordinate with multiple backbone carbonyls (p. 7) as well as the number of Na+ ions that bridge between A1-LCD chains via coordination with multiple backbone carbonyls (p. 9). Please note that Figure 4A right panel displays an example of Na+ coordinating with multiple backbone carbonyls.

      Address the following typographical errors in the main text:
      o Page 11, line 25: "distinct classes of sat dependence" should be "distinct classes of salt dependence"
      o Page 14, line 9: "for Cl- and 3.0 and 5.4 A" should be "for Cl- and 3.0 and 5.4 Å"
      o Page 14, line 18: "As a control, PRDFs for water were also calculated" should be "As a control, RDFs for water were also calculated" (assuming PRDF was meant to be RDF)

      We have now corrected these typos.

      Consider expanding the study to include simulations of the full-length protein to provide a more comprehensive comparison between the truncated A1-LCD and the complete protein's behavior in various salt concentrations. 

      As we explained above, even with eight chains of A1-LCD, which has 131 residues, the systems already contain half a million atoms each and the all-atom simulations took many months to complete. Full-length A1 has 314 residues so a multi-chain system would be too large to be feasible for all-atom simulations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach represents the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rpb4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ or stable-isotope-labeling-based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      We agree that a multiplexed Qlinker approach would be very useful. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      We agree that multiplexed Qlinkers would open the door to exciting avenues of investigation such as studying conformational state populations.  We plan to conduct the suggested experiments when multiplexed Qlinkers are available.

      Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together, the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      As the goal of the experiments is to maximize identification of crosslinked peptides, which tend to have higher charge states, we targeted ions with charge states of 3+ or higher in our MS acquisition settings for CLMS and ignored ions with 2+ charge states, which correspond to many of the normal (i.e., not crosslinked) peptides that are identified by MS. As a result, normal peptides are less likely to be identified by the MS procedure used in our CLMS experiments than by MS settings typically used to identify normal peptides. Our settings may also fail to identify some mono-modified peptides. As with most other CLMS methods, identified crosslinked peptide spectra usually account for less than 1% of the total acquired spectra, and we normally expect crosslinked species to make up approximately 1% of the total peptides.

      We added information about the number of crosslinked and monolinked peptides identified in the pol I benchmarking experiments (line 173). The numbers of crosslinks and monolinks identified in the pol II +/- α-amanitin experiment, the TBP/TFIIA/TFIIB experiment, and the pol II +/- Rpb4/7 experiment are also provided.

      (2) Of all the potential lysines that can be modified, how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate the modification efficiency of the isobaric crosslinker in a denatured sample (as an upper limit, since all lysines should be accessible there) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one monolinked peptide in samples containing maltose relative to samples without maltose. The authors then give a great description of why this fits known structural changes. What is missing is some discussion of which changes were expected overall, which ones the authors would have expected to pick up with their method, and why those were not picked up. For example, were they detected as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader estimate the potential sensitivity of the method. There are passages where the authors do an excellent job of this, for example when they mention the missed site that they expected to see in the initial pol II experiments (lines 191 to 207). This kind of "power analysis" should be discussed in depth throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      Regarding the Pol II complex experiment described in Figures 4 and 5, out of the 277 lysine residues in the complex, 207 were identified as monolinked residues (74.7%), and 817 crosslinked pairs out of 38,226 potential pairs (2.1%) were observed. The ability of CLMS to detect proximity/reactivity changes may be impacted by several factors including 1) the (low) abundance of crosslinked peptides in complex mixtures, 2) the presence of crosslinkable residues in close proximity with appropriate orientation, and 3) the ability to generate crosslinked peptides by enzymatic digestion that are amenable to MS analysis (i.e., the peptides have appropriate m/z’s and charge states, the peptides ionize well, the peptides produce sufficient fragment ions during MS2 analysis to allow confident identification). Future efforts to enrich crosslinked peptides prior to MS analysis may improve sensitivity.
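      The percentages above follow directly from the reported counts. The denominator for crosslinked pairs matches the number of unordered pairs of distinct lysines, C(277, 2) = 277 × 276 / 2 = 38,226, as the short Python check below confirms. This denominator counts every pair regardless of distance, so it includes many pairs that are too far apart to be crosslinked.

```python
from math import comb

n_lys = 277         # lysine residues in the Pol II complex
n_monolinked = 207  # lysines observed as monolinked residues
n_crosslinks = 817  # distinct crosslinked pairs observed

n_pairs = comb(n_lys, 2)                      # unordered lysine pairs: 277 * 276 / 2
monolink_pct = 100 * n_monolinked / n_lys     # ~74.7 %
crosslink_pct = 100 * n_crosslinks / n_pairs  # ~2.1 %
```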

      It is very difficult to estimate the modification efficiency of Qlinker (or many other crosslinkers) based on peptide identification results. One major reason for this is that trypsin is not able to cleave after a crosslinker-modified lysine residue.  As a result, the peptides generated after the modification reaction have different lengths, compositions, charge states, and ionization efficiencies compared to unmodified peptides. These differences make it very difficult to estimate the modification efficiencies based on the presence/absence of certain peptide ions, and/or the intensities of the modified and unmodified versions of a peptide. Also, 2+ ions which correspond to many normal (i.e., unmodified) peptides were excluded by our MS acquisition settings.

      It is also very difficult to predict which structural changes are expected and which crosslinked peptides and/or modified peptides can be observed by MS.  This is especially true when the experiment involves proteins containing unstructured regions such as the experiments involving Pol II, and TBP, TFIIA and TFIIB. Since we are at the early stages of using qCLMS to study structural changes, we are not sure which changes we can expect to observe by qCLMS. Additional applications of Qlinker-CLMS are needed to better understand the types of structural changes that can be studied using the approach.

      We hope that our discussions of some of the limitations of CLMS for detecting conformational/reactivity changes provide the reader with an understanding of the sensitivity that can be expected with the approach. At the end of the paragraph about the pol II α-amanitin experiment we say, “Unfortunately, no Q2linker-modified peptides were identified near the site where α-amanitin binds. This experiment also highlights one of the limitations of residue-specific, quantitative CLMS methods in general. Reactive residues must be available near the region of interest, and the modified peptides must be identifiable by mass spectrometry.” In the section about Rpb4/7-induced structural changes in pol II we describe the under-sampling issue. And in the last paragraph we reiterate these limitations and say, “This implies that this strategy, like all MS-based strategies, can only be used for interpretation of positively identified crosslinks or monolinks. Sensitivity and under sampling are common problems for MS analysis of complex samples.”

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

      We agree with the reviewer that it will be very helpful to establish gold standard datasets for CLMS. As we further develop and promote this technology, we will try to establish a standardized qCLMS dataset.

      Reviewer #1 (Recommendations for the authors):

      Only a very minor point:

      I may have missed it but it's not really clear how many independent experiments were used for the benchmarking quantitation and mixing experiments for Figure 1. What is the reproducibility across experiments on average and on a per-peptide basis?

      Otherwise, I think the approach would really benefit from at least "Q5linkers" or even "Q10linkers", if possible. And then conduct detailed quantitative studies, either using dilution series or maybe investigating the kinetics of complex formation.

      We used a sample of BSA crosslinked peptides to optimize the MS settings, establish the MS acquisition strategies, and test the quantification schemes. The data in Figure 1 are based on one experiment, which used ~150 µg of purified pol I complexes from a 6 L culture. We added this information to the Figure 1 legend. We also provide information about the reproducibility of peptide quantification by plotting the observed and expected ratios for each monolinked and crosslinked peptide identified in all of the runs in Figure S3.

      We agree with the reviewer that the Qlinker approach would be even more attractive if multiplex Qlinker reagents were designed. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Reviewer #2 (Recommendations for the authors):

      In addition to the public review I have the following recommendations/questions:

      (1) The first part of the results section where the synthesis of the crosslinker is explained is excellent for mass spec specialists, but problematic for general readers - either more info should be provided (e.g. b1+ ions - most readers will have no idea why that is) - or potentially it could be simplified here and the details shifted to Materials and Methods for the expert reader. The same is true below for the length of spacer arms.

      However, in general this level of detail is great, but it can make the text harder to follow for readers who are familiar with mass spectrometry but are not experts.

      We have added the following sentence to assist the general reader: A b1+ ion is an ion with a charge state of +1 corresponding to the first N-terminal amino acid residue after breakage of the first peptide bond (lines 126-128).

      (2) The Calmodulin experiment (lines 239 to 257): it is a very nice result that they see the change in the crosslinked peptide between residues K78-K95, but the monolinks are not just detected, as described in the text; they actually increase 2-fold. This might actually be expected: if the residues are now too far apart to be crosslinked, the monolinks should increase. In this case, this inverse behavior of monolinks and crosslinked sites could potentially be used as a "selection criterion" for interesting sites that change. Is that a possible interpretation, or do the authors think that the increase in the monolinks is a coincidence and should not be interpreted?

      We agree with the reviewer that both monolinks and crosslinks can be used as potential indicators for some changes. However, it is much more difficult to interpret the abundance information from monolinks because, unlike crosslinks, there is little associated structural/proximity information with monolinks. Because it is difficult to understand the reason(s) for changes in monolink abundance, we concentrate on changes in crosslink abundances, which provide proximity/structural information about the crosslinked residues.

      (3) Lines 267 to 274: a small thing but the structural information provided is quite dense I have to say. Maybe simplify or accompany with some supplemental figures?

      We agree that the structural information is a bit dense especially for readers who are not familiar with the pol II system.  We added a reference to Figure 3c (line 177) to help the reader follow the structural information. 

      As qCLMS is still a relatively new approach for studying conformational changes, the utility of the approach for studying different types of conformational changes is still unclear. Thus, one of the goals of the experiments is to demonstrate the types of conformational changes that can be detected by Q2linkers.  We hope that the detailed descriptions will help structural biologists understand the types of conformational changes that can be detected using Qlinkers.

      (4) Line 280: explain maybe why the sample was fractionated by SCX (I guess to separate the different complexes?).

      SCX was used to reduce the complexity of the peptide mixtures. As the samples are complex and crosslinked peptides are of low abundance compared to normal peptides, SCX can separate the peptides based on their positive charges.  Larger peptides and peptides with higher charge states, such as crosslinked peptides, tend to elute at higher salt concentration during SCX chromatography.  The use of SCX to fractionate complex peptide mixtures is described in the “General crosslinking protocol and workflow optimization” section of the Methods, and we added a sentence to explain why the sample was fractionated by SCX (lines 278-279).

      (5) Lines 354 to 357: "This suggests that the inability to identity most of these crosslinked peptides in both experiments is mainly due to under-sampling during mass spectrometry analysis of the complex samples, rather than the absence of the crosslinked peptides in one of the experiments."

      This is an extremely important point for the interpretation of missing values. Have the authors also tried collecting the mass spec data with DIA, which recovers the same peptide signals across samples more consistently? I realize that these are isobaric samples, so DIA measurements per se are not useful because quantification is done on the reporter channels in the MS2. However, DIA would at least give a better idea of whether the missing signals were simply not picked up for MS2, as the authors claim, or whether the modified peptides are just not present. Another possibility is for the authors to at least try a "match between runs" function, as can be done in MaxQuant. One of the strengths of the method is that it is quantitative and two states are analyzed together, but as this experiment shows, one may want to compare more than two states. In such cases, the under-sampling issue (if that is indeed the cause) makes many sites hard to interpret (due to missing values), and it would be interesting to see whether, for example, an analysis approach with a "match between runs" function could recover some of the missing values.

      We agree that undersampling/missing values is an important issue that needs to be addressed more thoroughly. This also highlights the importance of qCLMS, as conclusions about structural changes based on the presence/absence of certain crosslinked species in database search results may be misleading if the absence of a species is due to under-sampling. We have not tried to collect the data with DIA since we would lose the quantitative information. It would be interesting to see if match between runs can recover some of the missing values. While this could provide evidence to support the under-sampling hypothesis, it would not recover the quantitative information.

      We recommend performing label swap experiments and focusing downstream analysis on the crosslinks/monolinks that are identified in both experiments. Future development of multiplexed Qlinker reagents should help to alleviate under-sampling issues. See response to Reviewer #1.

      (6) Lines 375 to 393 (the whole paragraph): extremely detailed and not easy to follow. Is that level of detail necessary to drive home that point or could it be visualized in enough detail to help follow the text?

      We agree that the paragraph is quite detailed, but we feel that this level of detail is necessary to describe the types of conformational changes that can be detected by the quantitative crosslinking data, and also to illustrate the challenges of interpreting the structural basis for some crosslink abundance changes even when high-resolution structural data exist.

      To make it easier to follow, we added a sentence to the legend of Figure 5b. “In the holo-pol II structure (right), Switch 5 bending pulls Rpb1:D1442 away from K15, breaking the salt bridge that is formed in the core pol II structure (left). The increase in the abundances of the Rpb1:15-Rpb6:76 and Rpb1:15-Rpb6:72 crosslinks in holo-pol II is likely attributable to the salt bridge between K15 and D1442 in core pol II, which impedes the NHS ester-based reaction between the epsilon amino group of K15 and the crosslinker.”

      (7) Final paragraph in the results section - lines 397 and 398: "All of the intralinks involving Rpb4 are more abundant in holo-pol II as expected." If I understand that experiment correctly the intralinks with Rpb4 should not be present at all as Rpb4 has been deleted. Is that due to interference between the 126 and 127 channels in MS2? If so, then this also sets a bit of the upper limit of quantitative differences that can be seen. The authors should at least comment on that "limitation".

      Yes, we shouldn’t detect any Rpb4 peptides in the sample derived from the Rpb4 knockout strain. The signal from Rpb4 peptides in the ∆Rpb4 sample is likely due to co-eluting ions. To clarify, we changed the text to:

      All of the intralinks involving Rpb4 are more abundant in the holo-pol II sample (even though we don’t expect any reporter ion signal from Rpb4 peptides derived from the ∆Rpb4 pol II sample, we still observed reporter ion signals from the channel corresponding to the ∆Rpb4 sample, potentially due to the presence of low-abundance, co-eluting ions) (lines 395-399).

      (8) Materials and Methods - line 690: I am probably missing something but why were two different mass additions to lysine added to the search (I would have expected only one for the crosslinker)?

      The 297 Da modification is for monolinked peptides, in which one end of the crosslinker is hydrolyzed, adding an 18 Da water molecule. The 279 Da modification is for crosslinks and sometimes for looplinks (crosslinks involving two lysine residues on the same tryptic peptide).
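      The relationship between the two lysine modifications can be written as a simple mass balance. This is a sketch using the nominal masses quoted above; the water mass of about 18.011 Da is the standard monoisotopic value, not a figure taken from the manuscript:

```latex
\Delta m_{\text{monolink}} = \Delta m_{\text{crosslink}} + m_{\mathrm{H_2O}}
  \approx 279\ \text{Da} + 18\ \text{Da} = 297\ \text{Da}
```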

    1. Author response:

      Review #1:

      Also, they observed no difference in the binding free energy of phosphatidylserine between wild-type TREM2-Ig and mutant TREM2-Ig, which is somewhat inconsistent with previous experimental studies (Journal of Biological Chemistry 293 (2018); Alzheimer's and Dementia 17, 475-488 (2021); Cell 160, 1061-1071 (2015)).

      We directly note this contrast with experimental findings in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss of function in the R47H variant extends beyond decreased binding affinities and also impacts binding patterns. As stated in our manuscript: ‘Our observations for both sTREM2 and TREM2 indicate that R47H-induced dysfunction may result not only from diminished ligand binding but also an impaired ability to discriminate between different ligands in the brain, proposing a novel mechanism for loss-of-function.’

      Although the authors made significant efforts to run a number of simulations for multiple models (nearly 17 microseconds in total), none of the simulations has been repeated independently even a couple of times, which makes me uncomfortable accepting this finding as technically sound. Most of the important conclusions the authors draw, including those opposite to previous research, are based on a single run, which raises the question of whether these observations would be reproduced if the simulations were repeated independently. Although the authors acknowledge the number and length of the MD simulations as a limitation of this study, this must be carefully considered before drawing conclusions from a single run.

      The reviewer raises an interesting point regarding the repetition of individual simulations, a consideration we carefully evaluated during the design of this study. However, we believe our approach—running multiple independent models of the same system—offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      In our study, we demonstrate that within the 150 ns timescale of our protein/ligand (PL) simulations, the relatively small ligands are able to move from their initial docking positions to a specific binding site. While ideally, replicates of these independent models would further strengthen the findings, this was not computationally feasible given the unprecedented total duration of our simulations. Importantly, our conclusions are seldom based on the results of a single protein/PL simulation.

      Moreover, the ergodic hypothesis suggests that over sufficiently long timescales, simulations will explore all accessible states. Additionally, we have performed several replicate simulations of our WT and R47H Ig-like domain models in solution, specifically to investigate CDR2 loop dynamics.

      Because these systems contain only the protein, and therefore lack the multiple independent models used in the protein/PL simulations, we ran these replicates to capture the stochastic nature of CDR2 loop movement.

      sTREM2 shows a neuroprotective effect in AD, even with the R47H mutation, as shown by the authors' simulations. sTREM2 is known to bind Aβ in AD and reduce Aβ aggregation, whereas the R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand in their simulation studies. As a reader in this field, I would like to know the protective mechanism of sTREM2 in Aβ aggregation as influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      Similarly, why is only one mutation, R47H, considered in this study? Other, more severe mutations, such as T66M, have been reported to disrupt tethering between these CDRs. Although T66M is not associated with AD, I would expect the protective mechanism of the stalk domain to be similar across diseases. Therefore, it would be interesting to see whether the findings also hold for T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and in residue-wise interloop interaction patterns. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation. While R47H is the most common and most widely studied mutation in the literature, an extensive catalog of other mutations is important to explore. We are currently preparing two separate publications that will delve into these gaps in more detail, as addressing them was beyond the scope of the present study.

      Comparisons between the wild-type, mutant, and other complex structures require appropriate statistical tests to establish that the observed differences between structures are significant. Because autocorrelation is a major concern when testing MD simulation data for statistical differences, the authors could consider bootstrap calculations to assess statistical significance.

      We are currently working to address this comment to strengthen the validity of our results and statistical conclusions in the revised manuscript.  
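      To illustrate the kind of analysis the reviewer suggests (this is not the authors' method; the function names, block length, and inputs are assumptions for illustration), a moving-block bootstrap resamples contiguous blocks of an autocorrelated time series, such as a per-frame interaction energy, so that short-range correlation within each block is preserved. The sketch below returns a percentile confidence interval for the difference in means between two systems (e.g., WT and R47H). An interval that excludes zero suggests a difference at the chosen level, provided the block length is longer than the correlation time of the data.

```python
import random
import statistics


def block_resample_mean(series, block_len, rng):
    """Mean of one moving-block bootstrap resample of `series` (hypothetical helper)."""
    n = len(series)
    n_blocks = -(-n // block_len)            # ceil(n / block_len)
    last_start = n - block_len               # last valid block start index
    resample = []
    for _ in range(n_blocks):
        start = rng.randint(0, last_start)   # randint bounds are inclusive
        resample.extend(series[start:start + block_len])
    # Trim to the original length; statistics.fmean needs Python 3.8+.
    return statistics.fmean(resample[:n])


def block_bootstrap_diff_ci(a, b, block_len, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(a) - mean(b) from two autocorrelated series.

    Assumes each series is a per-frame observable from one trajectory and
    that `block_len` (in frames) exceeds its autocorrelation time.
    """
    if not 1 <= block_len <= min(len(a), len(b)):
        raise ValueError("block_len must be in [1, min(len(a), len(b))]")
    rng = random.Random(seed)  # seeded for reproducibility
    diffs = sorted(
        block_resample_mean(a, block_len, rng) - block_resample_mean(b, block_len, rng)
        for _ in range(n_boot)
    )
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

      For example, a call such as `block_bootstrap_diff_ci(wt_energies, r47h_energies, block_len=50)` would compare two hypothetical per-frame series; the block length of 50 frames is purely illustrative and would in practice be chosen from the estimated autocorrelation time.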

      Review #2:

      The authors state that reported differences in ligand binding between TREM2 and sTREM2 remain unexplained, and they cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Second, the authors cite Kober et al. (2021) as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al. measured the binding of sTREM2 and Ig-TREM2 to Abeta, they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We will adjust and refocus how we reference this evidence from Kober et al. in our revised manuscript. 

      In line with these findings, our energy calculations show that sTREM2 exhibits weaker binding affinities for phospholipids than TREM2, although the differences are not statistically significant. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to the functional differences observed between TREM2 and sTREM2.

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work on TREM2 to date, and of experimental work that only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper. Hence, we are currently working toward a manuscript that will present the first biologically relevant model of TREM2 in a membrane and will challenge the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

      Weaknesses:

      - Inclusion of statistical analysis is missing in several places in the text.

      - Functional analysis beyond coregulator binding is needed.

      We added additional statistical analyses as recommended (Source Data 1, a Microsoft Excel spreadsheet).

      Related to functional analysis, we cite studies from our previous publication (Hughes et al. Nature Communications 2014 5:3571) where we demonstrated that the covalent inhibitor ligands (GW9662 and T0070907) do not block the activity of other ligands using a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes. Our study here expands on this finding and other published studies by showing the structural mechanism for the lack of blocking activity by the covalent inhibitors.

      Reviewer #2 (Public Review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand, remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering the covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence presented is strong and convincing.

      Weaknesses:

      However, the co-crystal studies were performed by soaking non-covalent ligands into LBD pre-crystallized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward a transcriptionally repressive conformation, which reduces orthosteric binding of non-covalent ligands, would a similar conclusion be drawn if the sequence were reversed (i.e., soaking a covalent inhibitor into LBD pre-crystallized with a non-covalent ligand)? Additional discussion would broaden the implications of the conclusion.

      This is an interesting point, which we now expand upon in a new (third) paragraph of the discussion in our revised manuscript:

      “In our previous study, we observed synthetic and natural/endogenous ligand co-binding via co-crystallography where preformed crystals of PPARγ LBD bound to unsaturated fatty acids (UFAs) were soaked with a synthetic ligand, which pushed the bound UFA to an alternate site within the orthosteric ligand-binding pocket 8. In the scenario of synthetic ligand cobinding with a covalent inhibitor, it is possible that soaking a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand may prove to be difficult. The covalent inhibitor would need to flow through solvent channels within the crystal lattice, which may not be a problem. However, upon reaching the entrance surface to the orthosteric ligand-binding pocket, it may be difficult for the covalent inhibitor to gain access to the region of the orthosteric pocket required for covalent modification as the larger non-covalent ligand could block access. This potential order of addition problem may not be a problem for studies in solution or in cells, where the non-covalent ligand can more freely exchange in and out of the orthosteric pocket and over time the covalent reaction would reach full occupancy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - IC50 or EC50 values are not reported for the coregulator interaction assays; R² values for the fits should also be reported where Ki and IC50 values are disclosed.

      We now report fitting statistics and IC50/EC50 values when possible in Figure 2B and Source Data 1, along with R² values for the fits. We note that some data do not show complete or robust enough binding curves to faithfully fit to a dose response equation.

      - Reporter gene or qPCR assays should be performed for the combinations of covalent and noncovalent ligands to show how these molecules impact transcriptional activities rather than just coregulator binding profiles.

      We previously performed a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes to demonstrate that cotreatment of a covalent inhibitor (GW9662 or T0070907) with a non-covalent ligand does not block activity of the non-covalent ligand and showed cobinding-induced activation relative to DMSO control (Hughes et al., 2014 Nature Communications). We did not specifically mention this in our original manuscript, but we now call this out in the first paragraph of the results section.

      - Inclusion of a structure figure to show the different helix 12 orientations should be included in the introduction. Likewise, how the overall structure of the LBD changes as a result of the cobinding in the discussion or a summary model would be helpful.

      Our revised manuscript includes a structure figure called out in the introduction describing the active and repressive helix 12 PPARγ LBD conformations (new Figure 1). There are no major changes to the overall structure of the LBD compared to the active conformation that crystallized, so we did not include a summary model figure but we do refer readers to our previous paper (Shang and Kojetin, Structure 2021 29(9):940-950) in the penultimate paragraph of the discussion. We also added the following sentence to the crystallography results section related to the overall LBD changes:

      “The structures show high structural similarity to the transcriptionally active LBD conformation with rmsd values ranging from 0.77–1.03 Å (Supplementary Table S2)”

      A typo in paragraph 3 of the discussion says "long-live" when it should probably say "long-lived."

      We corrected this typo.

      Reviewer #2 (Recommendations For The Authors):

      It's interesting that ligand-specific binding mode of non-covalent ligands was observed. Would modifications of the chemical structure of a covalent inhibitor alter the allosteric binding behavior of non-covalent ligands in a predictive manner? If so, how can such SAR be used to guide the design of covalent inhibitors to more broadly and effectively inhibit agonists of various chemical structures? Discussion on this topic could be valuable.

      This is an interesting point, which we now discuss in the penultimate and last paragraphs of the discussion:

      “Another way to test this structural model could be through the use of covalent PPARγ inverse agonist analogs with graded activity 23, where one might posit that covalent inverse agonist analogs that shift the LBD conformational ensemble towards a fully repressive LBD conformation may better inhibit synthetic ligand cobinding.”

      “It may be possible to use the crystal structures we obtained to guide structure-informed design of covalent inhibitors that would physically block cobinding of a synthetic ligand. This could be the potential mechanism of a newer generation covalent antagonist inhibitor we developed, SR16832, that more completely inhibits alternate site ligand binding of an analog of MRL20, rosiglitazone, and the UFA docosahexaenoic acid (DHA) 21 and thus may be a better choice for the field to use as a covalent ligand inhibitor of PPARγ.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by dissociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to visualize the consequences of USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs. 

      Excellent suggestions. USP8 has been identified as a protein associated with ESCRT components, which are crucial for endosomal membrane deformation and scission, leading to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). In usp-50 mutants, we observed a significant reduction in the punctate signals of HGRS-1::GFP and STAM-1 (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a disruption in ESCRT-0 complex localization (Author response image 1). Additionally, lysosomal structures are markedly reduced in these mutants. In contrast, we found that early endosomes, as marked by FYVE, RAB-5, RABEX5, and EEA1, are significantly enlarged in usp-50 mutants. Electron microscopy (EM) imaging further revealed an increase in large cellular vesicles containing various intraluminal structures. Given the reduction in lysosomal structures and the enlargement of early endosomes in usp-50 mutants, these enlarged vesicles are likely aberrant early endosomes rather than late endosomal or lysosomal structures. To address potential confusion, we have revised the manuscript according to the reviewer's comments and updated the model to accurately reflect these observations.

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors investigate how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 accumulates in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations, they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion. 

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model. 

      Excellent point. To test whether USP-50 regulates endosome maturation through RABX-5, we performed additional genetic analyses. In rabx-5(null) mutant animals, the morphology of 2xFYVE-labeled early endosomes is comparable to that of wild-type controls (Figure 4H and I). Introducing the rabx-5(null) mutation into usp-50(xd413) backgrounds resulted in a significant suppression of the enlarged early endosome phenotype characteristic of usp-50(xd413) mutants (Figure 4H and I). These findings suggest that USP-50 may modulate the size of early endosomes through its interaction with RABX-5.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have fewer and smaller lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What was their hypothesis regarding whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? They then observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but equal in number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes, facilitating endosome maturation. 

      Weaknesses: 

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approach is somewhat difficult to follow. Finally, although the use of two models, C. elegans and a mammalian cell line, would strengthen the observations, it also makes the manuscript more difficult to read. 

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. Instead, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it. 

      We thank this reviewer for the insightful comments. Fluorescent fusion proteins are powerful tools for visualizing subcellular organelles in vivo and in live cells. In worm epidermal cells in particular, tissue-specific expression of these fusion proteins is indispensable for studying organelle dynamics in living animals, because endogenously tagged proteins often give fluorescence signals too weak for live imaging or genetic screening. Acknowledging the reviewer's concern that overexpression of certain fusion proteins may alter organelle morphology, we supplemented our approach with endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (see Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our analyses, using tissue-specific fusion proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlight the critical role of USP8 in early-to-late endosome conversion. Specifically, we found that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed an enhanced endosomal RABX-5/Rabex5 signal and mislocalization of SAND-1/Mon1 from endosomes. Consequently, endolysosomal trafficking is impaired, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified in cultured mammalian cells, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosomes/late endosomes. Electron microscopy (EM) analysis indicated that the usp-50 mutation leads to abnormally enlarged vesicles containing various intraluminal structures in worm epidermal cells. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Given that lysosomes receive and degrade materials delivered by endocytic pathways, we hypothesized that the abnormally enlarged vesicular structures observed in usp-50 or usp8 mutant cells correspond to enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced and early endosomes are significantly enlarged. Because the Rab5 guanine nucleotide exchange factor (GEF) Rabex5 is essential for Rab5 activation, we further investigated its dynamics. Additional analyses in both worm hypodermal cells and cultured mammalian cells revealed an increase in endosomal Rabex5 upon usp8/usp-50 loss of function. Live imaging further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, coinciding spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Genetic and biochemical experiments demonstrated that USP8 acts through deubiquitination of Rabex5 at K323 to dissociate it from early endosomes and promote the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and an examination of established models, which led to the formulation of our own hypothesis. Using multiple approaches, we uncovered a novel function of USP8 in early-to-late endosome conversion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Within Figures 1K-N, diverse anomalous structures were detected in the usp-50 mutant. Further scrutiny is needed to definitively characterize these structures, particularly as the images in Figures 1M and 1L exhibit notable similarities to lamellar bodies.

      We thank the reviewer for the insightful question regarding the resemblance between the vesicles observed in our study and lamellar bodies (LBs). Lamellar bodies are specialized organelles involved in lipid storage and secretion (1), prominently studied in keratinocytes of the skin and alveolar type II (ATII) epithelial cells in the lung (2). These organelles contain not only lipids but also cell-type specific proteins and lytic enzymes. Due to their acidic pH and functional similarities, LBs are classified as lysosome-related organelles (LROs) or secretory lysosomes (3,4). In usp-50 mutants, we observed a considerable number of abnormal vesicles, some of which contain threadlike membrane structures and exhibit morphological similarities to LBs (Figure 2O). However, further analysis with a comprehensive panel of lysosome-related markers demonstrated a significant reduction in lysosomal structures within these mutants. In contrast, vesicles marked by early endosome markers, such as FYVE, RAB-5, RABX-5, and EEA1, were notably enlarged. These results suggest that the enlarged vesicles observed in usp-50 mutants are more likely aberrant early endosomes rather than true lamellar bodies. We have revised the manuscript to reflect these findings and to clearly differentiate between these structures and lysosome-related organelles.

      (2) The correlation between the presence of these abnormal structures and ESCRT-0 remains unaddressed; thus the assertion that USP-50 regulates endolysosome trafficking in conjunction with ESCRT-0 lacks empirical support.

      We thank the reviewer for the valuable suggestions. We apologize for any confusion and appreciate the opportunity to clarify our findings. The ESCRT machinery is essential for driving endosomal membrane deformation and scission, which leads to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). Recent research has shown that the absence of ESCRT components results in a reduction of ILVs in worm gut cells (5). In wild type animals, the ESCRT-0 components HGRS-1 and STAM-1 display a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly reduced (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a role for USP-50 in stabilizing the ESCRT-0 complex. Our TEM analysis revealed an accumulation of abnormally enlarged vesicles containing intraluminal structures in usp-50 mutants. When we examined a panel of early endosome and late endosome/lysosome markers, we found that early endosomes are significantly enlarged, while late endosomal/lysosomal structures are markedly reduced in these mutants. This suggests that the abnormal structures observed in usp-50 mutants are likely enlarged early endosomes rather than classical MVBs. To further investigate whether the reduction in ESCRT components contributes to the late endosome/lysosome defects, we analyzed stam-1 mutants. In these mutants, the size of RAB-7-coated vesicles was reduced (Author response image 1C), and the lysosomal marker LAAT-1 indicated a reduction in lysosomal structures (Author response image 1B). These results highlight the importance of the ESCRT complex in late endosome/lysosome formation. However, the morphology of early endosomes, as marked by 2xFYVE, remained similar to that of wild type in stam-1 mutants (Author response image 1A). 
Therefore, while reduced ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the enlargement of early endosomes in these mutants may involve additional mechanisms. We have revised the manuscript to incorporate these insights and to address the reviewer's comments more comprehensively.

      Author response image 1.

      (A) Confocal fluorescence images of hypodermis expressing YFP::2xFYVE to detect EEs in L4 stage animals in wild type and stam-1(ok406) mutants. Scale bar: 5 μm. (B) Confocal fluorescence images of hypodermal cell 7 (hyp7) expressing the LAAT-1::GFP marker to highlight lysosome structures in 3-day-old adult animals. Compared to wild type, LAAT-1::GFP signal is reduced in stam-1(ok406) animals. Scale bar, 5 μm. (C) The reduction of punctate endogenous GFP::RAB-7 signals in stam-1(ok406) animals. Scale bar: 10 μm.

      (3) Endosomal dysfunction typically leads to significant alterations in the spatial arrangement of marker proteins across distinct endosomes. In the manuscript, the authors examined the distribution and morphology of early endosomes, multivesicular bodies (MVBs), late endosomes, and lysosomes in a usp-50 deficient background primarily through single-channel confocal imaging. Two-color imaging of RAB-5 and RAB-7, each together with HGRS-1, would give a more comprehensive picture of the consequences of USP-50 loss.

      Good suggestions. We have conducted a double-labeling analysis to examine the distribution of RAB-5 and RAB-7 in conjunction with HGRS-1. In wild type animals, HGRS-1 exhibits a punctate distribution that is partially co-localized with both RAB-5 and RAB-7. In contrast, in usp-50 mutants, the punctate signal of HGRS-1 is significantly reduced, along with its co-localization with RAB-5 and RAB-7 (Author response image 2). These results suggest that, in the absence of USP-50, the stabilization of ESCRT-0 components on endosomes is compromised.

      Author response image 2.

      ESCRT-0 is adjacent to both early endosomes and late endosomes. (A) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-5. (B) HGRS-1 and RAB-5 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-5) and M2 (RAB-5/HGRS-1) (N=10). (C) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-7. (D) HGRS-1 and RAB-7 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-7) and M2 (RAB-7/HGRS-1) (N=10). Scale bar: 10 μm for (A) and (C).
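      The Manders coefficients in panels B and D follow a standard definition: M1 is the fraction of total HGRS-1 intensity located in pixels where the partner channel (RAB-5 or RAB-7) is above background, and M2 is the converse. The Python sketch below only illustrates that definition. It assumes two registered, equally sized intensity images flattened to lists and user-chosen background thresholds; the function name, thresholds, and toy pixel values are hypothetical, and this is not the quantification pipeline used for these figures, which is not described here.

```python
from typing import Sequence, Tuple


def manders_coefficients(
    ch1: Sequence[float],
    ch2: Sequence[float],
    thr1: float = 0.0,
    thr2: float = 0.0,
) -> Tuple[float, float]:
    """Return (M1, M2) for two equally sized, flattened intensity images.

    M1: fraction of total ch1 intensity in pixels where ch2 > thr2.
    M2: fraction of total ch2 intensity in pixels where ch1 > thr1.
    Thresholds are assumed background cut-offs (hypothetical defaults).
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 == 0 or total2 == 0:
        raise ValueError("each channel needs non-zero total intensity")
    # Sum each channel's intensity only where the partner channel is above its threshold.
    coloc1 = sum(a for a, b in zip(ch1, ch2) if b > thr2)
    coloc2 = sum(b for a, b in zip(ch1, ch2) if a > thr1)
    return coloc1 / total1, coloc2 / total2


# Toy 2x3 images flattened row by row (hypothetical values, not data from the study).
hgrs1 = [10, 0, 5, 5, 0, 0]
rab5 = [8, 4, 0, 6, 0, 2]
m1, m2 = manders_coefficients(hgrs1, rab5)
print(f"M1={m1:.2f} M2={m2:.2f}")  # prints "M1=0.75 M2=0.70"
```

      Because M1 and M2 are computed separately, a drop in HGRS-1 puncta can lower one coefficient without necessarily changing the other, which is why both directions are reported.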

      (4) The authors observed enlarged early endosomes in cells depleted of usp-50/usp8, along with enlarged MVB-like structures identified through TEM. The potential identity of these structures as the same organelle could be determined using CLEM.

      We thank the reviewer for the valuable suggestion. Our TEM analysis identified a large number of abnormally enlarged vesicles with various intraluminal structures accumulated in usp-50 mutants. As the reviewer correctly noted, CLEM (correlative light and electron microscopy) would be an ideal approach to further characterize these structures. We have been attempting to implement CLEM in C. elegans for a few years. Given that CLEM relies on fluorescence markers, in this study we focused on two tagged proteins, RAB-5 and RABX-5, whose vesicles are enlarged in usp-50 mutants. Unfortunately, we encountered significant challenges with this approach, as the GFP-tagged RAB-5 and RABX-5 signals did not survive the electron microscopy procedure. Attempts to align EM sections with the residual GFP signal yielded results that were not convincing. Consequently, we concentrated our analysis on a panel of molecular markers, including 2xFYVE, RAB-5, RABX-5, RAB-7, and LAAT-1. These markers consistently indicated that early endosomes are specifically enlarged in usp-50 mutants, while late endosomal/lysosomal structures are notably reduced. Thus, the abnormal structures identified in usp-50 mutants via TEM are likely to be enlarged early endosomes rather than classical MVBs. We have revised the manuscript to reflect these findings and to clarify this point.

      (5) The working model depicted in Figure 6Y (right) requires revision, as it could mislead readers into mistaking enlarged early endosomes for multivesicular bodies (MVBs).

      We thank the reviewer for the excellent suggestion. We have revised the model to clarify that it is the enlarged early endosomes, rather than MVBs, that are observed in usp-50 mutants.

      Reviewer #2 (Recommendations For The Authors):

      (1) Is there any change of Rabx5 protein level in USP8/USP50 mutant cells?

      Good question. In the absence of usp-50/usp8, we indeed observed a noticeable increase in the signal of Rabex5 on endosomes. To determine whether usp-50/usp8 affects the protein level of Rabex5, we investigated the endogenous levels of RABX-5 using the RABX-5::GFP knock-in line. Compared to wild-type controls, we found an elevated RABX-5::GFP protein level in usp-50(xd413) mutants carrying the knock-in (Author response image 3). This suggests that USP-50 may play a role in the destabilization of RABX-5/Rabex5 in vivo.

      Author response image 3.

      The endogenous RABX-5 protein level is increased in usp-50 mutants. (A) The RABX-5::GFP KI protein level is increased in usp-50(xd413). (B) Quantification of endogenous RABX-5::GFP protein level in wild type and usp-50(xd413) mutant animals.

      (2) It is interesting that "The rabx-5(null) animals are healthy and fertile and do not display obvious morphological or behavioral defects.", which seems contrary to its role in regulating USP8 localization and endosome maturation.

      It has been previously documented that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, to regulate RAB-5 localization in oocytes (6). RNA interference (RNAi) targeting rabx-5 in an rme-6 mutant background results in synthetic lethality, whereas neither rabx-5 nor rme-6 single mutants are essential for worm viability. RME-6 co-localizes with clathrin-coated pits, while Rabex-5 is localized to early endosomes. Rabex-5 forms a stable complex with Rabaptin-5 and is part of a large EEA1-positive complex on early endosomes, whereas RME-6 does not interact with Rabaptin-5 (RABN-5) or EEA-1. These findings suggest that while RME-6 and RABX-5 may function redundantly, they likely play distinct roles in regulating intracellular trafficking processes. In the absence of RABX-5, USP-50 appears to lose its endosomal localization, although the size of the early endosome remains comparable to that of wild type. This observation contrasts with the phenotype associated with USP-50 loss-of-function, in which the early endosome is notably enlarged. These results suggest that residual USP-50 present in the endosomes is sufficient to maintain its role in the endocytic pathway. Conversely, the complete absence of USP-50 likely disrupts the transition of early endosomes to late endosomes, indicating a crucial role of USP-50 in this conversion process. It is also noteworthy that, although loss-of-function of rabx-5 does not result in obvious changes to early endosomes, increasing the expression level of rabx-5/Rabex-5 alone is sufficient to cause enlargement of early endosomes (Author response image 4). Indeed, we observed that loss-of-function mutations in usp-50/usp8 lead to abnormally enlarged early endosomes, accompanied by an enhanced signal of endosomal RABX-5. When the rabx-5(null) mutation was introduced into usp-50 mutant animals, the enlarged early endosome phenotype seen in usp-50 mutants was significantly suppressed (Figure 4H and I). 
This implies that maintaining a lower level of Rab5 GEF may be crucial for endolysosomal trafficking.

      (3) Does Rabx5 mutation have any impact on early endosomes?

      To address the question, we utilized the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we found that the 2xFYVE-labeled early endosomes are indistinguishable from wild type (Figure 4H and 4I). Given that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, it is likely that the regulation of early endosome size involves a cooperative interaction between RABX-5 and RME-6.

      (4) The authors observed a reduction of ESCRT-0 components in USP8 mutant cells, could this contribute to the late endosome/lysosome defects?

      Good suggestion. In wild-type animals, the two ESCRT-0 components, HGRS-1 and STAM-1, exhibit a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly diminished (Figure 1G and H; and Figure 1-figure supplement 1B), which aligns with the role of USP-50 in stabilizing the ESCRT-0 complex. To investigate whether the reduction in ESCRT components might contribute to defects in late endosome/lysosome formation, we examined stam-1 mutants. In stam-1 mutants, we observed a reduction in the size of RAB-7-coated vesicles (Author response image 1). Further, when we introduced the lysosomal marker LAAT-1::GFP into stam-1 mutants, we found a substantial decrease in lysosomal structures compared to wild-type animals (Author response image 1). This suggests that the ESCRT complex is essential for proper late endosome/lysosome formation. In contrast, the morphology of early endosomes, as indicated by the 2xFYVE marker, appeared normal in stam-1 mutants, similar to wild-type animals (Author response image 1). This implies that while a reduction in ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the early endosome enlargement phenotype in usp-50 mutants may involve additional mechanisms.

      (5) Rabx5 is accumulated in USP8 mutant cells, I am very curious about the phenotype of USP8-Rabx5 double mutants. Could over-expression of Rabx5 (wild type or mutant forms) cause any defects?

      Excellent suggestions. To address the question, we employed the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we observed that the punctate USP-50::GFP signal became diffusely distributed (Figure 4F and G). This suggests that rabx-5 is necessary for the endosomal localization of USP-50. Interestingly, in rabx-5(null) mutant animals, the 2xFYVE-labeled early endosomes appeared similar to those in wild-type animals (Figure 4H and I). When rabx-5(null) was introduced into usp-50 mutant animals, the enlarged early endosome phenotype observed in usp-50 was significantly suppressed (Figure 4H and I). This finding indicates that usp-50 indeed functions through rabx-5 to regulate early endosome size. Additionally, we constructed strains overexpressing either wild-type or K323R mutant RABX-5. Overexpression of wild-type RABX-5 led to early endosome enlargement (as indicated by YFP::2xFYVE labeling) (Author response image 4A, B and D). In contrast, overexpression of the K323R mutant RABX-5 did not result in noticeable early endosome enlargement (Author response image 4A, C and D). Together, these data are consistent with our model that USP-50 may regulate RABX-5 by deubiquitinating the K323 site.

      Author response image 4.

      (A-C) Overexpression of wild-type RABX-5 causes enlarged EEs (labeled by YFP::2xFYVE), while the RABX-5(K323R) mutant form does not. (D) Quantification of the volume of individual YFP::2xFYVE vesicles. Data are presented as mean ± SEM. ****P<0.0001. ns, not significant. One-way ANOVA with Tukey’s test.

      (6) Rabx5 could be ubiquitinated at K88 and K323, and Rabx5-K323R showed different activity when compared with the wild-type protein in USP8 mutant cells. Could the authors provide evidence that USP8 could remove the ubiquitin modification from K323 in Rabx5 protein?

      We appreciate the reviewer's insightful suggestions. To explore the potential of USP-50 in removing ubiquitin modifications from lysine 323 on the RABX-5 protein, we undertook a series of experiments. Initially, we sought to determine whether USP-50 influences the ubiquitination level of RABX-5 in vivo. However, due to the low expression levels of USP-50, we encountered challenges in obtaining adequate amounts of USP-50 protein from worm lysates. To overcome this, we expressed USP-50::4xFLAG in HEK293 cells for subsequent affinity purification. Concurrently, we utilized anti-GFP agarose beads to purify RABX-5::GFP from worms expressing the rabx-5::gfp construct. We then incubated RABX-5::GFP with USP-50::4xFLAG for varying durations and performed immunoblotting with an anti-ubiquitin antibody. As shown in Author response image 5A, our results revealed a decrease in the ubiquitination level of RABX-5 in the presence of USP-50, suggesting that USP-50 directly deubiquitinates RABX-5. Previous studies have indicated that only a minor fraction of recombinant RABX-5 undergoes ubiquitination in HeLa cells, which is believed to have functional significance (7). Our findings are consistent with this observation, as only a small fraction of RABX-5 in worms is ubiquitinated. Rabex-5 is known to interact with both K63- and K48-linked poly-ubiquitin chains. To further elucidate whether USP-50 specifically targets K48 or K63-linked ubiquitination at the K323 site of RABX-5, we incubated various HA-tagged ubiquitin mutants with either wild-type or K323R mutant RABX-5 protein. Our results indicated that the K323R mutation reduces K63-linked ubiquitination of RABX-5 (Author response image 5). This experiment was repeated multiple times with consistent results. 
Additionally, while overexpression of wild-type RABX-5 led to an enlargement of early endosomes, as evidenced by YFP::2xFYVE labeling, overexpression of the K323R mutant did not produce a noticeable effect on endosome size (Author response image 4). Collectively, these findings indicate that RABX-5 is subject to ubiquitin modification in vivo and that USP-50 plays a significant role in regulating this modification at the K323 site.

      Author response image 5.

      (A) RABX-5::GFP protein was purified from worm lysates using anti-GFP antibody. FLAG-tagged USP-50 was purified from HEK293T cells using anti-FLAG antibody. Purified RABX-5::GFP was incubated with USP-50::4xFLAG for the indicated times (0, 15, 30, 60 min), followed by immunoblotting using antibodies against ubiquitin, FLAG or GFP. In the presence of USP-50::4xFLAG, the ubiquitination level of RABX-5::GFP is decreased. (B) Quantification of RABX-5::GFP ubiquitination level from three independent experiments. (C) HEK293T cells were transfected with HA-Ub or the indicated mutants and 4xFLAG-tagged RABX-5 or RABX-5 K323R mutant for 48 h. The cells were subjected to pull-down using FLAG beads, followed by immunoblotting using antibodies against HA or FLAG.

      (7) The authors described "the almost identical phenotype of usp-50/usp8 and sand-1/Mon1 mutants", found protein-protein interaction between USP8 and sand-1, and showed that sand1-GFP signal is diminished in USP8 mutant cells. These observations fit with the possibility that USP8 regulates the stability of sand-1 to promote endosomal maturation. Could this be tested and integrated into the current model?

      We are grateful for the insightful comments provided by the reviewer. Rab5, known to be activated by Rabex-5, plays a crucial role in the homotypic fusion of early endosomes. Rab5 effectors also include the Rab7 GEF SAND-1/Mon1–Ccz1 complex. Rab7 activation by the SAND-1/Mon1-Ccz1 complex is essential for the biogenesis and positioning of late endosomes (LEs) and lysosomes, and for the fusion of endosomes and autophagosomes with lysosomes. The Mon1-Ccz1 complex is able to interact with Rabex5, causing dissociation of Rabex5 from the membrane, which probably terminates the positive feedback loop of Rab5 activation and then promotes the recruitment and activation of Rab7 on endosomes. In our study, we identified an interaction between USP-50 and the Rab5 GEF, RABX-5. In the absence of USP-50, we observed an increased endosomal localization of RABX-5 and the formation of abnormally enlarged early endosomes. This phenotype is reminiscent of that seen in sand-1 loss-of-function mutants, which also exhibit enlarged early endosomes and a concomitant reduction in late endosomes/lysosomes. Notably, USP-50 also interacts with SAND-1, suggesting a potential role in regulating its localization. We propose several models to explain how USP-50 might influence SAND-1 localization:

      (1) USP-50 may stabilize SAND-1 through direct de-ubiquitination.

      (2) In the absence of USP-50, the sustained presence of RABX-5 could lead to continuous Rab5 activation, which might hinder or delay the recruitment of SAND-1.

      (3) USP-50 could facilitate SAND-1 recruitment by promoting the dissociation of RABX-5.

      We are actively investigating these models in our laboratory. Due to space constraints, a more detailed exploration of how USP-50 regulates SAND-1 stability will be presented in a separate publication.

      References:

      (1) Schmitz, G., and Müller, G. (1991). Structure and function of lamellar bodies, lipid-protein complexes involved in storage and secretion of cellular lipids. J Lipid Res 32, 1539-1570.

      (2) Dietl, P., and Frick, M. (2021). Channels and Transporters of the Pulmonary Lamellar Body in Health and Disease. Cells-Basel 11. https://doi.org/10.3390/cells11010045.

      (3) Raposo, G., Marks, M.S., and Cutler, D.F. (2007). Lysosome-related organelles: driving post-Golgi compartments into specialisation. Current opinion in cell biology 19, 394-401. https://doi.org/10.1016/j.ceb.2007.05.001.

      (4) Weaver, T.E., Na, C.L., and Stahlman, M. (2002). Biogenesis of lamellar bodies, lysosome-related organelles involved in storage and secretion of pulmonary surfactant. Semin Cell Dev Biol 13, 263-270. https://doi.org/10.1016/s1084952102000551.

      (5) Ott, D.P., Desai, S., Solinger, J.A., Kaech, A., and Spang, A. (2024). Coordination between ESCRT function and Rab conversion during endosome maturation. bioRxiv, 2024.05.14.594104. https://doi.org/10.1101/2024.05.14.594104.

      (6) Sato, M., Sato, K., Fonarev, P., Huang, C.J., Liou, W., and Grant, B.D. (2005). Caenorhabditis elegans RME-6 is a novel regulator of RAB-5 at the clathrin-coated pit. Nature cell biology 7, 559-569. https://doi.org/10.1038/ncb1261.

      (7) Mattera, R., Tsai, Y.C., Weissman, A.M., and Bonifacino, J.S. (2006). The Rab5 guanine nucleotide exchange factor Rabex-5 binds ubiquitin (Ub) and functions as a Ub ligase through an atypical Ub-interacting motif and a zinc finger domain. The Journal of biological chemistry 281, 6874-6883. https://doi.org/10.1074/jbc.M509939200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nanodomain organisation of proteins within the plasma membrane has emerged as a way of organising proteins involved in signalling pathways. Here, the authors addressed how a non-surface-localised signal (viral infection) is resisted by PM-localised signalling proteins and the effect of nanodomain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM, whereas most previous work has focused on the reverse: PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium-dependent protein kinase that is important for inhibiting viral spread. The authors then show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, a group of scaffold proteins important in nanodomain organisation. The authors conclude that CPK3 and remorins are interdependent in controlling their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and point mutation non functioning lines further added to that.

      Weaknesses:

      My main concerns with the work are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM-localised form and its non-PM-localised counterpart look similar, and no PM stain or marker construct is used to support this distinction. The sptPALM data conclusions are nice and fit the narrative. However, no raw data or movie is shown, only representative tracks. Therefore, the data quality cannot be verified; in addition, the number of single-particle events visualised per experiment is not reported, only the number of cells imaged. It is therefore impossible for the reader to appreciate the number of single-molecule behaviours obtained and hence the quality of the data.

      (2) Secondly, remorins are involved in many nanodomain-controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solving this, but more evidence would strengthen it in my opinion. How do we know that, during viral infection, overall PM protein dynamics and organisation are not altered? Or that CPK3 and REM are not at very distant ends of a signalling cascade? Negative control experiments are required here, using other PM-localised proteins that have no role during viral infection. In addition, if the interaction is specific, the transiently expressed CPK3-CA construct (shown to form nanodomains) should be co-expressed with REM1.2-mEOS to show that the alterations in single-particle behaviour result from specific activation of CPK3 and REM1.2 in the absence of PlAMV infection, and are not an artefact of whole-PM changes in dynamics during viral infection.

      In addition, displaying more information throughout the manuscript (such as raw particle tracking movies and numbers of tracks measured) on the already generated data would strengthen the manuscript further.

      Overall, I think this work has the potential to be a very strong manuscript but additional reporting of methods and data are required and additional lines of evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

      Reviewer #2 (Public Review):

      Summary:

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent.

      Strengths:

      The paper contains novel, important information.

      Weaknesses:

      The interpretation of some experimental data is not justified, and the proposed model is not fully based on the available data.

      Reviewer #3 (Public Review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium-dependent protein kinase (CPK3) plays in plant defence against viruses.
      The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs with roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further, they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nanodomains. This stabilisation, however, depends on one of its phosphorylation substrates, the remorin scaffold protein REM1-2. However, when REM1-2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These opposite responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nanodomains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1-2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nano-domains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1-2 inhibit the cell-to-cell spread of viruses.

      Strengths:

      The paper provided compelling evidence about the roles of CPK3 and REM1-2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly in single particle tracking.

      Weaknesses:

      There is a lack of evidence for the downstream pathways, specifically whether CPK3's role in cytoskeletal organisation contributes to the plant's defence against viral propagation. There is also limited discussion of where the nano-domains are localised and whether they overlap with plasmodesmata (PD); since plant viruses use PD to move from cell to cell, this seems an obvious avenue to investigate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Viral spread work in CPK mutants with time courses is beautiful!

      Regarding my public points on my issues with the imaging:

      - Figure 2A shows 'PM' localisation of CPK3 and 'non-PM' localisation of CPK3-G2A. The images are nearly identical, both showing cell outlines and cytoplasmic strands. Here a PM marker (such as Lti6B) tagged with a different fluorophore, or a PM stain, should be used in conjunction with surface views (such as in Figure 2C) to show that CPK3 really is at the PM and the G2A variant is not.

      Impaired membrane localization of CPK3-G2A is documented in Mehlmer et al., 2010 using microsomal fractionation. Although the main purpose of Figure 2A is to show correct expression of the constructs in the lines used for PlAMV propagation (Figure 2B), we replaced the images with wider-view pictures that are more representative of the subcellular localization of CPK3 and CPK3-G2A.

      - Figure 2C is extremely noisy, and PM heterogeneity is barely observable over the noise from the system (looking at the edges of the imaged surface). You mention that low resolution was an issue. I notice from the methods that you took confocal images on a Zeiss 880 with Airyscan. These appear to be standard confocal images, but if imaged with Airyscan the PM heterogeneity would be much clearer (see work from John Runions' lab).

      Indeed, these are tangential-view images obtained with a Zeiss 880 with Airyscan. Based on tessellation analysis (Figure 2H-J), CPK3 is rather homogeneously distributed and forms ND around 70 nm in diameter. Objects of this size cannot be resolved using pixel-reassignment methods such as Airyscan. Note also that the AtREMs in our study are less heterogeneously distributed than what has been described in the literature for StREM1.3.

      - Regarding all sptPALM data: at least one example image and video of the raw data is required; otherwise the data cannot be assessed. The work of Alex Martiniere (sptPALM) or Alex Jonson (TIRF) shows raw data so the reader can appreciate its quality. In addition, the number of events (particles tracked) has to be shown in the figure legend, not just the number of cells; otherwise, was one track performed per cell? Or 10,000? Where N sits in this range gives the reader more or less confidence in the data.

      We agree and we added example videos of sptPALM experiments in the supplementary data, we also indicated the number of tracked particles in the figure legends.

      - On a slight technical aside, how do you know the cells being imaged for sptPALM with PlAMV are actually infected with the virus? In Fig 2C you use a GFP-tagged version, but in sptPALM you use an untagged one. I think a sentence in the methods on this would help clarify.

      PlAMV-GFP was used for the sptPALM experiments, and cell infection was assessed during the PALM experiments. This is now specified in the corresponding figures and methods.

      - I also have a concern over some of the representative images showing the same things in different figures. Your clustering data in 3F look very convincing. However, in Figure 2H the mock and PlAMV-GFP look very similar. How is Figure 3F so different for the same experiment, especially considering the scale bars are the same for both figures? The same applies to CPK3-mRFP1.2 in Fig 2C and 3A: the same thing is being imaged at the same scale (scale bars the same size), but the images are extremely different.

      Figure 2 data were generated using CPK3 stably expressed in A. thaliana, while Figure 3 data were obtained upon transient over-expression of CPK3 in N. benthamiana. We do not have a clear explanation for such a difference in CPK3 PM behavior; it could lie in a different PM composition or actin organization between those two species. This point is now addressed in the discussion.

      - Lines 193 and 194 - you state that CPK3-CA is reminiscent of CPK3 upon PlAMV infection. I don't agree: while CPK3-CA is less mobile (2D), the MSD shows it is in between CPK3 and CPK3 + PlAMV. Therefore, couldn't the opposite also be true, i.e. that the overall behaviour of CPK3-CA is reminiscent of WT CPK3? I think this needs rewording.

      We agree and have reworded that part.

      - Line 651 - what numerical aperture are you using for the lens during confocal microscopy? This is fundamentally important information directly related to the reproducibility of the work. You report it for the sptPALM.

      The numerical aperture is now indicated in the methods.

      Regarding my bigger point about specific interactions between CPK3 and remorin during viral infection, the following experiments are needed to strengthen your claim. I am not suggesting you do all of these, but at least two would significantly enhance the paper.

      (1) Image an unrelated PM protein (such as LTI6b) during viral infection using sptPALM and demonstrate that its behaviour is not altered. This would show that the effects on remorin behaviour are specific to CPK3 and not a wholesale PM alteration in dynamics due to viral infection.

      (2) Two-colour SPT imaging of CPK3 and REM1.2. You show the effect of each protein's absence on the other (knockouts), but your only interaction data are from a kinase assay (in which CPK1 and CPK2 also interact, even though they are not localised at the same place) and colocalisation data (see below). A two-colour SPT imaging experiment showing interaction and clustering of CPK3 and REM1.2 with each other, and the change in their behaviours when virally infected and simultaneously imaged, would address all of my concerns.

      - On another note, the colocalisation data (Fig 5 sup 4) need additional analysis. I would expect most PM proteins to give the results you show, as the data are very noisy. To improve this, I would zoom in to fill the field of view and then determine the correlation, including when one image is rotated 90 degrees (as described in Jarsch et al., Plant Cell).

      (3) In the absence of viral infection but in the presence of CPK3-CA, is REM1.2 behaviour in the PM (measured by sptPALM) altered? If so, the interaction is specific, the changes in remorin dynamics are not due to wholesale PM changes during viral infection, and the manuscript would be substantially strengthened.

      (4) Building on (3), a CPK3 carrying both the CA and G2A mutations would be constitutively active and non-PM-localised, and as such should not affect remorin behaviour if your model is true. This would strengthen the case significantly, but I appreciate it is highly artificial and would need to be done transiently.

      Regarding the first point, since the role of PM proteins in potexvirus infection has barely been assessed, picking an unrelated PM protein might be tricky. The data obtained with mEOS3.2-REM1.2 expressed in the cpk3 null mutant point towards a specific role of CPK3 in PlAMV-induced REM1.2 diffusion rather than a general alteration of PM protein behavior.

      Regarding the second point, we already reported the in vivo interaction between AtCPK3CA and AtREM1.2/AtREM1.3 by BiFC in N. benthamiana (Perraki et al 2018), and AtCPK3 was shown to co-IP with AtREM1.2 (Abel et al, 2021). While we agree on the relevance of dual-color sptPALM with CPK3 and REM1.2, it is technically challenging at present and we would not be able to implement it in a timely manner. For the colocalization, although the whole cell is displayed in the figure, the analysis was performed on ROIs that fill the field of analysis.

      We agree with the relevance of adding the colocalization analysis of randomized images (mTagBFP2 channel rotated 90 degrees), this is now added to Figure 5 – supplement figure 5.

      Finally, for the third and fourth points, sptPALM analysis of REM1.2 in the presence of CPK3-CA and CPK3-CA-G2A was performed (Figure 5 - figure supplement 4). The results suggest a specific role of CPK3-CA in REM1.2 diffusion.

      Minor points:

      Line 59 - "form": I think you mean "from".

      Line 63 - Reference needed after latter.

      Line 68 - Reference required after viral infection.

      Line 85 - Propose not proposed.

      Line 156 - Allowed us to not allows to.

      Line 204 - add we previously 'demonstrated'

      Lines 622 and 623 - You say lines were obtained from Thomas Ott. This is very odd phrasing considering he is an author. I appreciate citing the work producing the lines, but maybe reword this.

      These points were corrected, thank you.

      Reviewer #2 (Recommendations For The Authors):

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent. The paper contains novel, important information that can undoubtedly be published in eLife. However, I have some concerns that should be addressed before it can be accepted for publication.

      Major concerns

      When the authors say that CPK3 plays a role in viral propagation, it should be clarified what is meant by 'propagation', - replication of the viral genome, its cell-to-cell transport, or long-distance transport via the phloem. By default the readers will tend to assume the former meaning. In my opinion, the term 'propagation' is misleading and should be avoided.

      We purposely chose the term “propagation” because it sums replication and cell-to-cell movement. Nevertheless, we previously showed that group 1 StREM1.3 doesn’t alter PVX replication (Raffaele et al., 2009 The Plant Cell). In this paper, as we do not investigate the role of AtREM1.2 or AtCPK3 in the replication of the viral PlAMV genome, we cannot state that these proteins are strictly involved in cell-to-cell movement of the virus.

      The authors show that viral infection is associated with decreased diffusion of CPK3 and increased diffusion of REM1.2 in the PM. However, it remains unclear whether these changes are related to partial resistance to viral infection involving CPK3 and REM1.2, or whether they are simply a consequence of viral infection that may lead to altered PM properties and altered dynamics of PM-associated proteins. Therefore, the model presented in Fig. 6 appears to be entirely speculative, as it postulates that changes in CPK3 and REM1.2 dynamics are the cause of suppressed virus 'propagation'. In addition, the model implies that a decrease in CPK3 mobility leads to activation of its kinase activity. This view is not supported by experimental data (see my next comment). The model should be deleted (both as the figure and its description in the Discussion) or substantially reworked so that it finally relies on existing data.

      For the first point, the results obtained from the additional experiments proposed by reviewer #1 support the hypothesis of a direct impact of CPK3 on REM1.2 diffusion (Figure 5 - figure supplement 4).

      We agree with the second point and reworked the model to remove the link between CPK3 activation and its increased diffusion.

      The statement that 'changes in CPK3 dynamics upon PlAMV infection are linked to its activation' (line 194) is based on flawed logic, and the conclusion in this section of Results ('changes in CPK3 dynamics upon PlAMV infection are linked to its activation') is incorrect, as it is not supported by experimental data. In fact, the authors show that CPK3 dynamics and clustering upon viral infection are somewhat reminiscent of the behavior of a CPK3 deletion mutant, which is a constitutively active protein kinase. However, this partial similarity cannot be taken as evidence that CPK3 dynamics upon PlAMV infection are related to its activation. Furthermore, the authors emphasize the similarity of the mutant and CPK3 in infected cells without taking into account a drastic difference in their localization (Fig. 3A, middle and right panels), showing that the reduced dynamics of the compared proteins may have different causes. I suggest removing the section 'CPK3 activation leads to its confinement in PM ND' from the paper, as the results included in this section are not directly related to the other data presented.

      The PM lateral organization of PM-bound CPKs in their native or constitutively active forms, as well as the role of lipids in this phenomenon, had never been shown before. We believe that this section contains relevant information for the community. We kept the section but reworded it to temper the correlation drawn between CPK3 PM organization upon viral infection and its activation.

      Line 270 - 'group 1 REMs might play a role in CPK3 domain stabilization upon viral infection'. This is an overstatement. The size of the CPK3-containing NDs may have no correlation with their stability.

      We reworded the sentence.

      Minor points

      Line 204 - "we previously that". Line 234 and hereafter - "the D" sounds strange. Suggest using "the diffusion coefficient".

      This was reworded.

      Reviewer #3 (Recommendations For The Authors):

      The authors have previously demonstrated that there was an increase in REM1.2 localisation to plasmodesmata under viral challenge. It would be useful to see if there was any co-localisation of REM1.2 and CPK3 with plasmodesmata in response to PlAMV and how this is affected in the mutants. This could be carried out relatively simply using aniline blue.

      These experiments were added to the supplementary data (Figure 2 – figure supplement 2 and Figure 4 – figure supplement 4); no enrichment of CPK3 or REM1.2 at plasmodesmata could be observed upon PlAMV infection.

      Fig 3 supplementary figure 2 would be better incorporated into the main body of Figure 3 as this underpins discussion on the involvement of lipids such as sterols in the formation of nanodomains.

      We moved Figure 3 – Supplementary figure 2 to the main body of Figure 3.

      Minor corrections:

      Whilst the paper is generally well written there are a number of grammatical errors:

      Line 1 & 2: Title doesn't quite read correctly, suggest a rewording for clarity.

      L31: Insert "a" after "only"

      L33: Replace "are playing" with "play"

      L34: Begin sentence "Viruses are intracellular pathogens and as such the role..."

      L59: replace "form" with "from"

      L63: Insert "was demonstrated" after REM1.2)

      L85: Replace "proposed" with "propose"

      L86: replace "encouraging to explore" with "which will encourage further exploration of "

      L129: replace "we'll focus on" with "we concentrated on"

      L131: insert "an" before ATP

      L138: change "among" to "amongst"

      L156: change "allows to analyse" to "allows the analysis of"

      L204: Insert "showed" after previously.

      L232: "root seedlings" should this be the roots of seedlings?

      L235: insert "to" after "as"

      L280: insert "a" after "only"

      L281: replace "to play" with "as playing"; change CA to superscript

      L307: Insert "was" after "transcription"

      L320: change "display" to "displaying"

      L321: change "form" to "forms"

      L340: "hampering" should come before viral

      L365: insert "us" after "allow"

      Thank you, these were corrected.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D grid world in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions. 

      We thank the reviewer for their thoughtful comments. To clarify, the grid-world setup was used as a didactic tool/testbed to understand the interaction between Pavlovian and instrumental systems (lines 80-81) [Dayan et al., 2006], specifically in the context of safe exploration and learning. It helps us delineate the Pavlovian contributions during learning, which is key to understanding the safety-efficiency dilemma we highlight. This approach generates a hypothesis about outcome uncertainty-based arbitration between these systems, which we then test in the approach-withdrawal VR experiment, building on foundational studies of Pavlovian biases [Guitart-Masip et al., 2012, Cavanagh et al., 2013].

      Although the VR task does not explicitly involve rewards, it provides a specific test of our hypothesis regarding flexible Pavlovian fear bias, similar to how others have tested flexible Pavlovian reward bias without involving punishments (e.g., Dorfman & Gershman, 2019). Both the simulation and VR experiment models are derived from the same theoretical framework and maintain algebraic mapping, differing only in task-specific adaptations (e.g., differing in action sets and temporal difference learning for multi-step decisions in the grid world vs. Rescorla-Wagner rule for single-step decisions in the VR task). This is also true for Dayan et al. [2006] who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. Therefore, we respectfully disagree that the two setups are completely unrelated and that both models include components merely labelled as Pavlovian.

      We will rephrase parts of the manuscript to prevent the main message of our manuscript from being misconveyed. Particularly in the Methods and Discussion, to clarify that our main focus is on Pavlovian fear bias in safe exploration and learning (as also summarised by reviewers #2 and #3), rather than on its role in complex navigational decisions. We also acknowledge the need for future work to capture more sophisticated safe behaviours, such as escapes and sophisticated planning which span different aspects of the threat-imminence continuum [Mobbs et al., 2020], and we will highlight these as avenues for future research.

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thank you for this comment. We acknowledge that our paper does not compare the Pavlovian fear system to a purely instrumental system with varying punishment sensitivity. Instead, our model assumes the coexistence of these two systems and demonstrates the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone. In light of the reviewer’s comment, we will soften our claims regarding the necessity of the Pavlovian system, despite its known existence.

      We also encourage the reviewer to consider the Pavlovian system as a biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system (e.g., the limbic loop) are well known (see Supplementary Fig. 16).

      Additionally, we point out that varying reward sensitivities while keeping punishment sensitivity constant allows our PAL agent to differentiate from an instrumental agent that combines reward and punishment into a single feedback signal. As highlighted in lines 136-140 and the T-maze experiment (Fig. 3 A, B, C), the Pavlovian system maintains fear responses even under high reward conditions, guiding withdrawal behaviour when necessary (e.g., ω = 0.9 or 1), which is not possible with a purely instrumental model if the punishment sensitivities are fixed. This is a fundamental point.

      We will revise our discussion and results sections to reflect these clarifications.

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thank you; we respectfully disagree with the statement that the models used in our experimental setup are dissimilar to those used in the first setup. Due to differences in the nature of the task setup, the action set differs, but the model equations and the theory are the same and align closely, as described in our response above. The only additional difference is the use of a baseline bias in the human experiments and the RLDDM model, where we also model reaction times with drift rates, which is not a behaviour often simulated in grid-world simulations. We will improve our Methods section to ensure that model similarity is highlighted.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      We thank reviewer #1 for acknowledging the relevance of our models in advancing the field. We would like to further highlight that, to the best of our knowledge, this is the first time reaction times in Pavlovian-Instrumental arbitration tasks have been modelled using RLDDM, which adds a novel dimension to our approach.

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      We acknowledge the dissimilarity between the task setups (grid-world vs. approach-withdrawal). However, we believe these setups are computationally similar and may be biologically related, as suggested by prior work like Dayan et al. [2006], which integrates Go-No Go and grid-world tasks. Just as that work bridged findings in the appetitive domain, we aim to integrate our findings in the aversive domain. We will provide a more integrated interpretation in the discussion section of the revised manuscript.

      Dayan, P., Niv, Y., Seymour, B., and Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural networks, 19(8):1153–1160.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Thank you for your feedback. As mentioned above, we invite the reviewer to think of Pavlovian fear systems as a way in which the brain might implement punishment sensitivity. Secondly, such a system provides a separate punishment memory that cannot be overwritten by higher rewards (see also Elfwing and Seymour 2017, and Wang et al, 2021).

      Elfwing, S., & Seymour, B. (2017, September). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 140-147). IEEE. 

      Wang, J., Elfwing, S., & Uchibe, E. (2021). Modular deep reinforcement learning from reward and punishment for robot navigation. Neural Networks, 135, 115-126.

      Simulation setups such as the grid worlds used here are common test beds for algorithms in reinforcement learning [Sutton and Barto, 2018].

      Any experimental setup faces a choice between a constrained experiment designed to test and model a specific effect and a less constrained exploratory experiment that is more difficult to model. Here we chose the former, building upon previous foundational experiments on Pavlovian bias in humans [Guitart-Masip et al., 2012, Cavanagh et al., 2013]. The condition in which withdrawal from a jellyfish leads to a sting, though less realistic, was included to balance the four cue-outcome conditions. Overall, the task was designed, to the best of our ability, to isolate the effect we wanted to test: Pavlovian fear bias in choices and reaction times. In a free operant task, it is quite likely that other components not included in our model could compete for control.

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality, but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating between the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      We agree that safe behaviours, such as escapes, involve more sophisticated computations. We do not propose Pavlovian fear bias as the sole computation for safe behavior, but rather as one of many possible contributors. Given the known existence of the Pavlovian withdrawal bias, we simply study its possible contribution. We will include in our discussion that such behaviours likely occupy different parts of the threat-imminence continuum [Mobbs et al., 2020].

      Dean Mobbs, Drew B Headley, Weilun Ding, and Peter Dayan. Space, time, and fear: survival computations along defensive circuits. Trends in cognitive sciences, 24(3):228–241, 2020.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      We thank the reviewer for their comment. We selected the action space to build on existing models [Guitart-Masip et al., 2012, Cavanagh et al., 2013] that capture Pavlovian biases and we also wanted to minimize participant movement for EEG data collection. Unfortunately, despite restricting movement to just the arm, the EEG data was still too noisy to lead to any substantial results. We will explore more free-operant paradigms in future works.

      On the issue of the difference between VR and lab-based tasks, we note the reviewer's point. Note, however, that desktop monitor-based tasks lack sensorimotor congruency between the action and the outcome. Second, it is also arguable that the background context is important in fear conditioning, as it may help set the tone of the fear system and make aversive components easier to distinguish.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      We thank the reviewer for their thoughtful input. We do not claim our model is the best fit for all naturalistic VR tasks, as they require multiple systems across the threat-imminence continuum [Mobbs et al., 2020] and are beyond the scope of the current work. However, we believe our findings on outcome-uncertainty-based arbitration of Pavlovian bias could inform future studies and may be relevant for testing differences in patients with mental disorders, as noted by reviewer #2. More generally, most well-controlled laboratory-based tasks need to bridge a sizeable gap to applicability in real-life naturalistic behaviour, although the principle of using carefully designed tasks to isolate individual factors is well established.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We thank the reviewer for their comments and ideas. In our discussion lines 257-264, we discuss other works which identify similar safety-efficiency dilemmas, in different models. Here, we simply focus on the safety-efficiency trade-off arising from the interactions between Pavlovian and instrumental systems. It is important to note that the computational argument for the modular system with separate rewards and punishments explicitly protects (up to a point, of course) against large rewards leading to death because the Pavlovian fear response is not over-written by successful avoidance in recent experience. Note also that in animals, reward utility curves are typically convex. We will clarify this in the discussion section.

      We agree that in certain scenarios, pruning decision trees could be more effective, especially with a model-based instrumental agent. Here, we use a model-free instrumental agent, which yields a simpler model, a feature appreciated by reviewer #2. Future work could incorporate model-based methods.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 
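      For readers less familiar with the associability term, the hybrid Rescorla-Wagner/Pearce-Hall update can be sketched in a few lines of Python. This is a minimal illustration that assumes the common form, in which the associability (alpha) tracks a running average of unsigned prediction errors and scales the value update. The function name, the parameter names (`kappa`, `eta`) and the initial values are hypothetical and are not taken from the manuscript.

```python
def hybrid_rw_ph(outcomes, v0=0.0, alpha0=1.0, kappa=0.3, eta=0.5):
    """Hypothetical hybrid Rescorla-Wagner / Pearce-Hall learner.

    Returns a list of (value, associability) pairs, one per trial.
    """
    v, alpha = v0, alpha0
    trace = []
    for r in outcomes:
        delta = r - v                                 # signed prediction error
        v = v + kappa * alpha * delta                 # value update scaled by associability
        alpha = eta * abs(delta) + (1 - eta) * alpha  # associability tracks |delta|
        trace.append((v, alpha))
    return trace


# Usage: with a fully predictable outcome, prediction errors shrink,
# so the associability (the uncertainty surrogate) decays over trials.
trace = hybrid_rw_ph([1.0] * 30)
print(trace[0], trace[-1])
```

      Alpha falls only as outcomes become predictable, so it is at best a heuristic proxy for uncertainty. The more formal accounts cited by the reviewer instead represent uncertainty directly.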

      We thank the reviewer for bringing this to our notice. We will discuss Tzovara et al., 2018 in our discussion in our revised manuscript.

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      We thank reviewer #2 for their positive feedback and thoughtful recommendations. We will ensure that, in our revision, we clarify the explanations in the few instances where they may not be sufficiently detailed, as noted.

      Reviewer #3 (Public review): 

      Summary: 

      This paper addresses the problem of exploring potentially rewarding environments that also contain danger. It rests on the assumption that an independent Pavlovian fear-learning system can guide an agent during exploratory behaviour so that it avoids severe danger. This is important because otherwise later gains may outweigh early threats, and agents may end up putting themselves in danger when it is inadvisable to do so.

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 
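      The uncertainty-based arbitration described above can be illustrated with a short Python sketch. It assumes a linear mixture of instrumental action values and a Pavlovian withdrawal bias, weighted by omega, where omega increases monotonically with reward uncertainty. The saturating exponential mapping and all function names are hypothetical choices made for illustration; they are not the authors' equations. Holding omega fixed recovers the fixed linear weighting discussed above.

```python
import math


def omega_from_uncertainty(uncertainty, sensitivity=1.0):
    """Hypothetical mapping: the Pavlovian weight grows with uncertainty, bounded in [0, 1)."""
    return 1.0 - math.exp(-sensitivity * uncertainty)


def combined_values(q_instrumental, pavlovian_bias, omega):
    """Linear mixture of instrumental values and a Pavlovian bias, weighted by omega."""
    return [(1.0 - omega) * q + omega * b
            for q, b in zip(q_instrumental, pavlovian_bias)]


def softmax(values, beta=3.0):
    """Convert action values into choice probabilities."""
    m = max(values)
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Actions: [approach, withdraw]. Instrumental learning favours approach;
# the hard-wired Pavlovian bias favours withdrawal.
q_instr = [1.0, 0.0]
pav_bias = [0.0, 1.0]

for u in (0.05, 2.0):  # low vs high reward uncertainty
    w = omega_from_uncertainty(u)
    p = softmax(combined_values(q_instr, pav_bias, w))
    print(f"uncertainty={u}: omega={w:.2f}, P(approach)={p[0]:.2f}")
```

      With a fixed omega, the Pavlovian bias has the same influence whether or not the agent has learned reliable reward estimates. Tying omega to uncertainty lets withdrawal dominate while instrumental estimates are still unreliable.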

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      We thank reviewer #3 for their thoughtful feedback and useful recommendations, which we will take into account while revising the manuscript.

      We acknowledge the complexity of specifying the Pavlovian bias in the grid world and appreciate the opportunity to elaborate on how this bias is modelled. In the human experiment, the withdrawal action is straightforwardly biased, as noted. In the grid world, we assume a hardwired encoding of withdrawal actions for each state (grid cell). This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume to be a product of evolution. Alternatively, it could be interpreted as deriving from an appropriate value initialization, in which the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesized to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].

      Additionally, we explored the possibility of learning the action bias on the fly by tracking additional punishment Q-values instead of pre-training, which produced similar cumulative pain and step plots. While this approach is redundant, and likely not how the brain operates, it demonstrates an alternative algorithm.

      We thank the reviewer for pointing out these potentially unrealistic elements, and we will revise the manuscript to clarify and incorporate these explanations and improve the model descriptions.

      Eun Joo Kim, Omer Horovitz, Blake A Pellman, Lancy Mimi Tan, Qiuling Li, Gal Richter-Levin, and Jeansok J Kim. Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats. Proceedings of the National Academy of Sciences, 110(36):14795–14800, 2013

      William Menegas, Korleki Akiti, Ryunosuke Amo, Naoshige Uchida, and Mitsuko Watabe-Uchida. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nature Neuroscience, 21(10):1421–1430, 2018

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development. 

      Weaknesses: 

      The conclusions of this study are in most cases supported by the data, but some aspects of the data analysis need to be clarified and elaborated. Some conclusions should be stated more precisely and in line with the data observed.

      We appreciate the reviewer's recognition of the impact of our study.  We will address the concerns about data analysis and statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review): 

      Summary: 

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin and THSD7A, are important for this process, possibly through activation of the Rho-family GTPase Cdc42.

      Strengths: 

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance. 

      Weaknesses: 

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study.  Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A.  We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review): 

      Summary: 

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosome (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression has the opposite effect. Furthermore, the decreased filopodia formation in Rab27a/HRS KD melanoma cells is rescued by the addition of small extracellular vesicles (EVs), but not large EVs, purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation, and this is rescued by the addition of small EVs from control cells but not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis than control cells, a phenotype that is rescued by co-injection of small EVs from control cells. Using quantitative mass spectrometry, they find that thrombospondin type 1 domain-containing 7A (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate than in control HT1080 cells and less abundant in small EVs from endoglin KD cells than in those from control cells, indicating a trafficking defect. Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation through a Cdc42-dependent mechanism.

      Strengths: 

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel. 

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function. 

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes. 

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia. 

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings. 

      Weaknesses: 

      (1) A better characterization of the nature of the small EV population is missing: 

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations. 

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a Coomassie gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent 4 bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor: 

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used for image acquisition is too close to the reported 20-second median time between exosome secretion and filopodia formation; intervals of 2-5 seconds should be used to validate this. It would also be important to determine the percentage of filopodia events that co-occur with exosome secretion. Does this phenomenon occur with most filopodia or only with a small number? Additionally, the resolution of typical confocal microscopy is suboptimal for these analyses, and TIRF microscopy would offer increased resolution to parse out secretion events. As a TIRF objective is listed in the Methods section, the figure legends should state which images were acquired using TIRF microscopy.

      We acknowledge that the frame rate limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation based on their proximity in time. While our data set shows a median interval of 20 seconds, the true median could lie anywhere between 10 and 30 seconds, given our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation.
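      The 10-30 second range follows from simple interval arithmetic. The minimal Python sketch below assumes that each event is scored at the first frame in which it is visible, so it actually occurred at some point within the preceding frame interval. The function name is hypothetical.

```python
def true_lag_bounds(observed_lag_s, frame_interval_s):
    """Bounds on the true lag between two events scored at discrete frames.

    Each event happened at some point within the frame interval preceding the
    frame where it was first seen, so the true lag can differ from the observed
    lag by up to one interval in either direction (and cannot be negative).
    """
    lower = max(0.0, observed_lag_s - frame_interval_s)
    upper = observed_lag_s + frame_interval_s
    return lower, upper


print(true_lag_bounds(20, 10))  # (10, 30): 10 s frames, as acquired
print(true_lag_bounds(20, 5))   # (15, 25): 5 s frames, as the reviewer suggests
```

      Each individual lag can shift by at most one frame interval, so the same bound applies to the median. Halving the interval halves the width of this window, which is the rationale for the shorter intervals the reviewer suggests.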

      Regarding the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct that TIRF microscopy may be needed to calculate this number accurately. Nonetheless, we will review our live imaging data for this experiment to determine whether this calculation is possible. Again, we are limited by the frame rate used to capture the images, so we may miss secretion events that occur within the 10-second intervals between frames. Regardless, for every secretion event that we visualized, we observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript.  A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging.  We will clarify this in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful. 

      Our data show that manipulating general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin affects only de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability depends on exosome cargoes other than endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and future work could focus on the exosomal factors responsible for filopodia persistence.

      We plotted the cancer cell data as filopodia per cell area to match the neuron data, which were plotted as filopodia per 100 µm of dendrite length. Because neurons cannot be imaged as whole cells, that quantification is based on the length of dendrite in the image. Graphing the cancer cell data as filopodia per cell gave results similar to filopodia per cell area, as there were no significant differences in cell area between conditions and experiments. We plan to include a new supplementary figure showing the Figure 2 data plotted as filopodia per cell, to demonstrate that this quantification gives the same results.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats. 

      Detection of THSD7A by Western blot was, unfortunately, not straightforward. Because of the large size (~260 kDa) of THSD7A, its low expression in cancer cells, and the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. THSD7A was much easier to detect in the human fibrosarcoma cell line HT1080 than in mouse B16F1 cells, both in cell lysates and in small EVs. Under these same conditions, we were usually unable to detect THSD7A in the mouse melanoma B16F1 samples, but we succeeded using native gel conditions. We also detected THSD7A in rat primary neuron samples. These samples came from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, which further complicated the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation. Such differences would require different detection methods and could produce different banding patterns on Western blots. Based on our THSD7A trafficking data, we believe that in control cells, most THSD7A is trafficked and secreted via small EVs. As shown in Figure 7A, the THSD7A band in the shScr cell lysate is relatively light and also appears as a double band, similar to Figure 6E (both HT1080 samples).

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands.  If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant.  Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A: 

      a) Figure 6 - Is THSD7A expected to be present in the nucleus, as shown in panel D (label D is missing in the Figure)? It is not clear whether this is also observed in neurons. A Western blot of endogenous THSD7A in cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet.  In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images.

      b) Figure 7 - Although the Western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are visually clear, the insets appear to be digitally zoomed, which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? TSPAN4 staining should be used to rule out that these are migrasomes. More information on how the analysis was conducted is needed (i.e., how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells, and DAPI or Hoechst staining in the higher-magnification images, should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope.  Insets for Figure 7E were zoomed in so that readers could see the tiny structures.  Zoom 1 in Figure 7E shows areas of extracellular deposition. In these areas, we can see small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more export of THSD7A into small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.  Quantification of internal THSD7A localization is much more straightforward in this experimental regime.  Indeed, in Figure 7F we assessed internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

    1. Author response:

      We thank the reviewers for their thoughtful feedback and valuable comments. We plan to fully address their concerns by including the following experiments and analyses:

      Reviewer 1 suggested exploring data scaling trends for encoding models, as successful scaling would justify larger datasets for language ECoG studies. To estimate scaling effects, we will develop encoding models on subsets of our data.

      Reviewer 2 expressed uncertainty about the baseline for model-brain correlation and recommended adding control LLMs with randomly initialized weights. In response, we will generate embeddings using untrained LLMs to establish a more robust baseline for encoding results.

      Reviewer 2 also proposed incorporating control regressors such as word frequency and phonetic features of speech. We will re-run our modeling analysis using control regressors for word frequency, 8 syntactic features (e.g., part of speech, dependency, prefix/suffix), and 3 phonetic features (e.g., phonemes, place/manner of articulation) to assess how much these features contribute to encoding performance.

      Reviewer 3 raised concerns that the “plateau in maximal encoding performance” was actually a decline for the largest models. We will add significance tests in Figure 2B to clarify this issue.

      Reviewer 3 also noted that in Supplementary Figure 1A, the decline in encoding performance was more pronounced when using PCA to reduce embedding dimensionality, in contrast to the trend observed when using ridge regression. To address this, we will attempt to replicate the observed scaling trends in Figure 2B using PCA combined with OLS.
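      The sketch below contrasts the two analysis routes: a ridge-regression encoding model, and PCA followed by ordinary least squares (OLS). Each is scored as the Pearson correlation between predicted and held-out responses. It is a minimal NumPy illustration on synthetic data. The function names, the regularization strength `lam` and the component count `k` are hypothetical, and the actual pipeline (cross-validation, temporal lags, electrode-wise fitting) is not represented.

```python
import numpy as np


def ridge_encoding_r(X_tr, y_tr, X_te, y_te, lam):
    """Fit ridge regression on training data and return held-out Pearson r."""
    mx, my = X_tr.mean(axis=0), y_tr.mean()
    A = X_tr - mx  # center features with training statistics
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ (y_tr - my))
    pred = (X_te - mx) @ w + my
    return np.corrcoef(pred, y_te)[0, 1]


def pca_ols_encoding_r(X_tr, y_tr, X_te, y_te, k):
    """Project onto the top-k principal components, fit OLS, return held-out r."""
    mx, my = X_tr.mean(axis=0), y_tr.mean()
    A = X_tr - mx
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    P = vt[:k].T  # (n_features, k) matrix of principal axes
    w, *_ = np.linalg.lstsq(A @ P, y_tr - my, rcond=None)
    pred = (X_te - mx) @ P @ w + my
    return np.corrcoef(pred, y_te)[0, 1]


# Synthetic "embeddings" (300 samples x 20 dims) and a linear "neural" response.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = X @ rng.normal(size=20) + 0.1 * rng.normal(size=300)
X_tr, X_te, y_tr, y_te = X[:200], X[200:], y[:200], y[200:]

print("ridge:", ridge_encoding_r(X_tr, y_tr, X_te, y_te, lam=1.0))
for k in (2, 20):
    print(f"PCA({k}) + OLS:", pca_ols_encoding_r(X_tr, y_tr, X_te, y_te, k))
```

      Ridge shrinks low-variance directions without discarding them. PCA + OLS drops every direction outside the top k components. Encoding performance under PCA can therefore fall sharply when the response depends on low-variance dimensions, which is one plausible reason the two approaches show different trends.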

      Additionally, we will provide a point-by-point response and revise the manuscript with updated analyses and figures in the near future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Colomb et al. have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The HpARI proteins bind the alarmin interleukin 33 and, depending on the family member, exhibit differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing DNA binding by members of this family through a complement control protein (CCP1) domain. Moreover, the authors identify two family members that bind, via this domain and in a non-specific manner, to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules. In contrast, the third family member does not bind, consequently has a shorter half-life, and may function via diffusion.

      Strengths: 

      A strength of the work is the multifaceted approach to examining and testing their hypotheses, using a well-established and well-defined family of immunomodulatory molecules using multiple approaches including an in vivo setting. 

      Weaknesses: 

      There are a few weaknesses of the approach. Perhaps some discussion and speculation as to how these three family members might operate in concert during Heligmosomoides polygyrus bakeri infection would help place the biology of these molecules in context for the reader, e.g. when and where they are produced. 

      We agree that the roles of these proteins during infection require further study and are not fully elucidated here. We have added further discussion of their potential roles during infection to the manuscript (track changes manuscript, lines 277 – 283).

      Reviewer #2 (Public Review): 

      Summary: 

      Colomb et al. investigated the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 have suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds on the authors' previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested whether the CCP1 domain of HpARIs binds heparan sulfate, an important component of the extracellular matrix. They found that HpARI1 and HpARI2 bind heparan sulfate but HpARI3 does not, and this difference is related to their half-lives in vivo.

      Strengths: 

      The authors use a comprehensive multidisciplinary approach to assess the binding and their effects in vivo, coupled with molecular modeling. 

      Weaknesses: 

      (1) Figure 1C should include Western. 

      We apologise for this oversight, and now include an uncropped western blot image as a Figure 1, Figure Supplement 1.

      (2) Figure 1E: Why does HpARI1 stop binding DNA at 50%? 

      It is currently unclear why HpARI1 does not bind all of the DNA in the EMSA, but this was a consistent finding. With our revised findings, we can now state definitively that HpARI1 has a lower affinity for HS than HpARI2. In each of our assays (EMSA (Fig 1D-E), size exclusion chromatography (Fig 4A), HS-bead pull-down (Fig 4B), lung cell surface binding (Fig 4C) and ITC (Fig 4D)), HpARI1 consistently shows a weaker response than HpARI2. We hypothesise that HpARI1 binds DNA/HS more weakly so that it can diffuse further from the site of deposition, but we have yet to demonstrate this during infection. We have added further discussion of this point (track changes manuscript, lines 262 – 266).

      (3) ITC binding experiment with HpARI1? Also, the ITC results from HpARI2 do not seem to saturate, thus it is difficult to really determine the affinity. 

      We have now included HpARI1-HS ITC, and re-ran the HpARI2 experiment to saturation (Fig 4D-E).

      (4) It would be helpful to add docking results from HpARI1. 

      We have now included HpARI1-HS docking, in Figure 5B.

      (5) Some conclusions are speculative and should remain in the Discussion, e.g.: a) That HpARI3 may be able to diffuse farther

      We have rewritten these points to remove the speculation on localisation from the abstract (lines 18-19) and introduction (line 78).

      b) That DNA/HS may trap HpARI1/2 at the infection site. 

      Likewise, these points have been rewritten in the abstract and introduction as above, and we have made it clearer that this is a model that we are proposing in the discussion (line 277-283).

      Reviewer #1 (Recommendations For The Authors): 

      The paper is well-written and the data well-presented. I have one small comment that the authors may like to consider. In the discussion, second paragraph, line 17, perhaps, "evolved" rather than "developed". 

      Thank you for this suggestion, we have made this change (line 248).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      To hopefully contribute to more strongly support the conclusions drawn by the authors, I am including a series of concerns regarding the manuscript, as well as some suggestions that could be useful to address these issues:

      (1) The main results of this study derive from the use of auxin-inducible degron (AID)-tagged proteins. Despite the great advantages of the AID strategy to conditionally deplete proteins, the AID tag can affect the normal function of a protein. In fact, some of the AID-labeled DDC components generated in this work are shown to be hypomorphic. Hence, the manuscript would have benefited from the additional confirmation of some of the observations using a different way to eliminate the proteins (e.g., temperature-sensitive mutants).

      Most ts mutants are also hypomorphic, so we do not see much advantage in using them. Adding the AID tag to these proteins does not by itself interfere with the ability to sustain checkpoint arrest, as demonstrated in Figure S1. Instead, by overexpressing Rad9-AID, we could show that inactivating Rad9 after 15 h behaved the same way as inactivating Ddc2, significantly strengthening our finding that the DDC becomes dispensable while the SAC takes over.

      (2) In cells depleted of Rad53-AID, the deletion of CHK1 stimulates an earlier release from a mitotic arrest induced by two DSBs (Figures 2D and 3C). Likewise, the authors claim that a faster escape from the cell cycle block can also be observed when upstream factors such as Ddc2, Rad9, or Rad24 are depleted in the absence of CHK1 (Figures 2A-C and Figures 3D-F). However, this earlier release from the cell cycle arrest, if at all, is only slightly noticeable in a Rad9-AID background (Figures 2B and 3E). In this sense, it is also worth pointing out that Rad9-AID chk1Δ (Figure 3E) and Rad24-AID chk1Δ (Figure 3F) cells were only evaluated up to 7 h, while in all other instances, cells were followed for 9 h, which hinders a fair assessment of the differences in the release from the cell cycle arrest.

      As noted above, we have now been able to examine Rad9 over the longer time frame.

      (3) Although only 25% of the cells depleted for Dun1 remained in G2/M arrest 7 h following the induction of two DSBs, it is shocking that Rad53 was nonetheless still phosphorylated after the cells had escaped the cell cycle blockage (Figure 4A).

      This persistence of Rad53 phosphorylation is also seen when Mad2 is inactivated, where cells escape arrest despite continued Rad53 phosphorylation.

      (4) Generation of Rad9-AID2 and Rad24-AID2 strains did not fully restore the function of these proteins, since most cells had adapted 24 h after induction of two DSBs (Figure S1C). Nonetheless, Rad9-AID2 and Rad24-AID2 are still likely more stable than their AID counterparts, and hence the authors could have instead used the AID2 proteins for the experiments in Figure 2 to better evaluate the role of Rad9 and Rad24 in the maintenance of the DDC-dependent arrest.

      We note again that we have found a way to study Rad9 up to 24 h. 

      (5) Deletion of BFA1 has been shown to promote the escape from a cell cycle arrest triggered by telomere uncapping (Wang et al. 2000, Hu et al. 2001, Valerio-Santiago et al. 2013). Likewise, while cells carrying the cdc5-T238A allele cannot adapt to a checkpoint arrest induced by one irreparable DSB, BFA1 deletion rescues the adaptation defect of this mutant CDC5 allele (Rawal et al., 2016). The authors show, using AID degrons of Bfa1 and Bub2, that only Bub2, but not Bfa1, is required to maintain a prolonged cell cycle arrest after the induction of two DSBs. To reinforce this point, and as shown for mad2Δ cells (Figure S6A), the authors could perform a complete time course using both the Bfa1-AID and a bfa1Δ mutant to demonstrate that they do indeed show the same behavior in terms of adaptation to a two-DSB-induced cell cycle arrest.

      We thank the reviewer for noting these other instances where bfa1Δ promoted escape from arrest. We tested a 2-DSB bfa1 deletion; the data have been added to Figure S9E-F. We did not observe a difference in the percentage of cells escaping arrest between the 2-DSB bfa1 deletion and the 2-DSB BFA1-AID strains.

      (6) Bypass or adaptation of a checkpoint-induced cell cycle arrest in S. cerevisiae often leads to cells entering a new cell cycle without doing cytokinesis and, hence, to the accumulation of rebudded cells. However, the experiments shown in the manuscript only account for G1 or budded cells with either one or two nuclei. Do any of the mutants show cytokinesis problems and subsequent rebudding of the cells? If so, this should have been also noted and quantified in the corresponding assays.

      In the cases we have studied, we have not seen cells re-bud without completing mitosis (at least as assessed by the formation of budded cells with two distinct DAPI-staining masses). In our morphological assays, we score continuation of the cell cycle by the appearance of multiple buds, G1 cells, and small-budded cells. In our adaptation assays, cells that escaped G2/M arrest formed microcolonies, indicating no short-term deficiency in cell division.

      (7) The location of the DSB relative to the centromere of a chromosome seems to be a factor that determines the capacity of the SAC to sustain a prolonged cell cycle arrest. The authors discuss the possibility that the DSB could somehow affect the structure of the kinetochore. Did they evaluate whether Mad1 or Mad2 were more actively recruited to kinetochores in those strains that more strongly trigger the SAC after induction of the DSBs?

      We have not attempted to follow Mad1/2 recruitment. ChIP-seq could be used to monitor Mad1/2 localization at the 16 centromeres in response to DSBs and the spread of γ-H2AX across the centromere. Our previous data showed that γ-H2AX could spread across the centromere region and could create a change that would be detected by Mad1/2. This change does not, however, affect the mitotic behavior of a strain in which the H2A genes have been modified to the possibly phosphomimetic H2A-S129E allele.

      (8) The authors could speculate in the discussion about why the DDC is required for maintaining checkpoint arrest at early stages but then becomes dispensable for preserving a prolonged DSB-induced cell cycle arrest, which is instead sustained at later stages by the SAC.

      Our suggestion is that cells would otherwise have adapted, but modification of the centromere region engages the SAC.

      Finally, some minor issues are:

      (1) The lines in the graphs that display the results from adaptation assays (e.g., Figures 1B and 1E) or cell and nuclear morphology (e.g., Figures 1D and 1G) are too thick. This makes it sometimes difficult to distinguish the actual percentages of cells in each category, particularly in the experiments monitoring nuclear division.

      Fixed.

      (2) While both the adaptation assay and the analysis of nuclear division in Figures 1E and 1G, respectively, show a complete DDC-dependent arrest at 4h, the Western blot in Figure 1F suggests that Rad53 is not phosphorylated at that time point. Do these figures represent independent experiments? Ideally, the analysis of cell budding and nuclear division, which is performed in liquid cultures, and the Western blot displaying Rad53 phosphorylation should correspond to the same experiment.

      Cell budding in liquid cultures and adaptation assays were performed in triplicate with 3 biological replicates, and the collective results are shown in each graph showing the percentage of large-budded cells. Western blot samples were collected in each liquid culture experiment. The Western blot in Figure 1G is representative.

      (3) It is somewhat confusing that the blots for the proteins are not displayed in the same order in Figures 2A (Rad53 at the top) and 2B or 2C (Rad53 in the middle).

      Fixed. We now place Rad53, the relevant protein, at the top.

      Reviewer #2 (Recommendations For The Authors):

      (1) Yeast cells with two breaks depend on the DNA damage checkpoint (DDC) for a period that can range from 4 to 15 h after DNA damage. Since auxin-induced degradation does not completely deplete the tagged proteins, the results should be interpreted with care: checkpoint entry or maintenance should not be judged solely by each target protein's ability to induce Rad53 phosphorylation. Checkpoint maintenance could, in theory, require only a modest amount of checkpoint factors, especially because the experiments involve the induction of only one or two DSBs. Low levels of DDC factors may be insufficient for Rad53 activation but could still be effective for cell cycle arrest. Indeed, the Haber group showed that mating-type switching did not induce Rad53 phosphorylation but still invoked a detectable DNA damage response. To test such possibilities, the authors might consider employing another DDC marker, such as H2A or Chk1 phosphorylation, in addition to Rad53 autophosphorylation. Alternatively, the authors might check whether auxin-induced depletion also disrupts break-induced focus formation by checkpoint factors during checkpoint maintenance, or their enrichment at DNA breaks by ChIP, at various time points after damage.

      DAPI staining of Ddc2-AID cells show that when IAA is added 4 h after DSB induction (Figure S3A), cells escape G2/M arrest as evidenced by the increase in large-budded cells with 2 DAPI signals, small budded cells, and G1 cells. Overexpression of Ddc2 can sustain the checkpoint past 24 h, but without SAC proteins like Mad2 they will eventually adapt (Figure S6B).

      That Rad9-AID or Rad24-AID cells in the absence of added auxin (but in the presence of TIR1) are unable to sustain arrest suggests to us that low levels of Rad9 or Rad24 are not sufficient to maintain arrest. As the reviewer notes, normal MAT switching does not cause Rad53 phosphorylation or arrest, though early damage-induced events such as H2A phosphorylation do occur. But our point is that Rad9 or Ddc2 is needed to maintain arrest only up to a certain point, after which they become superfluous and a different checkpoint arrest is imposed. At that point, low levels of these proteins apparently play no obvious role.

      (2) It is interesting that the DDC no longer responds to damage signaling after 15 h of prolonged checkpoint arrest induced by two DNA double-strand breaks. Does this also apply to other adaptation mutants? Such results might broaden the impact of the current conclusions. It is also possible that the transition from the DDC to the SAC depends not simply on changes in signaling but in part on molecular changes in the status of the DNA breaks or their flanking regions. Indeed, the proposed model suggests that the spreading of H2A phosphorylation to centromeric regions induces the SAC and thus mitotic arrest. The authors could measure H2A phosphorylation near the centromere using ChIP assays at various intervals after DNA damage. It would be particularly interesting if depletion of Ddc2 at 15 h after DNA damage does not alter the level of H2A phosphorylation at or near the centromere.

      Our previous data suggested that the SAC's role in prolonging DSB-induced arrest involves post-translational modification of centromeric chromatin, such as the Mec1- and Tel1-dependent phosphorylation of histone H2A (Dotiwala). In budding yeast there is also a similar DSB-induced modification of histone H2B (Lee et al.). To ask whether the SAC is intrinsically activated when the regions around centromeres are modified by checkpoint kinase phosphorylation, we examined cell cycle progression in strains in which histone H2A or histone H2B was mutated to its putative phosphomimetic form (H2A-S129E and H2B-T129E). As shown in Figure S11, there was no effect on the growth rate of these strains, or of the double mutant, suggesting that cells did not experience a delay in entering mitosis because of these modifications. We note that although histone H2A-S129E is recognized by an antibody specific for phosphorylated histone H2A-S129, the S129E mutation may not be fully phosphomimetic.

      (3) It is puzzling that Rad9-AID or Rad24-AID cells are proficient for DDC establishment but cannot sustain permanent arrest in the two-break cells. Rad53 phosphorylation appears weaker in cells expressing Rad9-AID or Rad24-AID (Fig. 2B and C), even though their protein levels before IAA treatment are still robust. This might also explain why the results of depleting Rad53 and Rad9 are so different. It also raises the concern that the effect of Rad24 depletion on checkpoint maintenance is partly due to weaker checkpoint establishment. It might be necessary to redo the Rad24 depletion with the AID2 system to exclude this possibility.

      We believe that the AID mutants are very sensitive to the low level of IAA present in yeast.  The instability of the protein is entirely dependent on the TIR1 SCF factor, so the proteins themselves are not intrinsically defective; they are just subject to degradation.  Overexpressing Rad9 allowed us to evaluate its role at late time points. 

      (4) It is intriguing that the switch from the DDC to the SAC might take place at around 12 h, when yeast cells with a single unrepairable break override the DDC and resume cell cycling (so-called "adaptation"). Since 4 h and 15 h are far apart and the transition from the DDC to the SAC likely takes place between these two points, it would be very helpful to analyze and compare cell cycle exit at 24 h after adding IAA at multiple time points between 4 and 15 h.

      When we add IAA to Mad2-AID and Mad1-AID 4 h after DSB induction, cells remain arrested for up to 12 h after DSB induction. At 15 h cells begin to exit checkpoint arrest indicating that the handoff of checkpoint arrest must occur between 12 to 15 h after DSB induction. If we degraded DNA damage checkpoint proteins at any point before Mad2, Mad1, and Bub2 begin to contribute to checkpoint arrest, then arrested cells will likely adapt in a similar manner to when IAA was added 4 h after DSB induction.

      (5) Some of the Western blots are of poor quality. For instance, in Figure 6C, the Mad1-AID level after IAA addition is not compelling, especially because the TIR1 level (the loading control) is also very low.

      In Figure 6C, while the relative levels of TIR1 are similar in the IAA-treated and untreated samples, there is no detectable Mad1-AID in the IAA-treated samples, indicating that Mad1-AID was successfully degraded by the AID system.

      (6) Fig. 8 is complex. It might be helpful to define the different types of arrows in the figure. The legend also has a spelling error, Rad23 should be Rad24.

      We’ve defined what each arrow means in the legend and corrected the spelling error in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      Much of the manuscript states that two unrepairable DSBs lead to a long and severe G2/M arrest. Two main cytological approaches are used to make this statement: bud size and number on plates after micromanipulation (microcolony assay), and cell and nuclear morphology in liquid cultures. While the latter gives a clear pattern that can be assigned to a G2/M block as expected by DDC, i.e. metaphase-like mononucleated cells with large buds, the former can only tell whether cells eventually reach a second S phase (large budded cells on the plate can be in a proper G2/M arrest, but can also be in an anaphase block or even in the ensuing G1). The authors always performed the microcolony assay, but there are several cases where the much more informative budding/DAPI assay is missing. These include Dun1-aid and others, but more importantly chk1D and its combinations with DDC proteins. Incidentally, for the microcolony assay, it is more accurate to label the y-axis of the corresponding graphs (and in the figure legends and main text) with something like "large budded cells"; "G2/M arrested cells" is misleading.

      Figures have been updated to more accurately reflect what we are measuring.

      The results obtained with the Bfa1/Bub2 partner are intriguing. These two proteins form a complex whose canonical function is to prevent exit from mitosis until the spindle is properly aligned, acting in a distinct subpathway within the SAC that blocks MEN rather than anaphase onset. The data presented by the authors suggest that, on the one hand, both SAC subpathways work together to block the cell cycle. However, why does canonical SAC (Mad1/Mad2) inactivation not lead to a transition from G2/M (metaphase-like) arrested cells to anaphase-like arrest maintained by Bfa1-Bub2? Since Bfa1-Bub2 is a target of DDC, is it possible that DDC knockdown also inactivates this checkpoint, allowing adaptation? On the other hand, can the authors provide more data to confirm and strengthen their claim of a Bfa1-independent Bub2 role in prolonged arrest? Perhaps long-term protein localization and PTM changes. Bub2-independent roles for Bfa1 have been reported, but not vice versa, to the best of my knowledge.

      In the mitotic exit network (MEN), Bfa1/Bub2 prime activation of the pathway by bringing Tem1 to the spindle pole bodies. Phosphorylation of Bfa1 relieves Bfa1/Bub2 inhibition of Tem1, allowing the MEN to trigger mitotic exit. It has been shown that DNA damage in a cdc13-1 ts mutant leads to Rad53- and Dun1-dependent phosphorylation of Bfa1. This phosphorylation of Bfa1 could release Tem1 and prime cells to exit checkpoint arrest once they pass through anaphase. Examining Tem1 localization to spindle pole bodies and its interactions with Bfa1/Bub2 in response to DNA damage might give insight into why cells do not experience an anaphase-like arrest when they are released by inactivation of either the DNA damage checkpoint or the SAC.

      We have previously shown that deletion of BUB2 in a 1-DSB background shortens DSB-induced checkpoint arrest. In a 2-DSB background, deletion of BFA1 left ~70-80% of cells in a large-budded state, as measured by an adaptation assay tracking the morphology of G1 cells on a YP-Gal plate and by DAPI staining. Deletion or degradation of Bfa1 might not release cells from arrest because Mad2/Mad1 prevent cells from transitioning into anaphase. Our DAPI data for Bub2-AID show an increase in cells with two DAPI signals (transition into anaphase) and in small-budded cells, indicating that degradation of Bub2 releases cells into anaphase and allows them to complete mitosis.

      Further suggestions:

      It would strengthen the data if the authors could provide more than one experimental replicate in some panels (e.g., S1A,B; S4A; and S6B).

      S1C confirms that Rad9-AID and Rad24-AID will adapt by 24 h even with the point mutant TIR1(F74G) which has lower basal degradation than TIR1. S4A has been updated with additional experimental replicates. The 48 h timepoint after DSB induction was to show the importance of Mad2 even when Ddc2 is overexpressed.

      Figure 1: Order the figure panels as they are first mentioned in the text. For example, it makes more sense to have the plate adaptation assay as panel B for both 1-DSB and 2-DSB strains, budding plus DAPI as panel C, and Rad53 as panel D.

      These figures have been rearranged in the order that they are mentioned in the paper.

      Figure 5: Correct Ph-5-IAA in the Rad53 WBs (it should be 5-Ph-IAA).

      This has been corrected.

      Figure S2: The straight line under the "+IAA" text box is misleading. I think it should also cover the "-2" time point, right? Also, check the figure legend. Information is missing and does not correspond to the figure layout.

      This has been corrected.

      Figure S3: Perhaps "Cell cycle profile as determined by budding and DAPI staining" is a better and more accurate legend title.

      The legend title has been updated to “Cell cycle profile as determined by budding and DAPI staining in Ddc2-AID and Rad53-AID mutants ± IAA 4 h after galactose.”

      Figure S5: Detection of both Rad53 and Ddc2 in the same blot could lead to misinterpretation as hyperphosphorylated Rad53 appears to coincide with Ddc2 migration.

      Figure S5A-B are representative western blots where Rad53 was probed to show activation of the DNA damage checkpoint by Rad53 phosphorylation. When measuring the relative abundance of Ddc2 we did not probe all blots for Rad53.

      Table S1: Include the post-hoc test used for comparisons after ANOVA.

      A Sidak post-hoc test was used in Prism for the one-way ANOVA. Prism listed the Sidak post-hoc test as the recommended test to correct for multiple comparisons. A column has been added to Table S1 to show which post-hoc test was used.
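      For readers unfamiliar with the Sidak correction, the sketch below shows its core adjustment. It is a minimal illustration, assuming only that m comparisons are corrected jointly at a family-wise error rate alpha; the function names are hypothetical, and it does not attempt to reproduce Prism's complete post-ANOVA computation (for example, how each comparison's test statistic is derived). The p-values are made up and are not taken from Table S1.

```python
"""Minimal sketch of the Sidak multiple-comparison adjustment.

Assumption: only the family-wise correction step is shown; Prism's full
post-hoc procedure after a one-way ANOVA is not reproduced here.
"""


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-comparison threshold that keeps the family-wise error rate at alpha over m comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p_values: list[float]) -> list[float]:
    """Sidak-adjusted p-values: p_adj = 1 - (1 - p) ** m, with m = number of comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for three comparisons (not data from this study).
    raw = [0.01, 0.02, 0.20]
    print(round(sidak_threshold(0.05, 3), 4))  # prints 0.017
    print([round(p, 4) for p in sidak_adjust(raw)])
```

      For three comparisons the Sidak threshold (about 0.0170) is slightly less conservative than the Bonferroni threshold alpha/m (about 0.0167), which is one reason software may recommend it.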

      Page 10, line 4: The putative additive effect of chk1 knockout with Dun1 depletion should also be compared to chk1 alone (in Figure 3A).

      We address the additive effect of chk1 knockout with Dun1-AID depletion in a later section on Page 11, line 6. Since we had not explored possible effects from downstream targets of Rad53 for prolonging checkpoint arrest when Rad53 was depleted, we did not mention the effect of the chk1 knockout on Dun1 depletion.

      Page 14, second paragraph, line 4: "Figure 6A-D", is it not?

      Figure S6A is measuring checkpoint arrest in a deletion of mad2 in a 2-DSB strain. Figure 6A-D shows how degradation of Mad2-AID and Mad1-AID after the handoff of arrest causes cells to exit the checkpoint in a Rad53 independent manner.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback and recognition of our efforts. Our application of CRISPR/Cas9 genome editing tools, coupled with complementary cellular and functional approaches, sheds light on the importance of _Pf_MORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were based on multiple, unbiased molecular and functional genomic assays that point to the relevance of the _Pf_MORC protein in maintaining the parasite's chromatin landscape. Although we do not claim to have precise evidence of the step-by-step pathway in which _Pf_MORC is involved, we provide first-hand evidence of its role in heterochromatin binding and gene regulation, and of its association with major TFs as well as chromatin remodeling and modifying enzymes. We do, however, agree with the comment regarding the lack of direct evidence for the effects of _Pf_MORC KD and have since provided additional evidence by performing ChIP-seq experiments against H3K9me3 and H3K9ac during KD. Our new results are presented in Fig. 5. We showed that the level of H3K9me3 decreased significantly during _Pf_MORC KD.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      Validation of the identified interacting partners is indeed critical and essential to understanding their role in directing MORC to its targets. Our protein pull-down experiments have been done using several biological replicates. Several of the interacting partners have also been identified and published by other labs and collaborators. To confirm our results, we completed a direct comparison of our work with previously published work. Results have now been incorporated into the revised manuscript to confirm the identified interacting partners and the accuracy of the data we obtained in our experiment. Molecular validation of novel proteins identified in our protein pull-down requires generation of tagged lines and may take a few more years, but will be submitted for publication in a follow-up manuscript.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and have performed additional experiments to delve deeper into the multifaceted roles of _Pf_MORC. We have performed additional ChIP-sequencing analysis on _Pf_MORC depleted conditions focusing on known heterochromatin and euchromatin markers H3K9me3 and H3K9ac respectively. We hope our new results presented in figure 5 will shed light on the more direct implications of _Pf_MORC on heterochromatin and gene silencing.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • Why does MORC, which was used in the pull-down, seem to be only minimally enriched in the volcano plot, while a series of proteins (marked in red) and AP2 (highlighted in green) are enriched with log2 fold changes exceeding 15?

      We apologize for the confusion. MORC was detected with the highest number of peptides (97 and 113) and spectra (1041 and 1177), confirming the efficiency of our pull-down. However, because of the relatively large size of the MORC protein (295 kDa) and its weak detection in the control (5 and 7 peptides; 16 and 43 spectra), its log2 fold change and Z-statistic after normalization are modest compared with those of smaller proteins that were not detected in the control samples.
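      The arithmetic behind this explanation can be illustrated with the spectral counts quoted above. The sketch below is a simplified, hypothetical model, not the normalization actually used for the volcano plot: it takes the log2 ratio of mean spectral counts and, for a protein absent from the controls, adds an assumed pseudocount so that the ratio is finite. The function name, the example protein, and the pseudocount values are illustrative only.

```python
"""Illustrative sketch: why a bait detected in the control shows a modest fold change.

Assumptions (hypothetical, not the authors' pipeline): enrichment is a plain ratio of
replicate-mean spectral counts, and an assumed pseudocount keeps the ratio finite when a
protein is absent from the control. Length/run normalization and Z-statistics are omitted.
"""
import math
from statistics import mean


def log2_fold_change(ip_counts, control_counts, pseudocount=0.0):
    """log2 of (mean IP count + pseudocount) / (mean control count + pseudocount)."""
    return math.log2((mean(ip_counts) + pseudocount) / (mean(control_counts) + pseudocount))


# Spectral counts reported for PfMORC: 1041 and 1177 (pull-downs), 16 and 43 (controls).
morc = log2_fold_change([1041, 1177], [16, 43])
print(round(morc, 2))  # prints 5.23

# A hypothetical co-purifying protein seen in the pull-downs but never in the controls:
# its apparent enrichment is set largely by the assumed pseudocount, not by the IP signal.
for pc in (1.0, 0.01, 0.0001):
    print(pc, round(log2_fold_change([40, 50], [0, 0], pseudocount=pc), 2))
```

      Under these assumptions the raw PfMORC ratio is only about 38-fold (log2 of roughly 5.2), whereas the fold change of a protein never seen in the controls keeps growing as the pseudocount shrinks, which is consistent with the point that detection of the bait in the control caps its apparent enrichment.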

      Additionally, can you explain why these proteins appear to be enriched at the same fold? 

      We can postulate that these proteins form a complex with a 1:1 stoichiometry. Two of these three proteins have been reported to interact with MORC in several publications, supporting a strong interaction between them.

      Variations in the interactome could result from the washing buffer's stringency.

      We agree that the IP conditions could affect the detection of the interactome as well as the parasite stage used. As indicated below, the overlap with previous publications and the presence of AP2 TFs and chromatin remodelers strongly support our results.

      It would be highly appropriate for the authors, similar to the co-submitted article (Maneesh Kumar Singh et al.), to present their mass spectrometry data in relation to previous purifications in Plasmodium (Bryant et al. 2020; Subudhi et al. 2023; Hillier et al. 2019) and also in Toxoplasma (Farhat et al. 2020). It would be good if authors could also put their results into perspective in light of the following pre-prints:

      We agree with the reviewer’s comment. In this revised manuscript, we compared our IP-MS data to previous published manuscripts. Key proteins including the AP2-P (PF3D7_1107800) and HDAC1 were indeed identified in several experiments validating our initial findings of the formation of large complexes with MORC. However, it’s important to highlight that the MORC protein was not used as the bait protein in previously published papers, and thus some discrepancies can be observed.

      Given the tendency of MORCs to form multiple complexes with AP2 factors, have you explored whether specific AP2s are conserved between Plasmodium and Toxoplasma, within the phylum?

      P. falciparum encodes 27 putative AP2s, while T. gondii has over 60 AP2s, making direct comparison challenging. Some Plasmodium AP2s have multiple counterparts in T. gondii, and conservation is typically limited to the AP2 binding domains. Attempts to identify sequence homology among AP2s and the regions of conservation have been performed (PMID: 30959972, PMID: 30959972, PMID: 16040597). Although this information would provide interesting insight, we believe exploring this topic at this time would diverge from our primary objectives. It would be more appropriate to address this in future studies.

      Could this conservation be identified either through phylogenetic means or by using tools such as AlphaFold, especially considering not just the AP2 domains but also any existing ACDC domains?

      Although this may reveal important information regarding the association between MORC proteins and AP2 domains, we believe investigating the conservation between AP2 across apicomplexan parasites may prove too challenging and is beyond the scope of this work.

      Most of the genes are depicted without their immediate surroundings (Fig. 2d and Fig S2c, d). For instance, the promoter region of AP2g is not shown (Fig. 2d). It is therefore very challenging to determine the presence or absence of MORC upstream or downstream; considering that this factor, which can create DNA loop protrusions, might bind at a distance from the genes in question.

      All gene coverage plots, including AP2-G, show 500 bp up- and downstream of the displayed gene. We have modified our figure legends to make sure that this information is provided.

      Upon examining Figure S3, it is evident that the authors have indicated a decline in PfMORC expression, represented as percentages over two unique time frames. The methodology behind this quantification remains ambiguous. It's essential for the authors to specify whether normalization was done using a loading control. As a benchmark, Singh et al. (2021) in their Figure 4 transparently used GAPDH as a loading control and included an untreated sample in their western blot analysis.

      We thank the Reviewer for bringing this to our attention. Our initial quantification was performed using ImageJ. To address the Reviewer’s comment, we have reperformed the experiment. Our quantitative analysis was performed through Bio-Rad ImageLab software using aldolase expression as a loading control (50% of the MORC loading). This information has now been incorporated into the supplementary figures (Figure S3).
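      The loading-control normalization described above can be sketched as follows. This is a minimal illustration, assuming background-subtracted band intensities exported from densitometry software; the function name and the intensities are hypothetical and do not correspond to the measurements in Figure S3. Each lane's PfMORC signal is divided by its aldolase signal, and the knockdown ratio is expressed relative to the control ratio.

```python
"""Minimal sketch of loading-control normalization for a knockdown Western blot.

Assumptions: inputs are background-subtracted band intensities (e.g. exported from
densitometry software); all numbers below are hypothetical, not data from Figure S3.
"""


def remaining_fraction(target_kd, loading_kd, target_ctrl, loading_ctrl):
    """Target signal in the knockdown lane relative to the control lane,
    each first normalized to its own loading-control band."""
    if loading_kd <= 0 or loading_ctrl <= 0 or target_ctrl <= 0:
        raise ValueError("loading-control and control-lane intensities must be positive")
    return (target_kd / loading_kd) / (target_ctrl / loading_ctrl)


if __name__ == "__main__":
    # Hypothetical PfMORC and aldolase intensities: control (+aTc) vs knockdown (-aTc) lanes.
    frac = remaining_fraction(target_kd=1200.0, loading_kd=900.0,
                              target_ctrl=4000.0, loading_ctrl=1000.0)
    print(f"PfMORC remaining: {frac:.0%}, knockdown: {1 - frac:.0%}")  # prints 33% and 67%
```

      Because aldolase is loaded at the same fraction (50% of the MORC loading) in every lane, that constant factor cancels in the ratio of ratios and does not bias the remaining-fraction estimate.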

      There's a striking observation that, despite significant degradation of PfMORC (as depicted in Figures S1 and S3), only the upper band in the western blot diminishes. This inconsistency needs addressing, as it can raise questions about the interpretation of the results.

      We agree with the reviewer's comment. We experienced some challenges when performing a Western blot on such a large protein (295 kDa). Our initial attempts required long exposures that may have highlighted non-specific signals from smaller proteins. To address the reviewer’s comment, we have repeated the experiment and modified our Western blot protocol. Our new result better reflects the expected downregulation of PfMORC. These changes have been incorporated into our manuscript and Fig. S3.

      Recommendations for improving the writing and presentation.

      MORC KD quantification and consistency with previous findings (Figure S3): When comparing their results with those from another study (Singh et al. 2021), it is critical to ensure that the experimental conditions, especially the methodology for KD and the quantification of protein levels, are similar. If not, a direct comparison might be misleading.

      We greatly appreciate the suggestions and have made efforts to redesign the MORC KD quantifications according to the reviewer’s recommendations.

      While the manuscript mentions the level of KD, it does not delve into the functional consequences of such a decrease in protein levels. It would be of interest to understand how this level of KD affects the parasite's biology, especially in the context of the paper's main findings.

      We have addressed this question by looking at the changes in chromatin structure in WT versus KD parasites upon ATc removal. We have also validated this initial result with an additional ChIP-seq experiment against histone marks in WT versus KD parasites upon ATc removal. Our findings showed a significant reduction in H3K9me coverage in heterochromatin regions, specifically at genes associated with antigenic variation and invasion. These findings suggest that PfMORC at least partially regulates gene silencing and chromatin organization. The manuscript has been edited accordingly.

      At the end of page 5, the authors present an interpretation of their findings that suggests a multi-faceted role of PfMORC in regulating stage-specific gene families, particularly the gametocyte-related genes and merozoite surface proteins. While the narrative they present is intriguing, several concerns arise:

      Over-reliance on correlation: The authors draw a direct line between the levels of PfMORC binding and the function of these genes in the parasite's life cycle. However, a mere correlation between PfMORC binding and stage-specific gene activity does not necessarily imply causation. They would need to provide experimental evidence showing that manipulation of PfMORC levels directly impacts these genes' expression.

      We agree with the reviewer's comment. However, we have partially addressed this issue by comparing our ChIP-seq, RNA-seq and Hi-C experiments. We concluded that several of the transcriptional changes observed were due to an indirect effect of PfMORC KD and were most likely induced by cell cycle arrest and a partial collapse of the chromatin structure. The collapse of the heterochromatin structure was validated by our Hi-C experiment. To address the reviewers' additional concerns, we have included further ChIP-seq experiments targeting histone marks to confirm our initial hypothesis. The results of these experiments have been incorporated into the revised version of the manuscript.

      Ambiguity surrounding "low levels" and "high levels": The terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. Without quantification or a clear benchmark, these descriptions remain vague.

      We agree with the reviewers that the terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. However, we have quantified the changes in DNA binding using normalized reads (RPKM). In trophozoite and schizont stages, most genes have a mean PfMORC binding of <0.5 RPKM-normalized reads per nucleotide within their promoter regions, whereas antigenic gene families such as var and rifin have ~1.5 and 0.5 normalized reads, respectively (Fig. 2b). Similar results were obtained for the gametocyte-specific transcription factor AP2-G, which shows levels of PfMORC binding similar to those observed at var genes (Fig. 2c and S2c, d).

      Shift in Binding Sites: The observed minor switch in PfMORC binding sites from gene bodies to intergenic and promoter regions is mentioned, but without context on how these shifts impact gene expression or any comparative analysis with other proteins showing similar shifts. The claim that this shift implicates PfMORC as an "insulator" is a leap without direct evidence.

      We apologize for the confusion. We have compared our ChIP-seq and RNA-seq results at different time points of the cell cycle and showed that the observed shift affects gene expression. We have edited the manuscript to clarify these results.

      Overextension of PfMORC's Role: The authors suggest that PfMORC moves to the regulatory regions around the TSS to guide RNA Polymerase and transcription factors. This is a substantial claim and would require additional experiments to validate. Simply observing binding in a region is insufficient to assign a specific functional role, especially one as critical as guiding RNA Polymerase. Historically, the MORC family has been primarily linked with gene silencing across apicomplexans, plants, and metazoans. On page 7, the authors noted a minimal overlap between the ChIP-seq and RNA-seq signals (Fig. 4e). They also acknowledged that the pronounced gene expression shifts at schizont stages result from a combination of direct and indirect impacts of PfMORC degradation, which could cause cell cycle arrest and potential heterochromatin disintegration, rather than just decreased PfMORC binding. Therefore, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We agree with the reviewer's comment and have edited the manuscript accordingly.  

      DISCUSSION:

      The authors concluded that "Using a combination of ChIP-seq, protein knock down, RNA-seq and Hi-C experiments, we have demonstrated that the MORC protein is essential for the tight regulation of gene expression through chromatin compaction, preventing access to gene promoters from TFs and the general transcriptional machinery in a stage specific manner."

      Again, the assertion that MORC protein is essential for tight regulation of gene expression, based purely on correlational data (e.g., ChIP-seq showing binding doesn't prove functionality), assumes causality which might not be fully substantiated. The phrase "preventing access to gene promoters from TFs and the general transcriptional machinery in a stage-specific manner" also needs validation. Asserting that MORC is essential for this function might oversimplify the process and overlook other critical contributors.

      We agree with the reviewer’s comments and the conclusion has since been edited accordingly.

      The discussion is quite poor. It would be pertinent to put MORC in perspective within the broader picture of regulatory mechanisms of chromatin state at telomeres and var genes. For instance, how do SIR2 and HDAC1 (associated with MORC) divide the task of deacetylation? Or the contribution of HP1 and other non-coding RNAs.

      We agree with the reviewer’s suggestion. However, in order to put MORC in perspective within a broader picture, we would need to measure changes in localization of several molecular components regulating heterochromatin in WT versus KD condition. This will require access to several molecular tools and specific antibodies that we do not currently have. We have addressed these issues in our discussion.  

      Minor corrections to the text and figures.

      Figure 1d: Could you provide the ID for each AP2 directly on the volcano plot? While some IDs are referenced in the manuscript, visual representation in the plot would facilitate a clearer understanding of their enrichment levels.

      IDs for the unknown AP2 proteins have been added to the volcano plot.

      I recommend presenting Figure S2b as a panel within a primary figure. This change would offer readers a more quantitative understanding of the distinct differences between developmental stages. Notably, there seems to be a limited number of genes in common when considering the total, and there is an apparent lack of enrichment in the ring stage.

      This has been done.

      The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. 

      We have improved the figure legends and added the number of biological replicates and the statistical tests used to each figure legend.

      Figure 1A: The protein diagram with its domains does not take scale into account.

      The figure has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) The study lacks a direct link between PfMORC's inferred function and the state of heterochromatin in the genome post-depletion.

      We agree with the reviewer's comment and have included additional ChIP-seq experiments to measure changes in histone marks in the PfMORC-depleted parasite line. We show a significant decrease in histone H3K9me3 marks in the PfMORC KD condition.

      Conducting ChIP-seq on well-known heterochromatin markers such as H3K9me3, HP1, or H3K36me2/3 could shed light on the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      Because we do not have access to an anti-HP1 antibody with reasonable affinity, we have not been able to study the impact of MORC KD on HP1, but we have successfully observed the impact on H3K9me3 marks. These results have been added to the revised manuscript (Fig. 5).

      (2) The authors should conduct a more comprehensive analysis of PfMORC's genomic localization, comparing it to ApiAP2 binding (interacting proteins) and histone modifications. This would provide valuable insights.

      We have performed a more comprehensive genome-wide analysis of MORC binding through ChIP-seq in WT and MORC-KD conditions. Our results show that PfMORC localizes to heterochromatin, with significant overlap with H3K9 trimethylation (H3K9me3) marks at or near var gene regions. When PfMORC was downregulated, H3K9me3 levels decreased, supporting a possible role of PfMORC in gene repression. Regarding the comparison with AP2 binding, our proteomics datasets have shown extensive interactions between MORC and several AP2 proteins.

      (3) RNA-seq data reveals that only a few genes are affected after 24 hours of PfMORC depletion, with an equivalent number of up-regulated and down-regulated genes. The reasons behind down-regulation resulting from a heterochromatin marker depletion are not clearly established.

      We agree with the reviewer’s comment. At this stage (24 hours), PfMORC depletion is limited and the effects at the transcriptional level are quite restricted. Furthermore, the down-regulation of genes is most likely an indirect effect of cell cycle arrest. We have edited the manuscript to address this comment.

      The relationship between this data and the partial depletion of PfMORC needs further discussion.

      We agree with the reviewers and have improved our discussion in the revised version of the manuscript.

      (4) The authors did not compare their ChIP-seq data with the genes found downregulated in the RNA-seq data. Examining the correlation between these datasets would enhance the study.

      We apologize for the confusion. We have compared the ChIP-seq and RNA-seq data and identified a very limited number of overlapping genes, indicating that most of the observed changes in gene expression are likely indirect, resulting from cell cycle arrest and chromatin collapse. We have edited the manuscript to clarify this issue.

      (5) The discussion section is relatively concise and does not fully address the complexity of the data, warranting further exploration.

      We have improved the discussion section in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity, was phosphorylated on S269, and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since a downregulation of Notch signaling was proposed to be important for the production of a specialized blood cell type (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A, including a reduced response to wasp parasitism, and their epistatic analyses indicate that Pkc53E acts upstream of Su(H).

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes which back the main conclusions. Notably Su(H)S269A mutation or Pkc53E deficiency strongly reduces lamellocyte production and the epistatic data are convincing.

      Weaknesses

      The phenotypic analysis of larval blood cells remains rather superficial. Looking at melanized cells is a crude surrogate to quantify crystal cell numbers as it is biased toward sessile cells (with specific location) and does not bring information concerning the percentage of blood cells differentiated along this lineage.

      In Su(H)S269A knock-in or Pkc53E zygotic mutants, the increase in crystal cells in uninfected conditions and the decreased capacity to induce lamellocytes following infection could have many origins which are not investigated. For instance, premature blood cell differentiation could promote crystal cell differentiation and reduce the pool of lamellocytes progenitors. These mutations could also affect the development and function of the posterior signaling center in the lymph gland, which plays a key role in lamellocyte induction.

      Similarly, the mild decrease on resistance to wasp infestation (Fig. 2A) could reflect a constitutive reduction in blood cell numbers in Su(H)S269A larvae rather than a defective down-regulation of Notch activity.

      We fully agree with the reviewer that sessile crystal cell counts are a coarse measure of hemocyte populations. However, they allowed us to screen numerous genotypes in the course of our kinase candidate screen. We recorded hemocyte numbers in the various genetic backgrounds with and without wasp infestation. There was no significant difference between Su(H)S269A and the Su(H)gwt control, independent of infection. This agrees with earlier observations of unchanged plasmatocyte numbers in N or Su(H) mutants compared to the wild type (Duvic et al., 2002). We noted, however, a small drop in hemocyte numbers in Su(H)S269D mutants and a strong drop in Pkc53ED28 mutants in both conditions relative to the control. Presumably, Pkc53E has a more general role in blood cell development, which we have not analysed further. The results are included in the new Figure 1_S1 and Figure 9_S1 supplements. Given the link between hemocyte numbers and wasp resistance (e.g. McGonigle et al., 2017), we cannot exclude that the lowered resistance of Pkc53ED28 mutants to wasp attacks is partly due to reduced hemocyte numbers, although we did not see significant differences between Su(H)S269A, Pkc53ED28 and the double mutant. We have included this notion in the text.

      Lamellocytes arise in response to external challenges such as parasitoid wasp infestation, both by trans-differentiation from larval plasmatocytes and by maturation of lamellocyte precursors in the lymph gland, but they barely arise in the Su(H)S269A and Pkc53ED28 mutants.

      We find it hard to envisage, however, that a premature differentiation of plasmatocytes into crystal cells in our case could deplete the pool of lamellocyte progenitors in the hemolymph. (Is there a precedent?) Crystal cells make up about 5% of the hemocyte pool; in the Su(H)S269A and Pkc53E mutants they are increased at most twofold. Even if these extra crystal cells (now ~10%) had arisen by premature differentiation, enough plasmatocytes (~80%) should still remain with the potential to divide further and transdifferentiate into lamellocytes.

      Indeed, we cannot exclude an effect of the Su(H)S269A mutant on the development and function of the posterior signaling center (PSC) of the lymph gland. We noted, however, a slight but significant enlargement of the PSC in the Su(H)S269A mutant, which to our understanding cannot explain the reduced lamellocyte numbers.

      Whereas the authors also present targeted knock-down/inhibition of Pkc53E suggesting that this enzyme is required in blood cells to control crystal cell fate (Fig. 6), it is somewhat misleading to use lz-GAL4 as a driver in the lymph gland and hml-GAL4 in circulating hemocytes, as these two drivers do not target the same blood cell populations/steps in the crystal cell development process.

      We fully agree with the reviewer that the two driver lines target different blood cell populations/steps in hematopoiesis. The hml-Gal4 driver is regarded as pan-hemocyte, common to both plasmatocytes and pre-crystal cells (e.g. Tattikota et al., 2020). It has been reported to drive expression specifically within differentiated hemocytes prior to or at the stage of crystal cell commitment (Mukherjee et al., 2011). Hence, hml-Gal4 appeared suitable to target sessile and circulating hemocytes prior to their final differentiation into crystal cells or lamellocytes, respectively.

      In the lymph gland, however, hml is expressed within the cortical zone, where it appears specific to the plasmatocyte lineage and is not present in the crystal cell precursors (Blanco-Obregon et al., 2020). In contrast, lz-Gal4 is specific to differentiating crystal cells in both compartments, i.e. in circulating and sessile hemocytes and in the lymph gland. Hence, we chose lz-Gal4 instead of hml-Gal4, at the risk of driving expression markedly later in the course of crystal cell differentiation. We have included this reasoning in the text. Overall, we feel that this choice does not limit our conclusions.

      In addition, the authors do not present evidence that Pkc53E function (and Su(H) phosphorylation) is required specifically in blood cells to promote lamellocyte production in response to infestation.

      We have tried to address this interesting question in several ways. Firstly, we show that Pkc53E is indeed expressed in the various larval hemocyte types (new Figure 8 and Figure 8_S1 supplement); that is, Pkc53E has the potential to promote lamellocyte formation. Moreover, RNAi-mediated downregulation of Pkc53E within hemocytes affected crystal cell formation similarly to the Pkc53ED28 mutant, in agreement with a specific requirement within blood cells (Figure 6). Finally, we show a major drop in Notch target gene transcription (NRE-GFP) in response to wasp infestation within isolated hemocytes from Su(H)gwt larvae, in contrast to Su(H)S269A larvae (see new Figure 1G). These data show that Su(H)-mediated Notch activity must be downregulated in hemocytes prior to lamellocyte formation, in agreement with our hypothesis.

      Finally, the conclusion that Pkc53E is (directly) responsible for Su(H) phosphorylation needs to be strengthened. Most importantly, the authors do not demonstrate that Pkc53E is required for Su(H) phosphorylation in vivo (i.e. that Su(H) is not phosphorylated in the absence of Pkc53E following infestation).

      We would very much like to show such results. Unfortunately, the low affinity of our pS269 antibody does not allow any in situ or in vivo experiments. We hope to obtain a more specific phospho-S269-Su(H) antibody that would allow further in situ studies, for example to show co-localization with Pkc53E.

      In addition, the in vitro kinase assays with bacterially purified Pkc53E (in the presence of PMA or using an activated variant of Pkc53E) only reveal a weak activity on a Su(H) peptide encompassing S269 (Fig. 4).

      The reviewer correctly notes the poor activity of our purified Pkc53EEDDD kinase. This low activity also holds true for the standard peptide (PS), which in fact is accepted even less well than the Swt substrate. Indeed, the commercially available PKCα is an order of magnitude more active. Whether this reflects the poor quality of our isolated protein compared to the commercial PKCα, or a true biochemical property of Pkc53E, remains to be shown. We noted this observation in the manuscript.

      Moreover, while the authors show a coIP between an overexpressed Pkc53E and endogenous Su(H) (Fig. 7) (in the absence of infestation), it has recently been reported that Pkc53E is a cytoplasmic protein in the eye (Shieh et al. 2023), calling for a direct assessment of Pkc53E expression and localization in larval blood cells under normal conditions and upon infestation.

      Indeed, it is interesting that a Pkc53E-GFP fusion protein is cytoplasmic in the eye. However, the construct reported by Shieh et al. represents the B-isoform, which is preferentially expressed in photoreceptors, where it regulates depolymerization of the actin cytoskeleton.

      Due to the eye-specific expression, we unfortunately cannot use the Pkc53E-B-GFP construct to test for Pkc53E’s distribution in other tissues.

      As this construct is of little use for studying hematopoiesis, we instead used Pkc53E-GFP (BL59413), derived from a protein trap: again, GFP is primarily seen in the cytoplasm of hemocytes, including lamellocytes of infected larvae. However, in a small number of hemocytes, GFP also appears to be nuclear (Fig. 8A), leaving open the possibility that activated Pkc53E may localize to the nucleus, where it could phosphorylate Su(H) and downregulate Notch activity. As Su(H) enters the nucleus piggyback with NICD, however, phosphorylation may just as well occur at the membrane or within the cytoplasm. We note that these hypotheses require a much more detailed analysis.

      Furthermore, the effect of the PKCa agonist PMA on Su(H)-induced reporter gene expression in cell culture and on crystal cell number in vivo is somewhat consistent with the authors' hypothesis, but some controls are missing (notably western blots to show that PMA/Staurosporine treatment does not affect Su(H)-VP16 levels), and it is unclear why STAU treatment alone promotes Su(H)-VP16 activity (in their previous reports, the authors found no difference between Su(H)S269A-VP16 and Su(H)-VP16) or why PMA treatment still has a strong impact on crystal cell number in Su(H)S269A larvae.

      We have added a Western blot showing that the treatment does not affect Su(H)-VP16 expression levels (Figure 5_supplement 1). As STAU is a general kinase inhibitor, it may prevent any inhibitory phosphorylation of Su(H)-VP16 in the HeLa cells, e.g. by Akt1, CAMK2D or S6K, which target T271, whose phosphorylation is also expected to affect DNA binding of Su(H) (Figure 3_supplement 2). Moreover, in the previous report we used different promoter constructs, and we used RBPJ instead of Su(H), which may explain some of the discrepancies. As PMA is not specific to Pkc53E, the altered crystal cell numbers may result from its influence on other kinases involved in blood cell homeostasis, as predicted by our genetic screen (Figure 3_supplement 1).

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide a more elaborate examination of larval blood cell types and blood cell counts under normal conditions and following infestation in the different zygotic mutants as well as upon Pkc53 knock-down. A thorough examination of PSC integrity should be performed and the maintenance of core blood cell progenitors examined. The authors should also clarify when after infestation the LG and larval bleeds are analyzed.

      - a more elaborate examination of larval blood cell types:

      - examination of larval blood cell counts under normal conditions: hemocyte # in gwt, SA, SD, & Pkc

      - examination of larval blood cell counts after infestation: hemocyte # in gwt, SA, SD, & Pkc

      - thorough examination of PSC integrity: in gwt, SA, SD, & Pkc

      - thorough examination of blood cell progenitors: in gwt, SA, SD, & Pkc

      - clarify timing

      Hemocyte numbers of the various genotypes and conditions were recorded and are presented in Figure 1_S1 and Figure 9_S1. Timing was elaborated in the text and the Methods section.

      (2) The authors should clarify why they use lz-GAL4 or hml-GAL4 and what we can infer from using these different drivers.

      See above. The reasoning was included in the text.

      (3) The percentage of hatching of Su(H)S269A and Su(H)gwt flies in the absence of infestation should also be scored; a small decrease in Su(H)S269A viability might explain the observed differences in survival to wasp infestation. Absolute blood cell numbers (in the absence of infestation) have also been correlated with survival to infection and should be checked.

      The percentage of emerging flies and hemocyte numbers in the absence of infestation were recorded and are included in Figure 2, Figure 1_S1 and Figure 9_S1.

      (4) Whereas the impact of Su(H)S269A or Pkc53E mutation on lamellocytes production is clear, there is still a substantial reduction in crystal cell production following infestation. So I wouldn't conclude that the Su(H) larvae are "unable" to detect this immune challenge or respond to it (line 116).

      Thank you for pointing this out; we have corrected the text.

      (5) The expression and localization of Pkc53E in larval blood cells should be investigated, for instance using the Pkc53E-GFP line recently published by Shieh et al. (or at least at the RNA level).

      Firstly, we confirmed expression of Pkc53E in hemocytes by RT-PCR (Figure 8_S1 supplement). Secondly, expression of Pkc53E-GFP was monitored in hemocytes (Figure 8). To this end, we used the protein trap line (BL59413), since the line published by Shieh et al. (2023) is restricted to photoreceptors.

      (6) It would be interesting to test the anti-pS269 antibody in immunostaining (using Su(H)S269A as negative control).

      Unfortunately, the pS269 antiserum does not work in situ at all.

      (7) The authors must perform a western blot with anti-pS269 in Pkc53e mutant to show that Su(H) is not phosphorylated anymore after wasp infestation.

      The blot gives a negative result.

      (8) It is surprising that no signal is seen in the absence of infestation with anti-pS269: the fact that Su(H)S269A have more crystal cells suggest that there is a constitutive level of phosphorylation of Su(H).

      We fully agree: ideally, we would expect a low level of S269 phosphorylation in the wild type as well. However, given the poor specificity of our antibody, we were happy to detect phospho-Su(H) in infected larvae. We are currently working to obtain a better antibody.

      (9) The authors should check Su(H)-VP16 levels and phosphorylation status after PMA and/or staurosporine treatment. Some clarifications are also needed to explain the impact of PMA in Su(H)S269 larvae (this clearly suggests that PKC has other substrates implicated in crystal cell development).

      Su(H)-VP16 expression levels were monitored by Western blot and were not noticeably altered (Figure 5_1 supplement). Presumably, Pkc53E is not the only kinase involved in Su(H) phosphorylation or the transduction of stress signals. Moreover, PMA may have a more general effect on larval development and hematopoiesis, affecting both genotypes. We included this reasoning in the text.

      (10) Concerning the writing, the authors forgot to mention and discuss the work of Cattenoz et al. (EMBO J 2020). The presentation of the screen for kinase candidates could be streamlined and better illustrated (notably supplement table 4, which would be easier to grasp as a figure/graph). The discussion could be shortened (notably the part on T cells), and I don't really understand lines 374-376 (why is it consistent?).

      We apologize for omitting Cattenoz et al. 2020, which we have now included. We fully agree that this paper is of utmost importance to our work. We streamlined the presentation of the screen and, in addition to table 4, included a new figure summarizing the results graphically (Figure 3_S1 supplement). We shortened the part on T cells and removed the unclear lines.

      Reviewer #2 (Public Review):

      Summary:

      The current draft by Deischel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity and mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; especially the identification of Pkc53E in the phosphorylation of Su(H) is very nice. However, I have a number of concerns with the manuscript which are central to the idea that links the phosphorylation of Su(H) via Pkc53E to its modulation of Notch activity. I list them one by one below.

      Strengths:

      I find the study interesting and relevant especially because of the following:

      (1) The identification of Pkc53E in phosphorylation of Su(H) is very interesting.

      (2) The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:

      (1) Epistatic interaction with Notch is needed: Throughout the draft, the authors claim that the role of Pkc53E in the phosphorylation of Su(H) is downstream of Notch activity. Given that the paper title also invokes Notch, I would suggest the authors show this through a direct epistatic interaction using a Notch condition. If loss of Notch function produces many more lamellocytes and gain of function produces fewer, would modulating Pkc53E (and Su(H)) in these backgrounds cause any change? Likewise in homeostasis: given that gain of Notch function leads to increased crystal cells, it would be nice to see the same genetic combinations under homeostatic conditions.

      While I understand that Su(H) functions downstream of Notch, it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify whether this phosphorylation event of Su(H) via Pkc is part of the canonical interaction proposed in the manuscript and not a non-canonical, Notch pathway-independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, direct data to show this affirmatively are missing. In our hands we do find Notch-independent Su(H) function in immune cells, hence this suggestion stems from our own personal experience.

      The role of Notch in Drosophila hematopoiesis, notably during crystal cell development in both hematopoietic compartments, is well established; likewise the role of Su(H) as the integral signal transducer in this context (e.g. Duvic et al., 2002). Notch activity not only promotes crystal cell fate by upregulating target genes but also prevents adoption of the alternative plasmatocyte fate (e.g. Terriente-Felix et al., 2013). We confirmed by qRT-PCR the downregulation of Notch target gene expression in response to wasp infestation, which was discovered earlier by Small et al. (2014). This clearly favors a repression of Notch activity rather than a relief of inhibition by Su(H). A ligand-independent activation of Notch signaling has been uncovered in the context of crystal cell maintenance in the lymph gland involving Sima/Hif-α, with Su(H) as the transcriptional mediator (Mukherjee et al., 2011). However, we are not aware of a comparable Su(H) activity that is independent of Notch.

      Certainly, Su(H) acts independently of Notch in terms of gene repression. Here, Su(H) forms a repressor complex together with H and the co-repressors Groucho and CtBP to silence Notch target genes. Accordingly, loss of Su(H) or H may induce the upregulation of the respective genes independent of Notch activity. This has been demonstrated, for example, during wing and heart development (Klein et al., 2000; Kölzer and Klein, 2006; Panta et al., 2020). Moreover, during axis formation in the early embryo, global repression is brought about by Su(H) and relieved by activated Notch (Koromila and Stathopoulos, 2019). In all these instances, Su(H) is thought to act as a molecular switch, and the activation of Notch causes strong expression of the respective genes. Likewise, the loss of DNA binding resulting from the phosphorylation of Su(H) allows the upregulation of repressed Notch target genes in wing imaginal discs, e.g. dpn, as we have demonstrated before with overexpression and clonal analyses (Nagel et al., 2017; Frankenreiter et al., 2021). However, H does not contribute to crystal cell homeostasis, i.e. de-repression of Notch target genes does not appear to be a major driver in this context, which calls for additional mechanisms to downregulate Notch activity. Our work provides evidence that this inhibitory mechanism involves the phosphorylation of Su(H) by Pkc53E. Formally, we cannot exclude alternative mechanisms. Hence, we have tried to avoid drawing a direct link between Su(H) phosphorylation and the inhibition of Notch activity throughout the text, including the title. Moreover, we have discussed the possible consequences of the loss of DNA binding by Su(H), which could either interfere with the activation of Notch target genes or abrogate their repression.

      In addition, we have performed new experiments addressing the epistasis between Notch and Su(H) during crystal cell formation (Figure 1_supplement 1). To this end, we knocked down Notch activity in hemocytes by RNAi (hml::N-RNAi) in the Su(H)gwt and Su(H)S269A backgrounds, respectively. Indeed, Notch downregulation strongly impairs crystal cell development independent of the genetic background, as expected if Notch were epistatic to Su(H). We attribute the slightly elevated crystal cell numbers observed in the Su(H)S269A background to the increase in embryonic precursors (see Fig. 4; Frankenreiter et al., 2021). Of note, the Notch gain-of-function allele Ncos479 also displayed a similar increase in embryonic crystal cell precursors as well as in crystal cells within the lymph gland (Frankenreiter et al., 2021).

      (2) Data on the temporal regulation of Notch activity in response to wasp infection, and on how it overlaps with the dynamics of Su(H) phosphorylation by Pkc, are needed:

      First, I suggest the authors show how Notch activity changes over time after infection. An RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24 and 48 HPI would reveal the dynamics of Notch activity and set the stage for when and how it is being modulated. In parallel, it would be good to see this response in the Su(H) phospho-mutant, which would support the requirement for Su(H) phosphorylation in mounting a strong immune response.

      Indeed, it would be extremely nice to follow the entire process in every detail, ideally at the cellular level. The challenge, however, is quantity. The mRNA isolated from hemocytes was barely quantifiable, although the subsequent Ct values were acceptable. We quantified NRE-GFP expression, introduced into Su(H)gwt and Su(H)S269A, as well as atilla expression. We were able to generate data for two time windows, 0-6 h and 24-30 h post infection. The data are provided in the extended Figure 1G and show a strong drop of NRE-GFP in the infected Su(H)gwt control compared to uninfected animals, whereas expression in Su(H)S269A plateaus at around 60%-70% of the infected Su(H)gwt control. Expression of atilla jumps up in the control but stays low in Su(H)S269A hemocytes.

      Second, the dynamics of phosphorylation over a time course are missing. The increased phosphorylation of Su(H) in response to wasp infestation shown in Fig. 2B was measured in whole animals, which implies a global downregulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells; otherwise the reader is left to assume that it also holds in immune cells. Given that the authors have a good antibody, characterizing the same response in circulating immune cells upon infection is needed, ideally as a time course of the phosphorylation state at 6, 12, 24 and 48 HPI to gauge its dynamics.

      We would really love to do these experiments. Unfortunately, our pS269 antibody performs rather poorly. It does not detect Su(H) protein in tissues or cells, nor does it work on protein extracts in Western blots or for IP. Hence, we have so far no way to demonstrate cell or tissue specificity of Su(H) phosphorylation. So far, we were lucky to detect mCherry-tagged Su(H) proteins pulled down in rather large amounts with the highly specific nanobodies. We have tried very hard to repeat the experiment with hemolymph and lymph glands only, but have failed so far. Hence, we have to state that our antibody is suitable neither for in vivo analyses nor for the detection of phospho-Su(H) at lower levels.

      The authors suggest that this mechanism may be a quick way to downregulate Notch; hence, a side-by-side comparison of the dynamics of Notch downregulation (e.g. by RT-PCR of Notch target genes at different time points post infection) with the levels of pS269 would strengthen the central point being proposed.

      We fully agree and hope to address these issues in the future by improving our tools.

      Last, in Fig. 7 the authors show co-immunoprecipitation of Pkc53E-HA with Su(H)gwt-mCh protein from Hml-gal4 hemocytes. I understand this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a co-IP of the two proteins in immune cells upon infection should be included to strengthen this point.

      We do not fully agree with the reviewer. Although we also think that the interaction between Pkc53E and Su(H) might occur more frequently upon infection, we propose that this is a transient process occurring in several but not all hemocytes at a given time. Moreover, in the described experiment, Pkc53E-HA was expressed in hemocytes via the UAS/Gal4 system. We cannot exclude that this approach causes an overexpression. Hence, we would not expect considerable differences between unchallenged and infested animals.

      (3) In Fig. 5B, the authors show the change in crystal cell numbers as a readout of PMA-induced activation of Pkc53E and the subsequent inhibition of Su(H) transcriptional activity. I would suggest the authors use more direct measures for this readout; RT-PCR of Su(H) target genes in circulating immune cells would strengthen this point. Formation of crystal cells does not depend on Notch alone, and I am not convinced that this treatment or these conditions have no other effects on immune cells; for example, an impact on Hif expression may also lower CC numbers. Hence, the authors need to strengthen this point by showing that the effects act directly on Notch and Su(H) and are not non-specific effects on other pathways known to be important for CC development.

      We agree with the Reviewer that the rather general influence of PMA on PKCs might present a systemic stress to the animal. For example, we observed a slight drop of crystal cell numbers also in Su(H)S269A, suggesting other kinases apart from Pkc53E were affected that are involved in crystal cell homeostasis. We have included this notion in the text. To provide more conclusive evidence we also fed Staurosporine to the larvae which reversed the PMA effect. In addition, we assayed the expression of NRE-GFP in hemocytes of infected animals by qRT-PCR, and observed a strong drop in the infected versus uninfected control but less so in Su(H)S269A. The new data are provided in extended Figures 1G and 5B.

      (4) In addition to the points mentioned above, the data need to be strengthened to further support the main conclusions of the manuscript. I would suggest the authors present the infection response with details on the timing of the immune response. Characterization of the immune response at the respective time points (as above, or at least at 24 and 48 HPI, as is the norm in the field) will be important. Also, data on changes in overall cell numbers and in other immune cells (plasmatocytes or CCs) post infection are missing and are needed to establish the specificity of the impact. Adding these would make the analysis more rigorous.

      Total hemocyte numbers of the various genotypes, i.e. control, Su(H)S269A, Su(H)S269D, and Pkc53EΔ28, before and after wasp infestation have been included in supplemental Figures 1_S1 and 9_S1.

      (5) Finally, what do the authors think leads to the activation of Pkc53E? No upstream input is presented. It would be good to see whether wasp infection leads to increased Pkc53E kinase activity.

      The analysis of the full process is an ongoing project. We propose that ROS are produced upon the wasp's sting, triggering the subsequent cascade of events. This cascade has to end with the activation of Pkc53E in the presumptive pre-lamellocyte pool of both lineages, i.e. in plasmatocytes of the hemolymph, presumably in the sessile compartment (Tattikota et al., 2021), and at the same time in the lymph gland cortex harboring the LM precursors (Blanco-Obregon et al., 2020). One of the known upstream kinases, Pdk1, has a similar impact on crystal cell development as Pkc53E, making its involvement likely. Moreover, we think that other PKCs influence the process as well.

      Without a good readout, e.g. a functional pSu(H) antiserum that works in situ or a Pkc activity reporter, it will be quite difficult to follow up on this question. However, we already know that Pkc53E is expressed in hemocytes of all types independent of wasp infestation, in agreement with a role during lamellocyte differentiation. We hope to unravel the process in more detail in the future.

      Overall, I think the findings in the current state are interesting and fill an important gap, but the authors will need to strengthen the point with more detailed analysis that includes generating new data and also presenting the current data with more rigor in their approach. The data have to showcase the relationship with Notch pathway modulation upon phosphorylation of CSL in a much more comprehensive way, both in homeostasis and in response to infection which is entirely missing in the current draft.

      Reviewer #3 (Public Review):

      Deichsel et al. provide important and valuable insights into how Notch signalling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favour lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to perform a screen for kinases responsible for this phosphorylation and identify Pkc53E as the specific kinase acting on Su(H) at S269. Using analysis of mutants, RNAi and biochemical approaches, the authors convincingly show how the Pkc53E-Su(H) interaction is critical for remodelling hematopoiesis upon wasp challenge. The data presented support the overall conclusions made by the authors. A few points below need to be addressed by the authors to strengthen the conclusions:

      (1) The authors should check melanized crystal cells in Su(H)gwt and Su(H)S269A in the presence of PMA and Staurosporine.

      Thank you for the suggestion. We included the results of PMA + Staurosporine feeding into an extended Fig. 5B; they match those from the HeLa cells. Unfortunately, Staurosporine alone was lethal for the larvae at various concentrations, presumably owing to the overarching inhibition of kinase activity. This global effect also explains the high crystal cell numbers in the control fed with PMA + STAU compared to the untreated animals, as the downregulation of many kinases results in higher crystal cell numbers, a fact uncovered in our genetic screen.

      (2) Data for number of dead pupae, flies eclosed, wasps emerged post infestation should be monitored for the following genotypes and should be included:

      Pkc53EΔ28, Su(H)S269A, Pkc53EΔ28 Su(H)S269A, Su(H)S269D, Su(H)S269D Pkc53EΔ28

      We extended the data with and without infection. The respective data are shown in a new Fig. 9 and an extended Fig. 2, except for the Su(H)S269D allele: Su(H)S269D is larval lethal, i.e. the animals die too early for wasp development, and hence this allele could not be included in the assay. Overall, Pkc53EΔ28 matched Su(H)S269A.

      (3) The exact molecular trigger for activation of Pkc53E upon wasp infestation is not clear.

      Indeed, and we would love to know! Perhaps a Ca2+ signal elicited by the wasp's breach of the larval cuticle results in Pkc53E activation. The generation of ROS could be involved as well. At this point, we can only speculate. We hope to obtain direct experimental evidence for one or the other hypothesis in the future.

      (4) The authors should check whether activating ROS alone, or inducing calcium pulses/DUOX activation, can mimic this condition, trigger activation of Pkc53E and thereby cause phosphorylation of Su(H) at S269.

      The reviewer's suggestions open up a new field of investigation and are hence beyond the scope of this article. However, we want to pursue research in this direction, although we realize that counting crystal cells is too coarse to give more than a first impression, and that lamellocytes may form already upon breaching of the larval cuticle. A major challenge will be the direct measurement of Pkc53E activation. To date, we have no tools for this, but ideally we would like a direct, biochemical readout. Although we have been unsuccessful in the past, we want to develop a strong and specific phospho-S269 antibody that also works in situ. Alternatively, we are considering developing a PS-phosphorylation reporter that would allow these questions to be addressed.

      (5) Does Pkc53E get activated during sterile inflammation?

      We are in the process of addressing this issue but feel that this topic is beyond the scope of this paper. Our preliminary experiments, however, support the notion of a phospho-dependent regulation of Su(H) in this context as well.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide a graphical representation of major phenotypes that form the basis of their investigation and conclusions but have not supplemented the quantitation with images that represent these phenotypes. The authors need to include the following data to strengthen their conclusions:

      (1) The authors should include representative images for each of the genotypes/conditions (in presence and absence of wasp infestation) based on which corresponding plots have been made in Figure 1. Please include this for both circulating lamellocytes in the hemolymph and in the lymph glands since this is one of the main figures presenting the key findings.

      The data have been included in Figure 1-S2 supplement.

      (2) Please include representative images of LG with Hnt staining and corresponding images for melanization for each of the genotypes used in the plots in Figure 6A and B.

      The data have been included in Figure 6-S2 supplement.

      (3) Representative images for each of the genotypes in Figure 7A & B should be included (circulating crystal cells and lymph gland crystal cell numbers).

      Representative images for each of the genotypes for Fig. 7A have been included in Figure 7-S1 and for the old Fig. 7B in Figure 9-S2 supplement, respectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the Editor and the Reviewers for their constructive review. In light of this feedback, we have made a number of changes and additions to the manuscript that we think improve the presentation and hopefully address the majority of the reviewers' concerns.

      Main changes:

      •   We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

      •   We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

      •   We modified the text of the section “Abstraction as ‘expiring’ fitness advantage”.

      •   We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

      All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, that we hope will be addressed by the last two items listed above. In particular, we would like to underline the following points:

      •   The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer-resource models) can give rise to the same phenomenology, we believe that in the context of host-pathogen interaction it is relevant to show explicitly that SIR models can result in partial sweeps.

      •   The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

      •   It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (for example, the SIR model can oscillate, while the effective model cannot). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.

      Public reviews:

      Reviewer 1:

      Summary In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

      We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations. But the rise and fall of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if they only span 3-6 years.

      (1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and, more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss in more detail the relation to this broader class of models with emergent neutrality. Moreover, the authors’ simulations of the model dynamics are performed only up to the onset of clonal interference, 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations deeper in the regime of clonal interference (e.g. 𝜌/𝑠0 = 5) would show the behaviour in this regime more clearly.

      We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information. These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference, 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude from this that while dynamics do tend towards quasi-neutrality in the case of strong interference, this effect alone is unlikely to explain observations of H3N2 influenza dynamics. In our previous publication (Barrat-Charlaix et al., MBE, 2021) we also investigated the effect of epistatic interactions between mutations alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, produces this quite readily.

      But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.
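      The contrast between a persistent and an expiring advantage can be illustrated with a toy simulation. The sketch below is not the authors' simulation code: it is a single-locus haploid Wright-Fisher model (so it has no clonal interference at all) in which, as an assumption made only for illustration, a mutation's advantage decays exponentially, s(t) = s0·exp(−νt). The function name wf_fixation_probability and all parameter values are hypothetical. It estimates 𝑝fix for an allele starting at frequency 𝑥: with s0 = 0 the process is neutral and 𝑝fix equals 𝑥 in expectation, with ν = 0 the advantage never expires, and intermediate ν pushes 𝑝fix back towards the diagonal.

```python
import math
import random


def wf_fixation_probability(x0, N, s0, nu, trials, seed=0):
    """Monte Carlo estimate of the fixation probability of a haploid allele
    starting at frequency x0 in a Wright-Fisher population of size N.

    Illustrative assumption: the allele's advantage at generation t is
    s(t) = s0 * exp(-nu * t). nu = 0 gives constant selection; s0 = 0 gives
    neutral drift. Single locus only, so there is no clonal interference.
    """
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        n = round(x0 * N)  # number of carriers of the mutation
        t = 0
        while 0 < n < N:
            s = s0 * math.exp(-nu * t)
            x = n / N
            # Deterministic selection step (mean fitness is 1 + s*x) ...
            p = x * (1 + s) / (1 + s * x)
            # ... followed by binomial resampling of N offspring (genetic drift).
            n = sum(rng.random() < p for _ in range(N))
            t += 1
        fixed += n == N
    return fixed / trials


if __name__ == "__main__":
    for label, s0, nu in [("neutral", 0.0, 0.0),
                          ("expiring", 0.2, 0.5),
                          ("constant", 0.2, 0.0)]:
        p = wf_fixation_probability(0.25, 20, s0, nu, trials=2000, seed=1)
        print(f"{label:9s} p_fix ~ {p:.2f}")
```

The small population (N = 20) only keeps the pure-Python loop fast; the qualitative ordering, neutral below expiring below constant selection, is the point of the example.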

      In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

      We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

      (2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

      We agree with the reviewer that the deterministic SIR model is of course predictable. Similarly, a partial sweep is predictable. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus, even if we can measure its instantaneous growth rate accurately, we can’t predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.
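      Point (i) can be made concrete with a deterministic toy calculation. Assuming, for illustration only, that the advantage decays exponentially, s(t) = s0·exp(−νt) (this functional form is an assumption, not necessarily the one used in the manuscript), the logistic equation dx/dt = s(t)·x(1 − x) integrates to logit x(t) = logit x0 + (s0/ν)(1 − exp(−νt)). The sweep therefore stalls at logit x∞ = logit x0 + s0/ν, and two variants with the same measured initial growth rate s0 end at very different frequencies depending on ν. The function names below are hypothetical.

```python
import math


def logit(x: float) -> float:
    return math.log(x / (1.0 - x))


def plateau_frequency(x0: float, s0: float, nu: float) -> float:
    """Limit of x(t) as t -> infinity for dx/dt = s0*exp(-nu*t)*x*(1-x).

    Closed form: logit x(inf) = logit x0 + s0/nu, i.e. a partial sweep
    whenever nu > 0.
    """
    return 1.0 / (1.0 + math.exp(-(logit(x0) + s0 / nu)))


def euler_frequency(x0: float, s0: float, nu: float, t_max: float,
                    dt: float = 1e-3) -> float:
    """Forward-Euler integration of the same ODE up to t_max (numerical check)."""
    x, t = x0, 0.0
    while t < t_max:
        x += dt * s0 * math.exp(-nu * t) * x * (1.0 - x)
        t += dt
    return x


if __name__ == "__main__":
    # Same initial growth rate s0 = 0.1 from x0 = 1%, different expiration rates.
    for nu in (0.02, 0.05, 0.1):
        print(f"nu = {nu}: plateau frequency {plateau_frequency(0.01, 0.1, nu):.3f}")
```

With these illustrative numbers the plateau is roughly 0.60, 0.07 and 0.03, respectively, even though all three variants initially grow at the same rate.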

      (3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While here the two sources of the emerging neutrality are easily tuneable (expiring fitness and clonal interference), the connection of this model to the SIR model needs to be substantiated: what is the starting selection 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀), the selection decay 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable the comparison of the partial sweep timing in both models and corroborate the mapping of the SIR onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig.1 and Fig.2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

      We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as a function of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀, at the instant where the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to incidence 𝐼wt + 𝐼𝑚. However, as both models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that growth rate of a strain diminishes when the strain becomes more frequent, and give rise to partial sweeps.

      In Figure 1, the time it takes a partial sweep to finish is roughly 100− 200 generations (bottom right panel). If we consider H3N2 influenza and take one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower but roughly in line with observations for selective sweeps. This time is harder to define if oscillatory dynamics takes place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again of about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape on the order of 20-30% change in susceptibility.
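      The conversion from generations to calendar time above, under the stated assumption of one generation per week and about 52.2 weeks per year, works out as follows:

```latex
% Assumption from the text: 1 generation = 1 week; 1 year ~ 52.2 weeks
T_{\text{sweep}} \approx (100\text{--}200)\ \text{generations} \times 1\ \tfrac{\text{week}}{\text{generation}}
  = (100\text{--}200)\ \text{weeks}
  \approx \frac{100\text{--}200}{52.2}\ \text{years}
  \approx 1.9\text{--}3.8\ \text{years}
```

By the same conversion, the waning time of 200 weeks corresponds to about 3.8 years.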

      Reviewer 2:

      Summary

      This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

      Strengths

      This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

      This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

      The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

      We thank the reviewer for their summary, assessment, and constructive critique.

      (1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering whether the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data is available about cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

      We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 is obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore consider quantitative validation against data to be beyond the scope of this work.

      (2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

      The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when the growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This regime is readily reached with many immune groups, which reconcile the large effect of many escape mutations with the absence of oscillations by confining the escape to a fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps for predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section that links the parameters of the two models. However, we underline that beyond the phenomenon of partial sweeps, the dynamics of the two models differ.

      Reviewer 3:

      Summary

      In this work the authors start by presenting a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description into an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

      Strengths

      The idea that a vanishing fitness mechanism that produces partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for the predictability of virus evolution, and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strain coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to be more universal than human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

      We thank the reviewer for their positive remarks and constructive criticism below.

      Weaknesses

      The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the random walk coarse-grained description in section 3 of the Results, there is no quantitative relation between the random walk ingredients (most importantly 𝑃(𝛽)) and the SIR model, just a qualitative reasoning that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

      As also highlighted in the response to other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

      Currently it’s unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effects of these ingredients are reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady-state coexistence between strains, and oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

      The reviewer is correct that the primary effect of using many immune groups is to slow down the growth of the novel variant, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

      Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

      Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR. We also discuss how other models such as ecological models give rise to similar coarse grained models.

      At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

      In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt, which is apparent in equation (A27) of the new SI section. The assumption that a new variant shifts the equilibrium frequencies of previous strains proportionally is valid if the “antigenic space” is of very high dimension, as explained in the section Change in frequency when adding subsequent strains of the SI. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross-immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently from each other is a necessary simplification.

      In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

      We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency-dependent selection, or ecosystems that adapt on appropriate timescales, generate similar dynamics.

      Recommendations for the authors:

      Reviewer 1:

      (4) Line 74: what does fitness mean?

      Many population dynamics models, including ones used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the average population fitness from the strain’s fitness. In this sentence, fitness is meant in this sense.
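
      The scalar-fitness convention described above can be sketched as follows. This is a minimal illustration assuming deterministic replicator dynamics with strain frequencies x_i and fitnesses f_i; the function names are hypothetical and do not come from the authors' code.

```python
import numpy as np

def growth_rates(fitness, freqs):
    """Growth rate of each strain: its fitness minus the frequency-weighted mean fitness."""
    fitness = np.asarray(fitness, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    mean_fitness = np.dot(freqs, fitness)
    return fitness - mean_fitness

def replicator_step(fitness, freqs, dt):
    """One Euler step of dx_i/dt = x_i (f_i - <f>), renormalized so frequencies sum to 1."""
    freqs = np.asarray(freqs, dtype=float)
    new = freqs + dt * freqs * growth_rates(fitness, freqs)
    return new / new.sum()
```

      Because only differences from the mean enter the growth rates, adding the same constant to every fitness leaves the dynamics unchanged; this is why such models can describe adaptation purely through the fitness of each strain relative to the current population.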

      (5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

      This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

      A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A4 of the supplementary information.

      (6) Fig. 1: In the caption, it is stated that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in and just below equation (7).

      This was indeed a typo. It is now fixed.

      (7) Fig. 3: The data analysis should be improved. In order to link the average frequency trajectories to standard population genetics of conditional fixation probabilities, the focal time should always be the time where the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and the second crossing of the threshold frequency is simply omitted). The focal time of first crossing of the threshold frequency can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

      The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.

      In Fig. 3, there are no trajectories that are on their downward path at the focal time and when crossing the threshold frequency. Our other work on the predictability of flu (Barrat-Charlaix et al., 2021) has a similar figure, which may have created confusion.
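
      For readers who want to reproduce the interpolation-based focal time the reviewer describes, here is a minimal sketch. It is an illustration only: the authors' pipeline instead takes the first sampled time point that falls in the threshold bin, and the function name below is hypothetical.

```python
import numpy as np

def first_crossing_time(times, freqs, threshold):
    """First time an upward-moving trajectory reaches `threshold`, linearly
    interpolating between consecutive samples. Returns None if it never does."""
    times = np.asarray(times, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    if freqs[0] >= threshold:
        return float(times[0])
    for i in range(1, len(freqs)):
        x0, x1 = freqs[i - 1], freqs[i]
        # x0 < threshold <= x1 guarantees x1 > x0, so the division is safe
        if x0 < threshold <= x1:
            t0, t1 = times[i - 1], times[i]
            return float(t0 + (threshold - x0) * (t1 - t0) / (x1 - x0))
    return None
```

      Only upward crossings are detected, so a trajectory first observed above the threshold on its way down is never assigned a later focal time, which is the bias the reviewer wanted to avoid.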

      (8) Fig. 4: authors write 𝛼/ 𝑠0 in the figure, but should be 𝜈/ 𝑠0.

      Fixed.

      (9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/ 𝑠0, that is panel D (see point 1).

      Fixed.

      (10) Line 477: typo “there will a variety of mutations”.

      Fixed.

      Reviewer 2:

      Should 𝛼 be 𝜈 in Figure 4 legends?

      Thank you very much for spotting this error. We fixed it.

      Equations 4-5 could be further simplified.

      We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1 − 𝛿/𝛼 term, as this quantity appears in several calculations concerning the model. For instance, 𝑆 = 𝛿/𝛼 at equilibrium.
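
      To illustrate why 𝑆 = 𝛿/𝛼 and the factor 1 − 𝛿/𝛼 appear at equilibrium, the sketch below integrates a standard single-strain SIRS model with infection rate 𝛼, recovery rate 𝛿 and an assumed waning rate 𝛾. This generic form is our assumption for illustration and is not a reproduction of the paper's equations 4 and 5; setting d𝐼/d𝑡 = 0 gives 𝑆* = 𝛿/𝛼, and setting d𝑆/d𝑡 = 0 then gives 𝐼* = 𝛾(1 − 𝛿/𝛼)/(𝛿 + 𝛾).

```python
def simulate_sirs(alpha, delta, gamma, s0=0.99, i0=0.01, dt=0.01, t_max=1000.0):
    """Forward-Euler integration of a generic SIRS model (assumed form):
    dS/dt = -alpha*S*I + gamma*R,  dI/dt = alpha*S*I - delta*I,  R = 1 - S - I."""
    s, i = s0, i0
    for _ in range(int(t_max / dt)):
        r = 1.0 - s - i
        ds = -alpha * s * i + gamma * r
        di = alpha * s * i - delta * i
        s, i = s + dt * ds, i + dt * di
    return s, i

def endemic_equilibrium(alpha, delta, gamma):
    """Analytical fixed point: S* = delta/alpha, I* = gamma (1 - delta/alpha) / (delta + gamma)."""
    s_star = delta / alpha
    i_star = gamma * (1.0 - delta / alpha) / (delta + gamma)
    return s_star, i_star
```

      For example, with 𝛼 = 2, 𝛿 = 1 and 𝛾 = 0.1 the trajectory undergoes damped oscillations and settles at 𝑆 = 0.5, the value predicted by 𝛿/𝛼.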

      The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

      We now introduce 𝑃(𝛽) at the beginning of the section Ultimate fate of the variant.

      In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

      Fixed.

      For the supplement section B, you might want a more descriptive title than “other”.

      We renamed this section to Expiring fitness model and random walk.

      Reviewer 3:

      To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

      In the first paragraph of page 9 the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous vs heterogeneous populations due to the oscillations. This paragraph is quite speculative, some numerical simulations would be necessary to quantitatively address to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

      Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR model with the chosen population and cross-immunity structures, with e.g. 10–20 strains. This would allow for a direct comparison of individual strain dynamics rather than frequency averages, as well as of other scalar properties such as higher moments, the coalescent, and the fixation probability once a given frequency is reached. It would also be possible to characterize the SIR P(beta) numerically, bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the scaling of the moments. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics works, so they shouldn’t be out of reach.

      Alternatively, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR model more briefly as one possible qualitative way to get there. Either way, the authors should comment on other ways in which this coarse-grained dynamics could arise.

      In the vanishing fitness model, where variants fitnesses are independent, is an infinite dimensional antigenic space implicitly assumed? If that’s the case, it should be explained in the main text.

      A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

      In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR models. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on a few parameters (𝜈, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

      In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross-immunity parameters 𝑏 and 𝑓.

      Minor comments:

      There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/ 𝑀, then a few paragraphs later the authors introduce that in another limit 𝐶𝑖𝑖 is just much higher than any 𝐶𝑖𝑗, and finally they specify that the former is the fast mixing scenario.

      Another example is in section 2: in the first paragraph the authors put forward that heterogeneity and cross-immunity have different impacts on the dynamics, but the meaning attributed to these different ingredients becomes clear only a while later, after the homogeneous population analysis. Making the writing more linear would make it easier for the reader to follow the authors’ train of thought.

      We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/𝑀 case, which we hope will linearize the writing.

      When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

      Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

      In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem so strong to produce a disordered landscape that defies the notion of scalar fitness in the same way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be a negative frequency dependent selection.

      Our language here was not very precise, and we agree that the phenomenology we describe is related to that of frequency-dependent selection (mediated via the immunity of the host population, which integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and only account for evolution via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes negative frequency-dependent selection.

      I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3 given that there is no stochasticity in growth death dynamics.

      We compare the random walk to genetic drift because of the expression of the second moment of the step size. The genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.
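
      The moment matching described above can be checked with a short calculation. The sketch below assumes (i) a new variant reaches equilibrium frequency 𝛽 and rescales all previous strains by (1 − 𝛽), and (ii) it arises on a background carrying a focal mutation with probability equal to that mutation's frequency 𝑥. Then the change Δ𝑥 equals 𝛽(1 − 𝑥) with probability 𝑥 and −𝛽𝑥 otherwise, so that E[Δ𝑥] = 0 and E[Δ𝑥²] = 𝛽²𝑥(1 − 𝑥), the same functional form as neutral drift 𝑥(1 − 𝑥)/𝑁. Identifying ⟨𝛽²⟩ per new variant with an effective 1/𝑁 is our illustrative reading; the paper's exact definition of the effective population size may include additional factors such as the rate of new variants. Function names are hypothetical.

```python
import random

def step_moments(x, beta):
    """Exact mean and second moment of the change in frequency x of a focal
    mutation when a new variant of equilibrium frequency beta appears."""
    up = beta * (1.0 - x)     # variant arose on a background carrying the mutation
    down = -beta * x          # variant arose on a background lacking it
    mean = x * up + (1.0 - x) * down
    second = x * up ** 2 + (1.0 - x) * down ** 2
    return mean, second

def simulate_step(x, beta, rng):
    """One stochastic step: rescale by (1 - beta), add beta if the background carries the mutation."""
    carries = rng.random() < x
    return x * (1.0 - beta) + (beta if carries else 0.0) - x
```

      The only randomness is in which background the new variant lands on; the frequency dynamics themselves are deterministic, which is exactly the sense in which this random walk resembles genetic draft rather than sampling drift.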

      In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made more clear what simulations assume what different choice of 𝑃(𝑠).

      We added the expression of 𝑃(𝑠) in the main text. Simulations use the value 𝑠0 = 0.03, which we added in the caption of Figure 4.

      When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

      A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

      Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

      For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021, also indirectly visible from Fig. 3). In Figure 4, we obtain that either a high expiry rate or intermediate expiry rates and clonal interference regimes match this observation.
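
      Fixation probability equal to the threshold frequency is the benchmark expected under neutral evolution. As a generic illustration (not the authors' analysis), the sketch below runs a neutral Wright–Fisher model with binomial resampling and estimates the probability that an allele starting at frequency 𝑥0 fixes; the population size and number of runs are arbitrary choices.

```python
import numpy as np

def neutral_fixation_probability(x0, pop_size, n_runs, seed=0):
    """Fraction of neutral Wright-Fisher runs in which an allele starting at
    frequency x0 reaches fixation (binomial resampling each generation)."""
    rng = np.random.default_rng(seed)
    counts = np.full(n_runs, int(round(x0 * pop_size)))
    active = (counts > 0) & (counts < pop_size)
    while active.any():
        p = counts[active] / pop_size
        counts[active] = rng.binomial(pop_size, p)
        active = (counts > 0) & (counts < pop_size)
    return float(np.mean(counts == pop_size))
```

      With 𝑥0 = 0.3 and 𝑁 = 100, the estimate falls close to 0.3 because the allele frequency is a martingale under neutrality; beneficial alleles with persistent fitness advantages would instead fix with probability above 𝑥0.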

      It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

      We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

      Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

      We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability that sweeps overlap. We fixed the 𝜈–𝛼 confusion in the SI and in the caption of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations For The Authors:

      Reviewer #1:

      ●      It might help the reader if you make it explicit that mDES allows you to create an approximate amalgam of different kinds of experiences by assuming that, across individuals, there is a general consensus of experiences at particular points in the movie. Whether this assumption accurately reflects the way in which each individual's brain responds to the movie is an important, testable prediction that could be discussed/examined in different projects. For instance, in other projects there are clear idiosyncratic responses to the same naturalistic stimuli: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8064646/.

      Thank you, this is an excellent point. We have included this article in our revision and expanded on the introduction to emphasize how this study relates to our work. Additionally, we have included an additional figure that helps illustrate how mDES can be used to evaluate the idiosyncrasy for each respective thought component to visually display the variance across moments in the film:

      Page 6-7 [137-148] In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [8]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [22, 32, 33] and in daily life [34, 35], and is sensitive to accompanying changes in brain activity [24, 36]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [24, 32-41]. Each question describes a different feature of experience such as if their thoughts are oriented in the future or the past, about oneself or other people, deliberate or intrusive in nature, and more (See methods for a full list of questions used in the current study).

      ●      A cartoon describing the mDES technique could be helpful for uninitiated readers.

      Thank you for your suggestion, we have added an additional figure (Figure 3) that illustrates the process of mDES in the laboratory during this experiment, clarifying that participants answer mDES items using a slider to indicate their score (rather than expressing it verbally).

      ●      Did the authors check for any measures of reliability across mDES estimates other than split-half reliability? For instance, the authors could demonstrate construct validity by showing that engagement with certain features of the thought-sampling space aligned with specific points in the movies. If so, the start of the Results section would be a great place to demonstrate the reliability of the approach. For instance, did any two participants sample the same 15-second window of time in a particular stimulus? If so, you could compare their experience samples to determine whether the method was extensible across subjects.

      This is a great point, thank you very much for highlighting this. We have eight individuals at each time point in our analysis, which is probably not enough to calculate meaningful reliability measures. However, we have added a time-series analysis of experience in each clip to our revision (Figure 3). In these time plots, it is possible to see clear moments in the film in which scores do not straddle 0 (using 95% CI), and often these persist across successive moments (Figure 3; see time-series plot four for the clearest example). When the confidence intervals of a sampling epoch do not overlap with zero, this suggests a high degree of agreement in thought content across participants. At the same time, our analysis shows that individual differences do exist, since the relative presence of each component for each participant was linked to objective measures of movie watching (in this case, comprehension). In this revision we have specifically addressed this question by conducting ANOVAs to determine how scores on each component change across the clip (see also Supplementary Table 11). This additional analysis shows that mDES effectively captures shared aspects of movie watching and is also sensitive to individual variation.

      Page 15 [304-323]: Next, we examined how each pattern of thought changes across each movie clip. For this analysis, we conducted a separate Analysis of Variance (ANOVA) for each film clip and each of the four components, with time within the clip as the explanatory variable of interest (see Table 1 and Figure 3). Clear dynamic changes were observed in several components for different films. This analysis identified significant changes in “Episodic Social Cognition” scores across Little Miss Sunshine, F(1, 712) = 10.80, p = .001, η² = .03, and Citizenfour, F(1, 712) = 5.23, p = .023, η² = .02. There were also significant changes in “Verbal Detail” scores across Little Miss Sunshine, F(1, 712) = 31.79, p < .001, η² = .09. Lastly, there were significant changes in “Sensory Engagement” scores for both Citizenfour, F(1, 712) = 6.22, p = .013, η² = .02, and 500 Days of Summer, F(1, 706) = 80.41, p < .001, η² = .18. These time series are plotted in Figure 3 and highlight how mDES can capture the dynamics of different types of experience across the three movie clips. Moreover, in several of these time-series plots, it is clear that the reported thought patterns extend beyond adjacent time periods (e.g. scores above zero between time periods 150 and 400 for Sensory Engagement in 500 Days of Summer, and between time periods 175 and 225 for Verbal Detail in Little Miss Sunshine). It is important to note that no participant completed experience sampling reports at adjacent sampling points (see Supplementary Figure 7), so the length of these intervals indicates that the way specific scenes within a film were experienced was conserved across different individuals. Notably, the component with the least evidence for temporal dynamics was “Intrusive Distraction.”

      ●      P10: "Generation of the thought-space" - how stable are these word clouds to individual subjects? If there are subject-specific differences, are there ways to account for this with some form of normalization?

      Thank you for bringing up this point. Our current goal was to show how the average experience of one group of participants relates to the brain activity of a second group. In this regard, it is important to seek the patterns of similarity across individuals in how they experience the film. However, as is normal in our studies using mDES, we can also use the variation from the mean to predict other cognitive measures and, in this way, account for the variability that individuals have in their movie-watching experience. In other words, the word clouds reflect the mean of a particular dimension, so when an individual score is close to 0, their thought content does not align with this dimension; deviating scores, positive or negative, indicate that this dimension provides meaningful information about the individual's experience. Evidence of the meaningful nature of this variation can be seen in the links between the reported thoughts and the individuals’ comprehension (e.g. individuals whose thoughts did not contain strong evidence of “Intrusive Distraction”, in other words those with a negative score, tended to do better on comprehension tests of information in the movies they watched).

      ●      P11: "Variation in thought patterns" - can the authors use a null model here to demonstrate that the associations they've observed would occur above chance levels (e.g., for a comparison of time series with similar temporal autocorrelation but non-preserved semantic structure)? Further, were there any pre-defined hypotheses over whether any of the three different movies would engage any of the 4 observed dimensions?

      This is a great point. We chose to sample from three distinctly different films to help us understand if mDES was sensitive to different semantic and affective features of films. Our analysis, therefore, shows that at a broad level, mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, researchers in the future could derive mechanistic insights into how the semantic features may influence the mDES data. For example, future studies could ask participants to watch movies in a scrambled order to understand how varying the structure of semantics or information breaks the mapping between brains and ongoing experience. In this revision we have amended the text to reflect this possibility:

      Page 34 [674-679]. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the semantic or informational structure of a film influences the mapping between brains and ongoing experience as measured by mDES.
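
      The null model the reviewer suggests (surrogate time series with the same temporal autocorrelation but no preserved content) is commonly built by Fourier phase randomization. The sketch below is an illustration of that generic technique, not an analysis reported in the manuscript; the function names and the use of Pearson correlation as the test statistic are assumptions.

```python
import numpy as np

def phase_randomized_surrogate(series, rng):
    """Surrogate with the same power spectrum (hence the same autocorrelation)
    as `series`, but with randomized Fourier phases."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    spectrum = np.fft.rfft(series - series.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    phases[0] = 0.0                    # zero-frequency term stays real
    if n % 2 == 0:
        phases[-1] = 0.0               # Nyquist term must stay real for even n
    surrogate = np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=n)
    return surrogate + series.mean()

def null_correlations(x, y, n_surrogates=1000, seed=0):
    """Correlations between phase-randomized copies of x and the unchanged y."""
    rng = np.random.default_rng(seed)
    return np.array([np.corrcoef(phase_randomized_surrogate(x, rng), y)[0, 1]
                     for _ in range(n_surrogates)])
```

      An empirical p-value is then the fraction of null correlations at least as large as the observed one, which controls for associations that arise merely because both time series are smooth.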

      ●      P14: "Brain - Thought Mappings: Voxel-space Analysis" - this is a cool analysis, and a nice validation of the authors' approach. I would personally love to see some form of reliability analysis on these approaches - e.g., do the same locations in the cerebral cortex align with the four features in all three movies? Across subjects?

      This is another great point, and we thank you for your enthusiasm. Our data only sampled mDES during a relatively short period of brain activity, which we suspect would make an individual-by-individual analysis underpowered. In the future, however, it may be possible to adopt a precision-mapping approach in which we sample mDES during longer periods of movie watching and identify how group-level mappings of experience relate to brain activity within a single subject. To reflect this possibility, we have amended the text in this revision as follows:

      Pages 34-35 [672-687]: In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants' experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future, it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      Reviewer #2:

      (1) The three-dimensional scatter plot in Figure 2 does not represent "Intrusive Distraction." Would it make sense to color-code dots by this important dimension?

      Thank you for this suggestion. Although it could be possible to indicate the location of each film in all four dimensions, we were worried that this would make the already complex 3-D space confusing to a naive reader. In this case, we prefer to provide this information in the form of bar graphs, as we did in the previous submission.

      (2) The coloring of neural activation patterns in Figure 3 is not distinct enough between the different dimensions of thought. Please reconsider color intensities or coding. The same applies to the left panel in Figure 4.

      Thanks for this comment; we found it quite difficult to find a colour mapping that allows us to show the distinction between four states in a simple manner, yet we believe it is valuable to show all of the results on a similar brain. Nonetheless, to provide a more fine-grained viewing of our results in this revision we have provided a supplementary figure (Supplementary Figure 6) that shows each of the observed patterns of activity in isolation.

      (3) The new method (mDES) is mentioned too often without explanation, making it hard to follow without referring to the methods section. It would be helpful to state prominently that participants rated their thoughts on different dimensions instead of verbalizing them.

      Thank you for this point; we have adjusted the Introduction to clarify and expand on the mDES method. We have also included an example of the mDES method in a new figure (Figure 3) that illustrates how participants respond to mDES probes.

      Pages 6-7 [136-148]: In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [2]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [3-5] and in daily life [6, 7], and is sensitive to accompanying changes in brain activity when reports are gained during scanning [8, 9]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [3, 5-14]. Each question describes a different feature of experience, such as whether their thoughts are oriented to the future or the past, are about oneself or other people, are deliberate or intrusive in nature, and more (see Methods for a full list of questions used in the current study).

      Author response image 1.

      (4) Reporting of single-movie thought patterns seems quite extensive. Could this be condensed in the main text?

      Thank you for this point. Upon revisiting the manuscript, we have made the text more concise.

      Reviewer #3:

      ●      This is a very elegant experiment and seems like a very promising approach. The text is currently hard to read.

      Thank you for this point; we have revisited the text and revised the manuscript to make it more concise and clearer.

      ●      The introduction (+ analysis goals) fails to explain the basic aspects of the analysis and dataset. It is not clear how many participants and datapoints were used to establish the group-level thought patterns, nor is it entirely clear that the fMRI data is a separate existing dataset. Some terms are introduced and highlighted and never revisited (e.g. decoupled states and the role of the DMN).

      Thank you for this critique; we have adjusted the introduction to explain clearly the difference between Sample 1 and Sample 2 and to clarify that the fMRI data come from an entirely separate, independent sample from the laboratory mDES sample:

      Pages 7-8 [158-174]: Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants. In the current study, one set of 120 participants was probed with mDES five times across the three ten-minute movie clips (11 minutes total, no sampling in the first minute). We used a jittered sampling technique where probes were delivered at different intervals across the film for different people depending on the condition they were assigned. Probe orders were also counterbalanced to minimize the systematic impact of prior and later probes at any given sampling moment. We used these data to construct a precise description of the dynamics of experience for every 15 seconds of three ten-minute movie clips. These data were then combined with fMRI data from a different sample of 44 participants who had already watched these clips without experience sampling [15]. By combining data from two different groups of participants, our method allows us to describe the time series of different experiential states (as defined by mDES) and relate these to the time series of brain activity in another set of participants who watched the same films with no interruptions. In this way, our study set out to explicitly understand how the patterns of thoughts that dominate different moments in a film in one group of participants relate to the brain activity at these time points in a second set of participants and, therefore, better understand the contribution of different neural systems to the movie-watching experience.

      Pages 8-9 [177-188]: The goal of our study, therefore, was to understand the association between patterns of brain activity over time during movie clips in one group of participants and the patterns of thought that participants reported at the corresponding moment in a different set of participants (see Figure 1). This can be conceptualized as identifying the mapping between two multi-dimensional spaces, one reflecting the time series of brain activity and the other describing the time series of ongoing experience (see Figure 1 right-hand panel). In our study, we selected three 11-minute clips from movies (Citizenfour, Little Miss Sunshine and 500 Days of Summer) for which recordings of brain data in fMRI already existed (n = 44) [15] (Figure 1, Sample 1). A second set of participants (n = 120) viewed the same movie clips, providing intermittent reports on their thought patterns using mDES (Figure 1, Sample 2). Our goal was to understand the mapping between the patterns of brain activity at each moment of the film and the reports of ongoing thought recorded at the same point in the movies.
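      To make the staggered-sampling logic concrete, the Python sketch below shows one way a schedule could place a probe at every 15-s point of a clip while keeping each viewer's probes about two minutes apart. It is a simplified illustration, not the schedule used in the study: the constants (first probe at 60 s, 120-s spacing, five probes per viewer for a clip) and the function names are our own assumptions, and the actual design also jittered and counterbalanced probe orders.

```python
# Hypothetical, simplified probe schedule for ONE clip: evenly staggered
# offsets with no random jitter. All constants are illustrative assumptions.
SAMPLING_START_S = 60   # no probes during the first minute of the clip
EPOCH_S = 15            # one sampling point every 15 s
PROBE_SPACING_S = 120   # each viewer's probes are ~2 min apart
PROBES_PER_VIEWER = 5   # probes per viewer for this clip


def build_schedule():
    """Return {condition: [probe times in s]}, one condition per stagger offset."""
    n_conditions = PROBE_SPACING_S // EPOCH_S  # 120 // 15 = 8 offsets
    return {
        c: [SAMPLING_START_S + c * EPOCH_S + k * PROBE_SPACING_S
            for k in range(PROBES_PER_VIEWER)]
        for c in range(n_conditions)
    }


def coverage(schedule):
    """Map each sampled time to the list of conditions probed at that time."""
    by_time = {}
    for condition, times in schedule.items():
        for t in times:
            by_time.setdefault(t, []).append(condition)
    return by_time


if __name__ == "__main__":
    schedule = build_schedule()
    by_time = coverage(schedule)
    print(len(by_time))                # 40 distinct 15-s sampling points
    print(min(by_time), max(by_time))  # 60 645
```

      Because the 15-s stagger divides the 120-s spacing exactly, each sampling point is covered by exactly one condition and no viewer is probed at two adjacent points; assigning eight participants to each condition would then yield eight reports per point.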

      ●      It is unclear what the utility of the method is - is it meant to be done in fMRI studies on the same participants? Or is the idea to use one sample to model another?

      Great point, thank you for highlighting this important question. This paper aimed to interrogate the relationship between experience and neural states while preserving the novelty of movie-watching. Although this could be done in the same sample, it may be difficult to collect frequent reports of experience without interrupting the dynamics of the brain. In the future, however, it could be possible to collect mDES and brain activity in the same individuals while they watch movies. For example, in prior studies (e.g. [9]) we combined mDES with openly available brain activity data recorded during tasks. In the future, this online method could also be applied during movie watching to identify a direct mapping between brain activity and films. However, this online approach would make it very expensive to produce the time series of experience across each clip, given that it would require a large number of participants (e.g. 200 as we used in our current study). The following has been included in our manuscript:

      Page 7 [149-159]: One challenge that arises when attempting to map the dynamics of thought onto brain activity during movie watching is accounting for the inherently disruptive nature of experience sampling: to measure experience with sufficient frequency to map the dynamics of thoughts during movies would disrupt the natural dynamics of the brain and would also alter the viewer’s experience (for example, by pausing the film at a moment of suspense). Therefore, if we periodically interrupted viewers to acquire a description of their thoughts while recording brain activity, we could fail to capture important dynamic features of the brain. On the other hand, if we measured fMRI activity continuously over movie-watching (as is usually the case), we would lack the capacity to directly relate brain signals to the corresponding experiential states. Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants.

      ●      The conclusions currently read as somewhat trivial (e.g "Our study, therefore, establishes both sensory and association cortex as core features of the movie-watching experience", "Our study supports the hypothesis that perceptual coupling between the brain and external input is a core feature of how we make sense of events in movies").

      Thank you for this comment. In this revision we have attempted to extend the theoretical significance of our work in the discussion (for example, in contrasting the links between “Intrusive Distraction” and the other components). To this end we have amended the text in this revision by including the following sections:

      Pages 33-35 [654-687]: Importantly, our study provides a novel method for answering these questions and others regarding the brain basis of experiences during films that can be applied simply and cost-effectively. As we have shown, mDES can be combined with existing brain activity data, allowing information about both brain activity and experience to be determined at a relatively low cost. For example, the cost-effective nature of our paradigm makes it an ideal way to explore the relationship between cognition and neural activity during movie-watching across different genres of film. In neuroimaging, conclusions are often drawn from a single film in naturalistic paradigm studies [16]. Although the current study used only three movie clips, constraining our ability to form strong conclusions regarding how different patterns of thought relate to specific genres of film, in the future it will be possible to map cognition across a more extensive set of movies and discern whether there are specific types of experience that different genres of film engage. One of the major strengths of our approach, therefore, is the ability to map thoughts across groups of participants across a wide range of movies at a relatively low cost.

      Nonetheless, this paradigm is not without limitations. This is the first study, as far as we know, that attempts to compare experiential reports in one sample of participants with brain activity in a second set of participants, and while this method enables us to understand the relationship between thought and brain activity during movies, it will be important to extend our analysis to mDES data collected during movie watching while brain activity is recorded. In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants' experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future, it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      ●      The beginning of the discussion is very clear and explains the study very well. Some of it could be brought up in the intro/analysis goal sections.

      Thank you for this comment; this is an excellent idea. We have revisited the introduction and analysis goals section to mirror this clarity across the manuscript.

      ●      The different components are very interesting, and not entirely clear. Some examples in the text could help. Especially regarding your thought that verbal components would refer to a "decoupled" mental verbal analysis participants might be performing in their thoughts.

      Thank you for this point. We would prefer not to elaborate on this point since, at present, it would simply be conjecture based on our correlational design. However, we have included a section in the discussion which explains how, in principle, we would draw more mechanistic conclusions (for example, by shuffling the order of scenes in a movie as suggested by another reviewer). In the current revision, we have amended the text in the following way:

      Page 34 [674-679]: Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantic information influences the mapping between brains and ongoing experience as measured by mDES.

      ●      The reference to using neurosynth as performing a meta-analysis seems a little stretched.

      We have adjusted the manuscript to remove ‘meta-analysis’ when referring to the analysis computed with neurosynth. Thank you for bringing this to our attention.

      ●      State-space is defined as brain-space in the methods.

      Thank you; we have updated this.

      ●      It could be useful to remind the reader what thought and brain spaces are at the top of the state-space results section.

      This is an excellent point, and it has since been updated to remind the reader of thought- and brain-space. Thank you for this comment.

      Page 24 [458-467]: Our next analysis used a “state-space” approach to determine how brain activity at each moment in the film predicted the patterns of thoughts reported at these moments (for prior examples in the domain of tasks, see [12, 17]; see Methods). In this analysis, we used the coordinates of the group average of each TR in the “brain-space” and the coordinates of each experience sampling moment in the “thought-space.” To clarify, the location of a moment in a film in “brain-space” is calculated by projecting the grand mean of brain activity for each volume of each film against the first five dimensions of brain activity from a decomposition of the Human Connectome Project (HCP) resting state data, referred to as Gradients 1-5. “Thought-space” is the decomposition of mDES items to create thought pattern components, referred to as “Episodic Knowledge”, “Intrusive Distraction”, “Verbal Detail” and “Sensory Engagement.”
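      The brain-space and thought-space pairing described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: we assume "projecting" means a dot product of the mean-centred activity map with each gradient map, the TR value is hypothetical, no haemodynamic lag is modelled, and all function names are our own.

```python
from statistics import fmean


def project_to_brain_space(activity, gradients):
    """Brain-space coordinates of one TR.

    activity:  group-mean activity for one TR (one value per parcel).
    gradients: gradient maps (e.g. Gradients 1-5), each the same length.
    Assumption: "projection" = dot product of the mean-centred activity
    map with each gradient map.
    """
    mean = fmean(activity)
    centred = [a - mean for a in activity]
    return [sum(c * g for c, g in zip(centred, grad)) for grad in gradients]


def tr_index(probe_time_s, tr_s):
    """Index of the volume acquired at a probe time (hypothetical TR, no lag)."""
    return int(probe_time_s // tr_s)


def pair_moments(brain_series, gradients, thought_scores, tr_s):
    """Pair brain-space coordinates with thought-space scores at each probe.

    brain_series:   list of per-TR group-mean activity maps (fMRI sample).
    thought_scores: {probe time in s: [component scores]} (mDES sample).
    """
    rows = []
    for t, scores in sorted(thought_scores.items()):
        coords = project_to_brain_space(brain_series[tr_index(t, tr_s)], gradients)
        rows.append((t, coords, scores))
    return rows


if __name__ == "__main__":
    coords = project_to_brain_space([1.0, 2.0, 3.0, 4.0],
                                    [[1, 1, -1, -1], [-1, -1, 1, 1]])
    print(coords)  # [-4.0, 4.0]
```

      Each row pairs one sampling moment's position in brain-space (from one group of participants) with its thought-space scores (from the other group), which is the kind of input a model relating the two spaces would take.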

      ●      DF missing from the t-test for episodic knowledge/grad 4.

      Thank you for catching this; the degrees of freedom have now been included in this revision.

      Page 24 [474-476]: First, we found a significant main effect of Gradient 4 (DAN to Visual), which predicted the similarity of answers to the “Episodic Knowledge” component, t(2046) = 2.17, p = .013, η2 = .01.

      Public Reviews:

      Reviewer #1:

      ●      The lack of direct interrogation of individual differences/reliability of the mDES scores warrants some pause.

      Our study's goal was to understand how group-level patterns of thought in one group of participants relate to brain activity in a different group of participants. To this end, we decomposed trial-level mDES data to show dimensions that are common across individuals, which demonstrated excellent split-half reliability. Then we used these data in two complementary ways. First, we established that these ratings reliably distinguished between the different films (showing that our approach is sensitive to manipulations of semantic and affective features in a film) and that these group-level patterns were also able to predict patterns of brain activity in a different group of participants (suggesting that mDES dimensions are also sensitive to broad differences in how brain activity emerges during movie watching). Second, we established that variation across individuals in their mDES scores predicted their comprehension of information from the films. This establishes that when applied to movie-watching, mDES is sensitive to individual differences in the movie-watching experience (as determined by an individual's comprehension). Given the success of this study and the relative ease with which mDES can be performed, it will be possible in the future to conduct mDES studies that home in on the common and distinct features of the movie-watching experience.
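      Split-half reliability can be illustrated with a short Python sketch. This shows one common way of computing it for a group-average time series, assumed here for illustration: at every sampling point the responses are randomly divided into two halves, the half-averages are correlated across time, and the correlation is stepped up with the Spearman-Brown formula. The function names and input layout are hypothetical, and the authors' own procedure (which may, for example, compare component loadings derived from each half of the data) may differ.

```python
import random
from statistics import fmean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(scores_by_time, n_splits=1000, seed=0):
    """Mean Spearman-Brown-corrected split-half reliability of a group time series.

    scores_by_time: {time in s: [one component score per participant probed then]}
    Each list needs at least two responses.
    """
    rng = random.Random(seed)
    times = sorted(scores_by_time)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for t in times:
            responses = list(scores_by_time[t])
            rng.shuffle(responses)  # random split of the responders at this time
            mid = len(responses) // 2
            half_a.append(fmean(responses[:mid]))
            half_b.append(fmean(responses[mid:]))
        r = pearson(half_a, half_b)
        estimates.append(2 * r / (1 + r))  # Spearman-Brown step-up to full size
    return fmean(estimates)
```

      Averaging over many random splits avoids depending on one arbitrary division of the participants; a fixed seed keeps the estimate reproducible.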

      Reviewer #2:

      (1) The dimensions of thought seem to distinguish between sensory and executive processing states. However, it is unclear if this effect primarily pertains to thinking. I could imagine highly intrusive distractions in movie segments to correlate with stagnating plot development, little change in scenery, or incomprehensible events. Put differently, it may primarily be the properties of the movies that evoke different processing modes, but these properties are not accounted for. For example, I'm wondering whether a simple measure of engagement with stimulus materials could explain the effects just as much. How can the effects of thinking be distinguished from the perceptual and semantic properties of the movie, as well as attentional effects? Is the measure used here capturing thought processes beyond what other factors could explain?

      Our study used mDES to identify four distinct components of experience, each of which had distinct behavioural and neural correlates and relationships to comprehension. Together this makes it unlikely that a single measure of engagement would be able to capture the range of effects we observed in our study. For example, “Intrusive Distraction” was associated with regions of association cortex, while the other three components highlighted regions of sensory cortex. Behaviorally, we found that some components had a common effect on comprehension (e.g. “Intrusive Distraction” was related to worse comprehension across all films), while others were linked to clear benefits to comprehension in specific films (e.g. “Episodic Knowledge” was associated with better comprehension in only one of the films). Given the complex nature of these effects, it would be difficult for a single metric of engagement to explain this pattern of results, and even if it did, this could be misleading because our analysis implies that they are better explained by a model of movie-watching experience in which there are several relatively orthogonal dimensions upon which our experience can vary.

      At the same time, we also found that films vary in the general types of experience they can engender. For example, Citizenfour was high on “Intrusive Distraction” and participants performed relatively poorly on comprehension. This shows that manipulations of the semantic and affective content of films also have implications for the movie-watching experience. This pattern is consistent with laboratory studies that applied mDES during tasks and found that different tasks evoke different types of experience (for example, patterns of ‘intrusive’ thoughts were common in movie clips that were suspenseful [18]). At the same time, in the same study, patterns of intrusive thought across the tasks were also associated with trait levels of dysphoria reported by participants. Other studies using mDES in daily life have shown that the data can be described by multiple dimensions and that each of these types of thought is more prevalent in certain activities than others ([19]). For example, in daily life, patterns of ‘intrusive distraction’ thoughts were more prevalent when individuals were engaged in activities that were relatively unengaging (such as resting). Collectively, therefore, studies using mDES suggest that it is likely that human thought is multidimensional in nature and that these dimensions vary in a complex way in terms of (a) the contexts that promote them, and (b) how they are impacted by features of the individual (whether they be traits like anxiety or depression or memory for information in a film).

      (2) I'm skeptical about taking human thought ratings at face value. Intrusive distraction might imply disengagement from stimulus materials, but it could also be an intended effect of the movie to trigger higher-level, abstract thinking. Can a label like intrusive distraction be misleading without considering the actual thought and movie content?

      Our method uses a data-driven approach to identify the dimensions that best describe the range of answers that our participants provided to describe their experience. We use these dimensions to understand how these patterns of thought emerge in different contexts and how they vary across individuals (in this case, in different movies, but in other studies, laboratory tasks [3, 8, 9, 12, 20-22] or activities in daily life [6, 7]). These context relationships help constrain interpretations of what the components mean. For example, “Intrusive Distraction” scores were highest in the film with the most real-world significance for the participants (Citizenfour) and were associated with worse comprehension. In daily life, however, patterns of “Intrusive Distraction” thought tend to occur when individuals engage in non-demanding activities, such as resting. This is consistent with psychological perspectives on spontaneously arising thought: there is evidence that such thoughts occur in non-demanding tasks with no semantic content (when there is almost no external stimulus to explain the occurrence of the experience, see [23]); however, other studies have shown that specific cues in the environment can also trigger the experience (see [23]). Consistent with this perspective, and with our current data, patterns of “Intrusive Distraction” thought are likely to arise for multiple reasons, some of which are more intrinsic in nature (the general association with poor comprehension across all films) and others of which are extrinsic in nature (the elevation of intrusive distraction in Citizenfour).

      It is also important to note that our data-driven approach found patterns of experience that provide more information about the content of participants' experience. For example, the dimension of “Episodic Knowledge” is characterized by thoughts based on prior knowledge, involving the past, and concerning oneself, and was most prevalent in the romance film (500 Days of Summer). Likewise, “Sensory Engagement” was associated with experiences related to sensory input and positive emotionality, occurred more during the romance movie (500 Days of Summer) than in the documentary (Citizenfour), and was linked to increased brain activity across the sensory systems. This shows that mDES can also provide information about the content of experience and discriminate between different sources of experience. In the future, it will be possible to improve the level of detail regarding the content of experiences by changing the questions used to interrogate experience.

      (3) A jittered sampling approach is used to acquire thought ratings every 15 seconds. Are ratings for the same time point averaged across participants? If so, how consistent are ratings among participants? High consistency would suggest thoughts are mainly stimulus-evoked. Low consistency would question the validity of applying ratings from one (group of) participant(s) to brain-related analyses of another participant.

      In this experiment, we sampled experience every 15 seconds in each clip, and in each sampling epoch, we gained mDES responses from eight participants. Furthermore, no participant was sampled at an adjacent time point, as our approach jittered probes approximately 2 minutes apart (See Supplementary Figure 7). To illustrate the consistency of mDES data, we have included an additional figure (Figure 3) highlighting how experience varies over time in each clip. It is evident from these plots that there are distinct moments in which group-averaged reported thoughts across participants are stable and that these can extend across adjacent sampling points (i.e. when the confidence intervals of the score at a timepoint do not overlap with zero). Therefore, in some cases, adjacent sampling points, consisting of different sets of eight participants, describe their experiences as having similar positions on the same mDES dimension. This suggests that there is agreement among individuals regarding how they experienced a specific moment in a film, and in some cases, this agreement was apparent in successive sets of eight participants. Together, our findings indicate a conservation of agreement across participants that spans multiple moments in a film. A clear example of agreement on experience across multiple sets of 10 participants can be seen between 150-400 seconds in the clip from 500 Days of Summer for the dimension of “Sensory Engagement” (time series plot 4 in Figure 3).
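      The criterion used above (a confidence interval at a sampling point that excludes zero) and the search for stretches of adjacent agreeing points can be sketched in Python as follows. This is an illustration under our own assumptions: a two-sided 95% t-interval computed from the eight responses at each 15-s point (hence the hard-coded critical value for 7 degrees of freedom), no correction for multiple comparisons, and hypothetical function names; the intervals shown in the figure may have been computed differently.

```python
from statistics import fmean, stdev

T_CRIT_DF7 = 2.365  # two-sided 95% t critical value for n = 8 (df = 7)


def mean_ci(responses, t_crit=T_CRIT_DF7):
    """Group mean and 95% t-interval for the responses at one sampling point."""
    m = fmean(responses)
    half_width = t_crit * stdev(responses) / len(responses) ** 0.5
    return m, m - half_width, m + half_width


def reliable_points(scores_by_time, t_crit=T_CRIT_DF7):
    """Sampling times whose confidence interval excludes zero."""
    flagged = []
    for t in sorted(scores_by_time):
        _, lo, hi = mean_ci(scores_by_time[t], t_crit)
        if lo > 0 or hi < 0:
            flagged.append(t)
    return flagged


def consecutive_runs(times, step_s=15):
    """Group flagged times into runs of adjacent sampling points."""
    runs = []
    for t in times:
        if runs and t - runs[-1][-1] == step_s:
            runs[-1].append(t)
        else:
            runs.append([t])
    return runs
```

      A run longer than one point means that successive, non-overlapping sets of participants placed that stretch of the film on the same side of zero on a given dimension.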

      (4) Using three different movies to conclude that different genres evoke different thought patterns (e.g., line 277) seems like an overinterpretation with only one instance per genre.

      We found that mDES was able to distinguish between each film on at least one dimension of experience. In other words, information encoded in the mDES dimensions was sensitive to variation in semantic and affective experiences in the different movie clips. This provides evidence that is necessary but not sufficient to conclude that we can distinguish different genres of films (i.e. if we could not distinguish between films, then we would not be able to distinguish genres). However, it is correct that to begin answering the broader question about experiences in different genres, it would be necessary to map cognition across a larger set of movies, ideally with multiple examples of each genre.

      (5) I see no indication that results were cross-validated, and no effect sizes are reported, leaving the robustness and strength of effects unknown.

      Thank you for drawing this to our attention. We have re-run the LMMs and ANOVA models to include partial eta-squared values to clarify the strength of the effects in each of our reported outcomes.
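      For readers less familiar with the effect size mentioned above, partial eta-squared is the proportion of variance attributable to an effect relative to that effect plus error, excluding variance explained by other terms. The minimal Python sketch below (function names are our own) shows the textbook definition and the equivalent conversion from a reported F statistic. It is not the authors' analysis code, and the values reported for their LMMs may have been obtained with a different routine.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared from sums of squares: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value, df_effect, df_error):
    """The same quantity recovered from a reported F statistic and its dfs.

    Algebraically identical to partial_eta_squared for a standard ANOVA;
    for mixed models it is only an approximation that depends on how the
    denominator degrees of freedom were estimated.
    """
    return f_value * df_effect / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # SS_effect = 2 on 1 df, SS_error = 6 on 12 df, so F = (2/1) / (6/12) = 4.
    print(partial_eta_squared(2.0, 6.0))           # 0.25
    print(partial_eta_squared_from_f(4.0, 1, 12))  # 0.25
```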

      Reviewer #3:

      ●      What are the considerations for treating high-order thought patterns that occur during film viewing as stable enough to be used across participants? What would be the limitations of this method? (Do all people reading this paper think comparable thoughts reading through the sections?)

      It is likely, based on our study, that films can evoke both stereotyped thought patterns (i.e. thoughts that many people will share) and others that are individualistic. In principle, mDES is capable of capturing empirical information on both stereotyped and idiosyncratic thoughts. For example, clear differences in experiences across films, and in particular during specific periods within a film, show that movie-watching can evoke broadly similar thought patterns in different groups of participants (see Figure 3 right-hand panel). On the other hand, the association between comprehension and the different mDES components indicates that certain individuals respond to the same film clip in different ways and that these differences are rooted in objective information (i.e. their memory of an event in a film clip). A clear example of these more idiosyncratic features of the movie-watching experience can be seen in the association between “Episodic Knowledge” and comprehension. We found that “Episodic Knowledge” was generally high in the romance clip from 500 Days of Summer but was especially high for individuals who performed the best, indicating they remembered the most information. Thus, good comprehenders responded to the 500 Days of Summer clip with reports that showed more evidence of “Episodic Knowledge”. In the future, since the mDES approach can account for both stereotyped and idiosyncratic features of experience, it will be an important tool in understanding the common and distinct features of movie-watching experiences, especially given the cost-effective manner in which these studies can be run.

      ●      How does this approach differ from collaborative filtering, (for example as presented in Chang et al., 2021)?

      Our study is very similar to the notion of collaborative filtering, since we use an approach similar to crowd-sourcing as a tool for understanding brain activity. One of its strengths is its generalizability: because mDES is not limited to movie-watching, it can be used to understand cognition more broadly. We can use the same mDES method to sample cognition in multiple situations in daily life ([6, 19]), while participants perform tasks in the behavioural lab [18, 24], and while brain activity is being acquired [8, 25, 26]. In principle, therefore, we can use mDES to understand cognition in different contexts in a common analytic space (see [27] for an example of how this could work).

      Page 5 [106-110]: In our study, we acquired experiential data in one group of participants while watching a movie clip and used these data to understand brain activity recorded in a second set of participants who watched the same clip and for whom no experiential data was recorded. This approach is similar to what is known as “collaborative filtering” [28].

      ●      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly. It fails to discuss and establish the utility and appropriateness of its proposed method.

      Thank you very much for your feedback and critique. In our revision and in our responses to these questions, we have provided more information about the method's robustness, utility, and application to understanding cognition.

      References

      (1) Gordon, E.M., et al., Precision Functional Mapping of Individual Human Brains. Neuron, 2017. 95(4): p. 791-807.e7.

      (2) Smallwood, J., et al., The neural correlates of ongoing conscious thought. iScience, 2021. 24(3).

      (3) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93.

      (4) Smallwood, J., et al., The default mode network in cognition: a topographical perspective. Nature Reviews Neuroscience, 2021. 22(8): p. 503-513.

      (5) Turnbull, A., et al., Age-related changes in ongoing thought relate to external context and individual cognition. Consciousness and Cognition, 2021. 96: p. 103226.

      (6) McKeown, B., et al., The impact of social isolation and changes in work patterns on ongoing thought during the first COVID-19 lockdown in the United Kingdom. Proceedings of the National Academy of Sciences, 2021. 118(40): p. e2102565118.

      (7) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (8) Konu, D., et al., A role for the ventromedial prefrontal cortex in self-generated episodic social cognition. NeuroImage, 2020. 218: p. 116977.

      (9) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10.

      (10) Ho, N.S.P., et al., Facing up to the wandering mind: Patterns of off-task laboratory thought are associated with stronger neural recruitment of right fusiform cortex while processing facial stimuli. NeuroImage, 2020. 214: p. 116765.

      (11) Karapanagiotidis, T., et al., Tracking thoughts: Exploring the neural architecture of mental time travel during mind-wandering. NeuroImage, 2017. 147: p. 272-281.

      (12) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (13) Vatansever, D., et al., Distinct patterns of thought mediate the link between brain functional connectomes and well-being. Network Neuroscience, 2020. 4(3): p. 637-657.

      (14) Wang, H.-T., et al., Dimensions of Experience: Exploring the Heterogeneity of the Wandering Mind. Psychological Science, 2017. 29(1): p. 56-71.

      (15) Aliko, S., et al., A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Scientific Data, 2020. 7(1).

      (16) Yang, E., et al., The default network dominates neural responses to evolving movie stories. Nature Communications, 2023. 14(1): p. 4197.

      (17) Turnbull, A., et al., Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Scientific Reports, 2020. 10(1): p. 9912.

      (18) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93: p. 103139.

      (19) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (20) Christoff, K., et al., Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc Natl Acad Sci U S A, 2009. 106(21): p. 8719-24.

      (21) Zhang, M., et al., Perceptual coupling and decoupling of the default mode network during mind-wandering and reading. eLife, 2022. 11: p. e74011.

      (22) Zhang, M.C., et al., Distinct individual differences in default mode network connectivity relate to off-task thought and text memory during reading. Scientific Reports, 2019. 9.

      (23) Smallwood, J. and J.W. Schooler, The science of mind wandering: Empirically navigating the stream of consciousness. Annual Review of Psychology, 2015. 66(1): p. 487-518.

      (24) Turnbull, A., et al., The ebb and flow of attention: Between-subject variation in intrinsic connectivity and cognition associated with the dynamics of ongoing experience. NeuroImage, 2019. 185: p. 286-299.

      (25) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10(1): p. 3816.

      (26) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (27) Chitiz, L., et al., Mapping cognition across lab and daily life using experience-sampling. 2023.

      (28) Chang, L.J., et al., Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Science Advances, 2021. 7(17): p. eabf7129.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

      Regarding significance:

      This study provides in vivo evidence that caspase-3 is important for synapse elimination in the visual pathway (Figures 3 and 4) and corroborates the previously proposed but not yet validated “synaptosis” hypothesis. More significantly, we show that caspase-3 is activated in dLGN relay neurons in response to synapse inactivation (Figure 1) when synaptic competition is present (Figure 2), and that caspase-3 is important for efficient elimination of weakened synapses by microglia (Figures 5 and 6). We consider the causal link between synapse weakening/inactivation and caspase-3 activation to be the most important finding of this study and believe it is an error not to include this aspect of the study in the assessment. The mechanism by which neuronal activity influences synapse elimination is a fundamental question in neuroscience, and our study presents a significant advancement in understanding this problem.

      Regarding strength of evidence:

      We do not agree with the assessment that our evidence should be broadly labeled as “incomplete”. In fact, we argue that many concerns raised by the reviewers are not focused on the main claims made in this study.

      (1) Regarding whether caspase-3 activation (not “expression”, which is the term used in the assessment) is isolated to synapses or occurs in entire cells, we show in Figure 1 that both types of signals can be present. The main concern of the reviewers seems to be that activated caspase-3 signals in apoptotic dLGN relay neurons are irrelevant to our analysis and confound interpretation. We argue that this is not the case.

      In Figure 1, we have two sets of controls demonstrating that the observed apoptosis of dLGN relay neurons occurs specifically in response to synapse inactivation. First, for each animal that received TeTxLC injection in the right eye, activated caspase-3 signal is compared between the left dLGN, where most of the inactivated synapses are located, and the right dLGN, where a minority of the inactivated synapses are located (between Figure 1B and 1C, also between the first and second groups of Figure 1E). We observed apoptotic neurons in the left dLGN, which has more inactivated synapses, but not in the right dLGN, which has fewer inactivated synapses. The second control is between TeTxLC-injected animals (Figure 1B) and mock-injected animals (Figure 1D). We observed apoptotic relay neurons in the dLGN of TeTxLC-injected animals (Figure 1B) but not mock-injected animals (Figure 1D). Both of these controls show that the observed apoptosis of dLGN relay neurons is caused by synapse inactivation.

      In addition, in our synapse inactivation experiment (Figure 1), AAV-hSyn-TeTxLC is injected into the right eye and expressed only in RGCs, not in dLGN relay neurons. Since dLGN relay neurons in this experiment do not receive a perturbation that is independent of synaptic transmission, we conclude that their apoptosis occurs through synapse-dependent mechanisms.

      Furthermore, if the apoptotic neurons are confounding the analysis (as implied by reviewers and editors) and do not occur through synapse-dependent mechanisms, then inhibiting both eyes with TeTxLC (Figure 2C, rightmost group) should cause high levels of caspase-3 activation, like that in the single-inhibition condition. Instead, we observe the opposite (Figure 2C, middle group) – overall caspase-3 activity goes down significantly in the dual-inhibition condition and is closer to the unperturbed condition, which can be explained by a loss of interaction between “strong” and “weak” synapses. Taken together, our data demonstrate that apoptosis of relay neurons in Figure 1 occurs specifically in response to synapse inactivation through synapse-dependent mechanisms, and the activated caspase-3 signal in the neurons should be included in our analysis.

      Why does synaptic caspase-3 activation manifest in different forms: puncta, “blobs”, and cells? This is not surprising when considering the mechanisms that neurons must utilize to spatially confine caspase-3 activation and the nature of the apoptotic signaling cascade. On one hand, it has been proposed that caspase-3 activity in dendrites can be locally confined by proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014). On the other hand, caspase-3 activation is known to trigger explosive feedback amplification of apoptotic signaling events (McComb et al., DOI: 10.1126/sciadv.aau9433). For caspase-3 activation to remain localized to dendrites, the negative regulation must outweigh the positive feedback amplification. By expressing TeTxLC in RGCs of one eye, we create a strong perturbation that silences a large fraction of the synapses in the retinogeniculate pathway, which likely shifts the balance between positive and negative regulation of caspase-3 activity in some relay neurons. More specifically, if a given dLGN relay neuron receives too many inactivated synapses, which is likely the case in our perturbation, caspase-3 activity that is initially localized can overwhelm the physiological negative regulation mechanisms that act to spatially confine it, resulting in whole-cell apoptosis. In fact, previous in vitro evidence (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014) demonstrated that, while caspase-3 activation in a single distal dendrite can be locally contained, activating apoptosis signaling in dendrites proximal to the cell body can result in whole-cell apoptosis. Similarly, a few inactivated retinogeniculate synapses can elicit locally contained caspase-3 activity in dLGN relay neurons, but a large number of inactivated synapses on a single relay neuron may trigger enough caspase-3 activity to cause whole-cell apoptosis.
      We discussed how to interpret synapse inactivation-induced apoptosis in dLGN relay neurons both in the main text and in the discussion (lines 123-132 and 411-421).

      (2) Regarding microglia, we did not claim that “microglia direct or instruct synapse elimination”. Our main claim is that caspase-3 activation is important for efficient elimination of weakened synapses by microglia. This claim emphasizes a regulatory role for caspase-3 activation in microglia-mediated synapse elimination, but not a regulatory role of microglia in synapse elimination. To be more specific, our data suggest that lack of synaptic activity induces caspase-3 activity, and caspase-3 activity in turn influences which synapses are preferentially eliminated by microglia. Therefore, the elimination specificity is fundamentally determined (i.e. instructed) by neuronal activity, not by microglia. We also did not presume the manner in which microglia engage in synapse elimination. We specifically address this point in the discussion at line 458 through 465 where we acknowledge that microglia may indirectly mediate synapse elimination by engulfing shed neuronal material. In our title and text, we use the phrase “microglia-mediated synapse elimination”, which is not the same as microglia-instructed synapse elimination and does not presume any instructive/directive role of microglia.

      (3) Regarding whether astrocytes are involved, we did not challenge the notion that astrocytes play important roles in synapse elimination. Rather, our claim is that, unlike what we observed with microglia, the amount of synaptic material engulfed by astrocytes does not robustly depend on whether caspase-3 is present. We acknowledge that there might be a caspase-3 dependent phenotype that we were unable to detect (line 309-310), and that it is plausible that astrocytes mediate activity-dependent synapse elimination through other caspase-3-independent mechanisms. This claim is not central to our study, and we would like to qualify the statements in the manuscript. We will remove the phrase “but not astrocytes” in line 18 of the abstract.

      In summary, using a state-of-the-art method to inactivate retinogeniculate synapses, we discovered a causal link between synapse weakening/inactivation and caspase-3 activation. Coupled with well-established in vivo assays (e.g., segregation analysis, electrophysiology, and engulfment analysis) that are used in many landmark studies we cite, we provide solid evidence supporting our claim that “caspase-3 is essential for synapse elimination driven by both spontaneous and experience-dependent neural activity”, and that “synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia”.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4). 

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. This is not accurate. We show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. This is not accurate. The apoptotic neurons we observed are relay neurons located in the dLGN (confirmed by their morphology and positive staining of NeuN – Figure S4B-C), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that the active caspase-3 signals in apoptotic dLGN relay neurons are not a confounding factor but a bona fide response to synaptic silencing and therefore should be included in the quantification. We have two sets of controls (please also see the general response above), one is between the strongly inactivated dLGN and the weakly inactivated dLGN in each TeTxLC-injected animal, second is between dLGN of TeTxLC-injected animals and mock-injected animals. In both controls, only the dLGN receiving strong synapse inactivation has these apoptotic dLGN relay neurons, demonstrating that these cells occur as a consequence of synapse inactivation. It is also unlikely that our perturbation is causing cell death through a non-synaptic mechanism. As mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and only expressed in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons is not related to synapse inactivation, then when TeTxLC is injected into both eyes, one would expect to see either the same amount or more apoptotic relay neurons, but we instead observed a reduction in dLGN neuron apoptosis, suggesting a synapse-related mechanism must be responsible. Considering the above, apoptosis of relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation, and active caspase-3 signals in these neurons are true signals that should be included in the quantification.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination. 

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment where the reviewer suggested to: “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to preferentially eliminate weak synapses.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and this caspase-3 activity in turn determines the substrate preference of microglia-mediated synapse elimination. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Throughout the manuscript, we used the term “microglia-mediated synapse elimination”. This terminology does not assume a directive/instructive role of microglia in synapse elimination and only describes the observed engulfment of synaptic material by microglia. We also did not assume how microglia engage in synapse elimination. We acknowledge in the discussion (line 458 through 465) that microglia may mediate synapse elimination in an indirect, passive way by engulfing shed neuronal material. This topic is a matter of debate in the field (Eyo et al., DOI: 10.1126/science.adh7906 ).

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper. 

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase 3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territories segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD, loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases. 

      Strengths: 

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 could be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration. 

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes. 

      Weaknesses: 

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      The experiments presented in Figure S11 aim to determine whether astrocyte-mediated synapse elimination depends on caspase-3 signaling. We do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We did observe a small decrease in synaptic material engulfed by astrocytes when caspase-3 is deficient, and we acknowledged that there could be defects that we were not able to detect (line 309-310). The claim that caspase-3 does not regulate astrocyte-mediated synapse elimination is not a central claim of the manuscript and we will qualify our statements in the text. We will remove the phrase “but not astrocytes” in the abstract (line 18).

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN? 

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases microglia-mediated engulfment of presynaptic terminals of inactivated synapses (Figure 6). We did not measure microglia-mediated engulfment of synaptic material while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in astrocyte-mediated engulfment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied, especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment, chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of neuropeptide bursicon action in arthropod biology.

      However, the study falls short of its claim that it reveals the molecular mechanisms of a seasonal polyphenism. While cuticle tanning is an important part of the pear psyllid polyphenism, it is not the equivalent of it. First, there are other traits that distinguish between the two morphs, such as ovarian diapause (Oldfield, 1970), and the role of bursicon signaling in regulating these aspects of polyphenism was not measured. Thus, the phenotype in pear psyllids, whereby knockdown of bursicon reduces cuticle tanning, seems simply to reproduce the phenotypes of Drosophila mutants for the bursicon receptor (Loveall and Deitcher, 2010, BMC Dev Biol) in another species (Fig. 2I, 4H). Second, the study fails to address the threshold nature of cuticular tanning in this species, although it is the threshold response (specifically, to temperature and photoperiod) that distinguishes this trait as a part of a polyphenism. Whereas miR-6012 was found to regulate bursicon expression, no evidence is provided that this microRNA either responds to or initiates a threshold response to temperature. In principle, miR-6012 could regulate bursicon whether or not it is part of a polyphenism. Thus, the impact of this work would be significantly increased if it could distinguish between seasonal changes of the cuticle and a bona fide reflection of polyphenism.

      Thanks for your valuable suggestion. We concur with the reviewer’s comment that cuticle tanning does not equate to the C. chinensis polyphenism. To better reflect the core focus of our research, we have revised the title to "Neuropeptide Bursicon and its receptor mediated the transition from summer-form to winter-form of Cacopsylla chinensis".

      In response to the reviewer's inquiry regarding the threshold nature of cuticular tanning in C. chinensis, we have included a detailed analysis of the phenotypic changes (including nymph phenotypes, cuticle pigment absorbance, and cuticle thickness) during the transition from summer-form to winter-form in C. chinensis at distinct time points (3, 6, 9, 12, and 15 days) under different temperature conditions (10°C and 25°C). As shown in Figure S1, under 25°C conditions, nymphs exhibit a light yellow and transparent coloration at 3, 6, and 9 days, while nymphs at 12 and 15 days display shades of yellow-green or blue-yellow. Under 10°C conditions, the end of the abdomen turns black at 3, 6, and 9 days. By day 12, numerous light black stripes appear on the chest and abdomen of nymphs at 10°C. At 15 days, nymphs exhibit an overall black-brown appearance, featuring dark brown stripes on the left and right sides of each chest and abdominal section. Furthermore, the end of the abdomen and the back display extensive black-brown coloration at 10°C (Figure S1A). The UV absorbance of the total pigment extract at a wavelength of 300 nm markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Cuticle thickness also increased following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). The detailed results (L122-143), materials and methods (L647-652), and discussion (L319-322) have been added to our revised manuscript.

      Regarding the response of miR-6012 to temperature, we have already determined its expression at 3, 6, 10 days under different temperatures in the previous Figure 5E. We now included additional time intervals (9, 12, 15 days) in the updated Figure 5E. Our results indicate a significant decrease in the expression levels of miR-6012 after 10°C treatment for 3, 6, 9, 12, 15 days compared to the 25°C treatment group. Detailed information regarding this has been integrated into the Materials and Methods (Line 608-610) of our revised manuscript.

      Strengths:

      This study convincingly identifies homologs of the genes encoding the bursicon subunits and its receptor, showing an alignment with those of another psyllid as well as more distant species. It also demonstrates that the stage- and tissue-specific levels of bursicon follow the expected patterns, as informed by other insect models, thus validating the identity of these genes in this species. They provide strong evidence that the expression of bursicon and its receptor depend on temperature, thereby showing that this trait is regulated through both parts of the signaling mechanism.

      Several parallel measurements of the phenotype were performed to show the effects of this hormone, its receptor, and an upstream regulator (miR-6012), on cuticle deposition and pigmentation (if not polyphenism per se, as claimed). Specifically, chitin staining and TEM of the cuticle qualitatively show difference between controls and knockdowns, and this is supported by some statistical tests of quantitative measurements (although see comments below). Thus, this study provides strong evidence that bursicon and its receptor play an important role in cuticle deposition and pigmentation in this psyllid.

      The study identified four miRNAs which might affect bursicon due to sequence motifs. By manipulating levels of synthetic miRNA agonists, the study successfully identified one of them (miR-6012) to cause a cuticle phenotype. Moreover, this miRNA was localized (by FISH) to the cuticle, body-wide. To our knowledge, this is the first demonstrated function for this miRNA, and this study provides a good example of using a gene of known function as an entry point to discovering others influencing a trait. Thus, this finding reveals another level of regulation of cuticle formation in insects.

      Weaknesses:

      (1) The introduction to this manuscript does not accurately reflect progress in the field of mechanisms underlying polyphenism (e.g., line 60). There are several models for polyphenism that have been used to uncover molecular mechanisms in at least some detail, and this includes seasonal polyphenisms in Hemiptera. Therefore, the justification for this study cannot be predicated on a lack of knowledge, nor is the present study original or unique in this line of research (e.g., as reviewed by Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448). The authors are apparently aware of this, because they even provide other examples (lines 104-108); thus the introduction seems misleading as framed.

      Thanks for your excellent suggestion. We have added the review recommended by the reviewer (Zhang et al., 2019; DOI: 10.1146/annurev-ento-011118-112448) at Line 57 of our revised manuscript. The statement has been revised to “However, the specific molecular mechanisms underlying temperature-dependent polyphenism still require further clarification” in Lines 60-61 of our revised manuscript.

      (2) The data in Figure 2H show "percent of transition." However, the images in 2I show insects with tanned cuticle (control) vs. those without (knockdown). Yet, based on the description of the Methods provided, there appears to be no distinction between "percent of transition" and "percent with tanning defects". This is an important distinction to make if the authors are going to interpret cuticle defects as a defect in the polyphenism. Furthermore, there is no mention of intermediate phenotypes. The data in 2H are binned as either present or absent, and these are the phenotypes shown in 2I. Was the phenotype really an all-or-nothing response? Instead of binning, which masks any quantitative differences in the tanning phenotypes, the authors should objectively quantify the degree of tanning and plot that. This would show if and to what degree intermediate tanning phenotypes occurred, which would test how bursicon affects the threshold response. This comment also applies to the data in Figures 4G and 6G. Since cuticle tanning is present in more insects than just those with seasonal polyphenism, showing how this responds as a threshold is needed to make claims about polyphenism.

      We appreciate your insightful comments. As shown in Figure 1 of our published paper (Zhang et al., 2023; doi.org/10.7554/eLife.88744.3) and Figures 2C-2I of the current manuscript, the transition from summer-form to winter-form entails not only external cuticular tanning but also changes in internal cuticular chitin levels and cuticle thickness. While external cuticular tanning is a prominent and easily observable indicator of this transition, these internal changes are also a significant part of it and should be taken into account. Therefore, we propose that the term "percent of transition" describes this process more accurately than "percent with tanning defects".

      To provide a more comprehensive visual account of the phenotypic changes during the transition from summer-form to winter-form, we have included images at different time points (3, 6, 9, 12, and 15 days) under different temperature conditions in Figure S1A of our revised manuscript. Specifically, at 10°C, nymphs exhibit abdominal tanning after 6 and 9 days of treatment, while the thorax remains untanned. By days 12 to 15, both the abdomen and thorax of the nymphs are tanned, and the majority of summer-form nymphs have transitioned into winter-form, as depicted in Figure 2I for comparison. This observation indicates the presence of a critical threshold for cuticle tanning in C. chinensis following exposure to 10°C. Nymphs that did not transition to winter-form succumbed to the cold, so no intermediate phenotypes were observed at 12-15 days at 10°C. The UV absorbance of the total pigment extract at 300 nm markedly increases after 6, 9, 12, and 15 days of exposure to 10°C compared with the 25°C treatment group (Figure S1B). Additionally, cuticle thickness increases after 6, 9, 12, and 15 days of exposure to 10°C compared with the 25°C treatment group (Figure S1C). These results highlight the relationship between the threshold of cuticular tanning and the transition process. The detailed description and information have been added to the Results (L122-143), Materials and Methods (L647-652), and Discussion (L319-322) of our manuscript.

      (3) This study also does not test the threshold response of cuticle phenotypes to levels of bursicon, its receptor, or miR-6012. Hormone thresholds are the most widespread and, in most systems where polyphenism has been studied, the defining characteristic of a polyphenism (e.g., Nijhout, 2003, Evol Dev). Quantitative (not binned) measurements of a polyphenism marker (e.g., chitin) should be demonstrated to result as a threshold titer (or in the case of the receptor, expression level) to distinguish defects in polyphenism from those of its component trait.

      Thanks for your valuable feedback. We have supplemented additional data on the phenotypes (Figure S1A), cuticle pigment absorbance (Figure S1B), cuticle thickness (Figure S1C), expression levels of bursicon (Figure 1E and 1F), its receptors (Figure 3G), and miR-6012 (Figure 5E) corresponding to nymphs treated over different time periods (3, 6, 9, 12, 15 days) under both 10°C and 25°C conditions in our revised manuscript.

      Although all of these markers correlate strongly with the transition from summer-form to winter-form, they are not suitable for defining thresholds, because gene expression and chitin content were quantified in relative rather than absolute terms. Furthermore, tanning hormones are neuropeptides present only in trace amounts in insects, unlike steroid hormones, so measuring their titers is technically very challenging.

      (4) Cuticle issue:

      (a) Unlike Fig. 6D and F, Figs. 2D and F do not correspond to each other. Especially the lack and reduction of chitin in ds-a+b! By fluorescence microscopy there is hardly any signal, whereas by TEM there is a decent cuticle. Additionally, the dsGFP control cuticle in 2D is cut obliquely with a thick and a thin chitin layer. This is misleading.

      Thanks for your insightful feedback. We have replaced the previous WGA chitin staining images for the dsCcBurs-α+β treatment in Figure 2D with new representative images consistent with Figure 2F. Furthermore, the presence of both thin and thick chitin layers in the dsEGFP treatment of Figure 2D may be attributable to chitin in the insect midgut or fat body, as previously discussed (Zhu et al., 2016). During cuticle staining, chitin located in the midgut and fat body of C. chinensis may also exhibit green fluorescence, giving the appearance of a thin chitin layer. A detailed analysis and explanation of these observations have been added to the Discussion (Lines 347-352) of our revised manuscript.

      Zhu KY, Merzendorfer H, Zhang W, Zhang J, Muthukrishnan S. Biosynthesis, Turnover, and Functions of Chitin in Insects. Annu Rev Entomol. 2016;61:177-196. doi:10.1146/annurev-ento-010715-023933.

      (b) In Figs. 2F and 4F, the endocuticle appears to be missing, a portion of the procuticle that is produced post-molting. As tanning is also occurring post-molting, there seems to be a general problem with cuticle differentiation at this time point. This may be a timing issue. Please clarify.

      Thank you for your suggestion. The insect cuticle typically comprises three distinct layers (endocuticle, exocuticle, and epicuticle), and the thickness of each layer varies among insect species. Cuticle differentiation is closely linked to the molting cycle of insects (Mrak et al., 2017). In our study, nymphal cuticles after dsEGFP treatment showed normal differentiation, with a thin epicuticle and an endocuticle and exocuticle of comparable width, as illustrated in Figures 2F and 4F. Conversely, nymphs treated with dsCcBurs-α, dsCcBurs-β, and dsCcBurs-R displayed impaired development, showing only the exocuticle without a discernible endocuticle layer. These findings suggest that the bursicon genes and their receptor play a pivotal role in regulating insect cuticle development (Costa et al., 2016). We have added discussion of these results in Lines 356-367 of our revised manuscript.

      Mrak, P., Bogataj, U., Štrus, J., & Žnidaršič, N. (2017). Cuticle morphogenesis in crustacean embryonic and postembryonic stages. Arthropod structure & development, 46(1), 77–95. https://doi.org/10.1016/j.asd.2016.11.001

      Costa, C. P., Elias-Neto, M., Falcon, T., Dallacqua, R. P., Martins, J. R., & Bitondi, M. (2016). RNAi-mediated functional analysis of Bursicon genes related to adult cuticle formation and tanning in the Honeybee, Apis mellifera. PloS one, 11(12), e0167421. https://doi.org/10.1371/journal.pone.0167421

      (c) To provide background information, it would be useful to analyze cuticle formation in the summer and winter morphs of controls separately by light and electron microscopy. More baseline data on these two morphs is needed.

      Thanks for your valuable feedback. To provide more background information on cuticle formation, we have supplied the results for nymph phenotypes, cuticle pigment absorbance, and cuticle thickness at distinct time points (3, 6, 9, 12, and 15 days) at 10°C and 25°C in Figure S1 of our revised manuscript. We hope these results provide the requested baseline data on the two morphs.

      (d) For the TEM study, it is not clear whether the same part of the insect's thorax is being sectioned each time, or if that matters. There is not an obvious difference in the number of cuticular layers, but only the relative widths of those layers, so it is difficult to know how comparable those images are. This raises two questions that the authors should clarify. First, is it possible that certain parts of the thoracic cuticle, such as those closer to the intersegmental membrane, are naturally thinner than other parts of the body? Second, is the tanning phenotype based on the thickness or on the number of chitin layers, or both? The data shown later in Figure 4I, J convincingly shows that the biosynthesis pathway for chitin is repressed, but any clarification of what this might mean for deposition of chitin would help to understand the phenotypes reported. Also, more details on how the data in Fig. 2G were collected would be helpful. This also goes for the data in Fig. 4 (bursicon receptor knockdowns).

      Thanks for your great comment. The TEM analysis followed the standardized protocol described previously (Zhang et al., 2023). Briefly, insect heads were uniformly excised, and the samples were then fixed in 4% paraformaldehyde. Sections were cut and stained at a consistent distance above the insect's thorax, and the dorsal region of the thorax was specifically chosen for fluorescence imaging or transmission electron microscopy to quantify cuticle thickness. Cuticle thickness was measured with the software's built-in measuring ruler, from the top to the bottom of the cuticle along the same horizontal line. The cuticle of each nymph was measured at two nearby locations, six nymphs were used for each sample, and nine values were randomly selected and plotted. The related description has been added to the Materials and Methods (Lines 660-668) of our revised manuscript.

      Zhang, S.D., Li, J.Y., Zhang, D.Y., Zhang, Z.X., Meng, S.L., Li, Z., & Liu, X.X. (2023). MiR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis. eLife, 12. https://doi.org/10.7554/eLife.88744

      (5) Tissue issue:

      The timed experiments shown in all figures were done in whole animals. However, we know from Drosophila that Bursicon activity is complex in different tissues. There is, thus, the possibility, that the effects detected on different days in whole animals are misleading because different tissues--especially the brain and the epidermis, may respond differentially to the challenge and mask each other's responses. The animal is small, so the extraction from single tissue may be difficult. However, this important issue needs to be addressed.

      Thanks for your excellent suggestion. We appreciate the reviewer's recognition of the difficulty of dissecting individual tissues from the tiny early-instar nymphs of C. chinensis. Because the transition of C. chinensis across developmental stages involves extensive whole-body phenotypic changes, this study focused on those changes, and intact C. chinensis nymphs were therefore used for qPCR analysis. The related descriptions have been added to the Materials and Methods (Lines 513, 517, 553, 555, and 613) and Discussion (Lines 327-329) of our revised manuscript.

      (6) No specific information is provided regarding the procedure followed for the rescue experiments with burs-α and burs-β (How were they done? Which concentrations were applied? What were the effects?). These important details should appear in the Materials and Methods and the Results sections.

      Thanks for your excellent suggestion. For the rescue experiments, dsCcBurs-R was fed together with burs α-α homodimer, burs β-β homodimer, or burs α-β heterodimer protein; the CcBurs-α+β heterodimer protein was used at 200 ng/μL. The CcBurs-α+β heterodimer protein fully rescued the effect of RNAi-mediated knockdown on CcBurs-R expression, whereas the α+α and β+β homodimers did not (Figure 3F). Feeding the α+β heterodimer protein also fully rescued the defects in transition percentage and morphological phenotype caused by CcBurs-R knockdown (Figures 4G-4H). We have added detailed methods for the rescue experiments, including the protein concentration, to the Materials and Methods (Lines 561-563) and Results (Line 263) of our revised manuscript.

      (7) Pigmentation

      (a) The protocol used to assess pigmentation needs to be validated. In particular, the following details are needed: Were all pigments extracted? Were pigments modified during extraction? Were the values measured consistent with values obtained, for instance, by light microscopy (which should be done)?

      Thanks for your excellent comment. Pigments were extracted following published protocols, including that used for Bombyx mori (Futahashi et al., 2012; Osanai-Futahashi et al., 2012): the cuticles were pulverized in liquid nitrogen and then dissolved in 30 mL of acidified methanol. Thus, all dissected cuticles were treated with acidified methanol to extract their pigments, and the pigments were not modified during extraction. A detailed description has been added to the Materials and Methods (Lines 630-633) of our revised manuscript.

      Futahashi, R., Kurita, R., Mano, H., & Fukatsu, T. (2012). Redox alters yellow dragonflies into red. Proceedings of the National Academy of Sciences of the United States of America, 109(31), 12626–12631. https://doi.org/10.1073/pnas.1207114109

      Osanai-Futahashi, M., Tatematsu, K. I., Yamamoto, K., Narukawa, J., Uchino, K., Kayukawa, T., Shinoda, T., Banno, Y., Tamura, T., & Sezutsu, H. (2012). Identification of the Bombyx red egg gene reveals involvement of a novel transporter family gene in late steps of the insect ommochrome biosynthesis pathway. The Journal of biological chemistry, 287(21), 17706–17714. https://doi.org/10.1074/jbc.M111.321331

      (b) In addition, pigmentation occurs post-molting; thus, the results could reflect indirect actions of bursicon signaling on pigmentation. The levels of expression of downstream pigmentation genes (ebony, laccase, etc.) should be measured and compared in molting summer vs. winter morphs.

      Thanks for your valuable suggestion. We have already studied the function of several downstream pigmentation genes, including ebony, laccase, tyrosine hydroxylase, dopa decarboxylase, and acetyltransferase. Changes in the expression patterns of these genes are closely tied to the molting dynamics of nymphs transitioning between summer-form and winter-form. Because these findings will be reported in a separate manuscript currently in preparation, the detailed results are not included in the current manuscript.

      (8) L236: "while the heterodimer protein of CcBurs α+β could fully rescue the effect of CcBurs-R knockdown on the transition percent (Figure 4G 4H)". This result seems contradictory. If CcBurs-R is the receptor of bursicon, the heterodimer protein of CcBurs α+β should not be able to rescue the effect of CcBurs-R knockdown insects. How can a neuropeptide protein rescue the effect when its receptor is not there! If these results are valid, then the CcBurs-R would not be the (sole) receptor for CcBurs α+β heterodimer. This is a critical issue for this manuscript and needs to be addressed (also in L337 in Discussion).

      Thanks for your insightful suggestion. Following administration of dsCcBurs-R to C. chinensis, CcBurs-R expression was reduced by approximately 66-82% (Figure 4A) rather than completely suppressed. Feeding the α+β heterodimer protein activates the remaining endogenous CcBurs-R and increases CcBurs-R expression, and the strength of the rescue depends on the dose of α+β heterodimer protein. Consequently, the α+β heterodimer protein can act through the residual receptor to mitigate the effect of CcBurs-R knockdown on the transition percentage. We have added further discussion in Lines 396-403 of our revised manuscript.

      (9) Fig. 5D needs improvement (the magnification is poor) and further explanation and discussion. mi6012 and CcBurs-R seem to be expressed in complementary tissues--do we see internal tissues also (see problem under point 2)? Again, the magnification is not high enough to understand and appreciate the relationships discussed.

      Thanks for your valuable suggestion. To improve the resolution of the magnified images, we performed FISH co-localization of miR-6012 and CcBurs-R in 3rd instar nymphs and obtained detailed zoomed-in images. As shown in the magnified view of Figure 5D, miR-6012 and CcBurs-R appear to exhibit complementary expression patterns in tissues. For the FISH assays, the epidermis of C. chinensis was made transparent by decolorization treatment. Figures 3G and 5E reveal an inverse correlation between the expression profiles of CcBurs-R and miR-6012. Consistent with this, the FISH results show a marked difference in the expression levels of CcBurs-R and miR-6012 within the same tissue. We have added the related explanation and discussion in Lines 291-293 of our revised manuscript.

      (10) The schematic in Fig. 7 is a useful summary, but there is a part of the logic that is unsupported by the data, specifically in terms of environmental influence on cuticle formation (i.e., plasticity). What is the evidence that lower temperatures influence expression of miR-6012? The study measures its expression over life stages, whether with an agonist or not, over a single temperature. Measuring levels of expression under summer form-inducing temperature is necessary to test the dependence of miR-6012 expression on temperature. Otherwise, this result cannot be interpreted as polyphenism control, but rather the control of a specific trait.

      Thanks for your great suggestion. We did assess miR-6012 expression at specific time points (3, 6, 9, 12, and 15 days) at 10°C and 25°C. As shown in Figure 5E, miR-6012 expression was markedly lower at 10°C than at 25°C. In addition, assessment of agomir-6012 expression levels in C. chinensis at 25°C at these time points (3, 6, 9, 12, and 15 days) revealed no significant changes. Hence, we suggest that the effect of miR-6012 on the seasonal morphological transition depends on temperature.

      Recommendations for the authors:

      The authors report a novel role of Bursicon and its receptor in regulating the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment (10°C) activated the Bursicon signaling pathway during the transition from summer-form to winter-form, which influences cuticle pigment content, cuticle chitin content, and cuticle thickness. Moreover, the authors identified miR-6012 and show that it targets CcBurs-R, thereby modulating the function of Bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of multiple roles of neuropeptide bursicon action in arthropod biology. However, the manuscript does have several major weaknesses, described under "Public review", which the authors need to address.

      Major issues:

      (1) L152-154 Fig S2E and S2F: Bursicon has been shown to be expressed in the CNS in a specific set of neurons. For example, in the larval CNS of Manduca sexta, bursicon expression is restricted to the subesophageal ganglion (SG), thoracic ganglia, and first abdominal ganglion. Pharate pupae and pharate adults show expression of this heterodimer in all ganglia. In Drosophila larvae, expression of a bursicon heterodimer is confined to abdominal ganglia. The additional neurons in the ventral nerve cord express only burs. In pharate adults, bursicon is produced by neurons in the SG and abdominal ganglia. I am wondering where bursicon subunits are expressed in the C. chinensis CNS? Since the authors have the antibodies, it would be useful to include immunocytochemical staining of bursicon alpha and beta in the CNS. The qPCR results from head or other tissues (Fig S2E and S2F) are not the most informative way to document localization of gene expression. Regarding the qPCR results, they show that the cuticle and the fat body express CcBurs-α and CcBurs-β. Can the authors confirm these unexpected results independently?

      Thanks for your insightful comment. In this study, we did not use antibodies directly targeting the bursicon subunits. Instead, the bursicon subunits were cloned together with a histidine tag into the expression vector pcDNA3.1 using homologous recombination, as follows. First, the histidine tag was fused to the pcDNA3.1-mCherry vector by homologous recombination to generate the recombinant plasmid pcDNA3.1-his-mCherry. Next, the sequences encoding the two bursicon subunits were inserted into pcDNA3.1-his-mCherry by homologous recombination to produce the recombinant plasmids pcDNA3.1-CcBurs-α-his-mCherry and pcDNA3.1-CcBurs-β-his-mCherry. Finally, a P2A sequence was incorporated into each vector by reverse PCR to yield the recombinant plasmids pcDNA3.1-CcBurs-α-his-P2A-mCherry and pcDNA3.1-CcBurs-β-his-P2A-mCherry. Each bursicon subunit was thus expressed as a His-tagged fusion protein, which was detected by Western blot with an antibody against the histidine tag. However, this antibody is not suitable for in vivo immunocytochemical staining of bursicon alpha and beta in the CNS.

      Owing to the small size of C. chinensis nymphs, dissection of the central nervous system (CNS) was not feasible, which precluded a specific assessment of bursicon expression in the CNS. Prior literature has documented the expression of bursicon subunits in the epidermis and fat body of C. chinensis. Studies suggest that bursicon subunits not only participate in melanization and sclerotization of the insect epidermis but also have significant roles in insect immunity (An et al., 2012). The presence of bursicon subunits in the epidermis, gut, and fat body of C. chinensis may therefore reflect roles in the immune functions of these tissues, which would be consistent with their detected expression there. Further investigation is required to elucidate these specific immune functions.

      An, S., Dong, S., Wang, Q., Li, S., Gilbert, L. I., Stanley, D., & Song, Q. (2012). Insect neuropeptide bursicon homodimers induce innate immune and stress genes during molting by activating the NF-κB transcription factor Relish. PloS one, 7(3), e34510. https://doi.org/10.1371/journal.pone.0034510

      (2) L222: "CcBurs-R is the Bursicon receptor of C. chinensis". Is this statement supported by affinity binding assay results?

      Thanks for your excellent suggestion. We used a fluorescence-based assay of intracellular calcium concentration to examine activation of the bursicon receptor by bursicon heterodimers and homodimers at a range of concentrations. Activation of the receptor by the burs α-β heterodimer led to significant changes in intracellular calcium levels, whereas stimulation with the burs α-α and burs β-β homodimers, as well as with adipokinetic hormone (AKH), left intracellular calcium levels unchanged. Consequently, these results identify CcBurs-R as the bursicon receptor. For further details, please refer to the Materials and Methods (Lines 493-504), Results (Lines 231-239), and Discussion (Lines 377-384) of our revised manuscript.

      (3) L245 Figure 4I-4J: Since knockdown of bursicon and its receptor causes decreased pigment accumulation in the cuticle, it would be useful to examine 1-2 rate-limiting enzyme-encoding genes in the bursicon-regulated cuticle darkening process if possible (as was done for genes involved in cuticle thickening).

      Thanks for your excellent comment. In further work, we have analyzed the effects of bursicon and its receptor on the expression of laccase, tyrosine hydroxylase, dopa decarboxylase, and acetyltransferase, as well as the effects of RNA interference targeting these genes on the seasonal morphological transition. The results support their role in the bursicon-mediated cuticle darkening process. However, because these data will be included in a forthcoming manuscript, we have not incorporated them into the current manuscript.

      Minor issues:

      (1) L75 "stronger resistance (Ge et al., 2019; Tougeron et al., 2021)". Stronger resistance to what? Stronger resistance to environmental stress or weather condition? Please clarify.

      Thanks for your excellent suggestion. We have changed the statement to “stronger resistance to weather conditions” in Line 75 of our revised manuscript.

      (2) L132 Figure 1A and 1B: Bursicon sequence was first identified and functionally characterized in Drosophila melanogaster: is there any reason why Drosophila bursicon sequences were not included in the comparison?

      Thanks for your excellent comment. We have added the sequence of Burs-α and Burs-β of D. melanogaster in the sequence alignment results of Figure 1A and 1B of our revised manuscript.

      (3) Although the authors clearly identify and validate the function for the bursicon genes and its receptor's, there is no mention of whether duplicates of this gene are also present in the pear psyllid. This has been known to happen in otherwise conserved hormone pathways (e.g., insulin receptor in some insects), so a formal check of this should be done.

      Thanks for your excellent comment. As shown in Figures S2A-S2B and 3B, each of the insect species we examined (for example, Drosophila melanogaster, Diaphorina citri, Bemisia tabaci, Nilaparvata lugens, and Sogatella furcifera) has two bursicon subunit genes and only one bursicon receptor gene. Likewise, in our C. chinensis transcriptome database we identified only two bursicon subunit genes and one bursicon receptor gene.

      (4) Line 41: Here, as in the title, "fascinating" is a subjective judgement that does not improve a study's presentation.

      Thanks for your great comment. We have changed "fascinating" to "transformation" in Line 41 and also revised the title of our revised manuscript.

      (5) Line 44: What makes some fields "cutting-edge" and others not?

      Thanks for your excellent suggestion. The expression of "in cutting-edge fields" has been deleted in Line 44 of our revised manuscript.

      (6) Line 97: This is a peculiar choice of reference for the concept of slower development in cold temperatures. The concept of degree-days and growth rates is old and widespread in entomology.

      Thanks for your insightful comment. The reference to Nyamaukondiwa et al., 2011 in Line 95 has been deleted from our revised manuscript.

      (7) Lines 149-150: What justifies the assumption that higher levels of expression mean a more important role? This gene might be just as necessary for development of the summer form, even if expressed at lower levels.

      Thanks for your excellent suggestion. This sentence has been revised to “Increased gene expression levels may potentially contribute to the transition from summer-form to winter-form in C. chinensis.” in Line 168-169 of our revised manuscript.

      (8) The blue arrow in Fig. 7 is confusing.

      Thanks for your excellent suggestion. In Figure 7, the blue arrow represents the down-regulated expression of miR-6012. We have added a description about the blue arrow in Figure 7 of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether additional HAA-related genes may also exhibit a signature of adaptive introgression. Specifically, the authors considered that if such signatures exist, they are most likely only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like that in EPAS1. Therefore, they leveraged two methods, a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified two additional genes that harbor plausible signatures of adaptive introgression for HAA.

      Strengths: 

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and applied a couple of the recently developed methods for detecting adaptive introgression and polygenic selection. 

      Weaknesses: 

      My main concern with this paper is that I am not too convinced that the reported genomic regions putatively under polygenic selection are indeed of archaic origin. Other than some straightforward population structure characterizations, the authors mainly did two analyses with regard to the identification of adaptive introgression: First, they used one composite likelihood-based method, the VolcanoFinder, to detect the plausible archaic adaptive introgression and found two candidate genes (EP300 and NOS2). Next, they attempted to validate the identified signal using another method that detects polygenic selection based on biological network enrichments for archaic variants.

      In general, I don't see in the manuscript that the choice of methods here is well justified. VolcanoFinder is one among several commonly used methods for detecting adaptive introgression (e.g., the D, RD, U, and Q statistics, genomatnn, MaLAdapt, etc.). Even if the selection was mild and incomplete, some of these other methods should be able to recapitulate and validate the results, but such analyses are currently missing from this paper. Besides, some recent papers that studied the distribution of archaic ancestry in Tibetans don't seem to report archaic segments in the two gene regions. Taken together, these points leave me unsure about the presence of archaic introgression, as opposed to selection on ancestral variation.

      Furthermore, the authors tried to validate the results by using Signet, a method that detects enrichments of alleles under selection in a set of biological networks related to the trait. However, the authors did not provide a sufficient description of how they defined archaic alleles when scoring the genes in the network. In fact, judging from the method description, they seem to have considered only alleles shared between Tibetans and Denisovans, but not necessarily exclusively shared between them. If the alleles used for scoring the networks in Signet are also found in other populations such as Han Chinese or Africans, that would make a substantial difference in the result, leading to potential false positives.

      Overall, given the evidence provided by this article, I am not sure they are adequate to suggest archaic adaptive introgression. I recommend additional analyses for the authors to consider for rigorously testing their hypothesis. Please see the details in my review to the authors. 

      Reviewer #2 (Public Review):

      Ferraretti et al. identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They also use Signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis and nitric oxide induction are enriched for archaic introgression.

      Strengths: 

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g., EPAS1, EGLN1), and in this paper, the authors look for more subtle signals in other genes (e.g., EP300, NOS2) to investigate how archaic introgression may be enriched at the pathway level.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression.

      Weaknesses: 

      The manuscript lacks details or justification about how/why some of the analyses were performed. Below are some examples where the authors could provide additional details.

      The authors made specific choices in their window analysis. These choices are not justified or there is no comment as to how results might change if these choices were perturbed. For example, in the methods, the authors write "Then, the genome was divided into 200 kb windows with an overlap of 50 kb and for each of them we calculated the ratio between the number of significant SNVs and the total number of variants." 
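      The windowing step quoted above can be illustrated with a minimal Python sketch. It assumes that "an overlap of 50 kb" means consecutive 200 kb windows share 50 kb (so each window starts 150 kb after the previous one), that windows are half-open intervals starting at position 0, and that per-SNV significance calls are already available. The function name `window_significance_ratios` is hypothetical and this is not the authors' code; windows without variants are skipped to avoid dividing by zero.

```python
from typing import List, Tuple


def window_significance_ratios(
    positions: List[int],
    is_significant: List[bool],
    window_size: int = 200_000,
    overlap: int = 50_000,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, significant/total) for every window holding variants.

    Assumption: consecutive windows share `overlap` bp, so the start of each
    window advances by window_size - overlap (150 kb with the defaults).
    """
    if len(positions) != len(is_significant):
        raise ValueError("positions and is_significant must have the same length")
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    if not positions:
        return []

    step = window_size - overlap
    last_position = max(positions)
    ratios = []
    start = 0
    while start <= last_position:
        end = start + window_size  # half-open window [start, end)
        n_total = 0
        n_significant = 0
        for pos, significant in zip(positions, is_significant):
            if start <= pos < end:
                n_total += 1
                n_significant += int(significant)
        if n_total > 0:  # skip empty windows instead of dividing by zero
            ratios.append((start, end, n_significant / n_total))
        start += step
    return ratios


# Toy usage: the SNV at 160 kb falls in both of the first two overlapping windows.
print(window_significance_ratios(
    [10_000, 120_000, 160_000, 310_000], [True, False, True, True]
))
```

      Because windows overlap, a single SNV can contribute to two ratios; any change to the window size or overlap shifts which windows pass a ratio cut-off, which is the sensitivity the comment asks about.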

      Additional information is needed for clarity. For example, "we considered only protein-protein interactions showing confidence scores {greater than or equal to} 0.7 and the obtained protein frameworks were integrated using information available in the literature regarding the functional role of the related genes and their possible involvement in high-altitude adaptation." What do the confidence scores mean? Why 0.7?

      In the method section (Identifying gene networks enriched for Denisovan-like derived alleles), the authors write "To validate VolcanoFinder results by using an independent approach". Does this mean that for Signet the authors do not use the regions identified as adaptively introgressed by VolcanoFinder? I thought in the original Signet paper, the authors used a summary describing the amount of introgression of a given region.

      Later, the authors write "To do so, we first compared the Tibetan and Denisovan genomes to assess which SNVs were present in both modern and archaic sequences. These loci were further compared with the ancestral reconstructed reference human genome sequence (1000 Genomes Project Consortium et al., 2015) to discard those presenting an ancestral state (i.e., that we have in common with several primate species)." It is not clear why the authors are citing the 1000 genomes project. Are they comparing with the reference human genome reference or with all populations in the 1000 genomes project? Also, are the authors allowing derived alleles that are shared with Africans? Typically, populations from Africa are used as controls since the Denisovan introgression occurred in Eurasia.

      The methods section for Figures 4B, 4C, and 4D is a little hard to understand. What is the x-axis on these plots? Is it the number of pairwise differences to Denisovan? The caption is not clear here. The authors mention that "Conversely, for non-introgressed loci (e.g., EGLN1), we might expect a remarkably different pattern of haplotypes distribution, with almost all haplotype classes presenting a larger proportion of non-Tibetan haplotypes rather than Tibetan ones." There is clearly structure in EGLN1. There is a group of non-Tibetan haplotypes that are closer to Denisovan and a group of Tibetan haplotypes that are distant from Denisovan. How do the authors interpret this?

      In the original Signet paper (Gouy et al. 2017), they apply Signet to data from Tibetans. Zhang et al. PNAS (2021) also applied it to Tibetans. It would be helpful to highlight how the approach here is different.

      We thank the Reviewers for appreciating the rationale of our study and for identifying potential issues that needed to be addressed in order to focus on robust results supported by multiple approaches.

      First, we agree with the Reviewers that the methodologies adopted in the present study needed to be clarified and justified in more depth than in the original version of the manuscript, to make it more intelligible to a broad range of scientists. As reported in detail in the revised text, the VolcanoFinder algorithm, which we used as the primary method to discover new candidate genomic regions affected by adaptive introgression, was chosen among several approaches developed to detect signatures of this evolutionary process for the following reasons: i) VolcanoFinder is one of the few methods that can jointly test for archaic introgression and adaptive evolution (e.g., the D statistic cannot formally test for natural selection, and it was developed to provide genome-wide estimates of allele sharing between archaic and modern groups rather than to identify specific genomic regions enriched for introgressed alleles); ii) the model tested by VolcanoFinder differs markedly from those considered by other methods typically used to test for adaptive introgression, such as the RD, U and Q statistics, which aim to identify chromosomal segments that show low divergence from a specific archaic sequence and/or are enriched in alleles uniquely shared between the admixed group and the source population at frequencies above a given threshold in the population under study; these statistics are therefore best suited to a scenario in which adaptation was mediated by strong selective sweeps rather than weak polygenic mechanisms (see answer to comment #1 of Reviewer #1 for further details); iii) VolcanoFinder is less computationally demanding than other algorithms, such as genomatnn and MaLAdapt, which must also be trained on large genomic simulations built specifically to reflect the evolutionary history of the population under study, thus increasing the risk of bias in the results if the information that guides the simulations is not accurate.

      Nevertheless, we agree with Reviewer #2 that some criteria previously used to filter VolcanoFinder results (e.g., normalization of LR scores, use of a sliding-window approach, and enrichment analysis based on specific confidence scores) might introduce threshold-dependent changes in the list of genomic regions considered the most likely candidates for adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder analysis pipeline developed by Setter et al. 2020, in the revised version of the manuscript we used raw LR scores and shortlisted the most significant results by focusing on loci whose values fall in the top 5% of the genome-wide distribution of this statistic (see Materials and methods for details).

      Moreover, to further reduce the use of potentially arbitrary filtering thresholds, we decided not to use functional enrichment analysis to prioritize results from the VolcanoFinder method. Although a STRING confidence score (i.e., the approximate probability that a predicted interaction exists between two proteins belonging to the same functional pathway according to information stored in the KEGG database) above 0.7 is generally considered high confidence (string-db.org, Szklarczyk et al. 2014), we replaced this prioritization criterion by considering as the most robust candidates for adaptive introgression only those genomic regions supported by all the approaches used (i.e., VolcanoFinder, Signet, LASSI and Haplostrips analyses).
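      A minimal sketch of the top-5% shortlisting step follows. It assumes that "top 5%" means raw LR scores at or above the 95th percentile of the genome-wide distribution, computed here by nearest rank with ties at the cut-off retained. The `ScoredSNV` record, the helper names and the toy coordinates are hypothetical; the real input would be the per-site output of the VolcanoFinder scan. The `count_in_region` helper shows how the number of shortlisted SNVs within a gene interval (as in the EPAS1 versus EGLN1 comparison) could be tallied; no real gene coordinates are given here.

```python
import math
from typing import List, NamedTuple


class ScoredSNV(NamedTuple):
    chrom: str
    pos: int
    lr: float  # raw (non-normalized) VolcanoFinder likelihood-ratio score


def top_fraction(snvs: List[ScoredSNV], fraction: float = 0.05) -> List[ScoredSNV]:
    """Keep SNVs whose LR is at or above the (1 - fraction) genome-wide quantile."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not snvs:
        return []
    scores = sorted((s.lr for s in snvs), reverse=True)
    k = max(1, math.ceil(fraction * len(scores)))  # nearest-rank cut-off
    threshold = scores[k - 1]
    return [s for s in snvs if s.lr >= threshold]  # ties at the cut-off are kept


def count_in_region(snvs: List[ScoredSNV], chrom: str, start: int, end: int) -> int:
    """Count SNVs on `chrom` with start <= pos <= end (inclusive interval)."""
    return sum(1 for s in snvs if s.chrom == chrom and start <= s.pos <= end)


# Toy usage: 100 SNVs with LR = 0..99, so the top 5% are the last five.
toy = [ScoredSNV("chr2", 1_000 + i, float(i)) for i in range(100)]
shortlist = top_fraction(toy)
print(len(shortlist), count_in_region(shortlist, "chr2", 1_097, 1_099))
```

      A percentile cut-off of this kind adapts to the genome-wide LR distribution, so no normalization or window-level threshold has to be chosen.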

      Regarding the Reviewers’ comments on the use of the Signet algorithm, we realized that the rationale behind this validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for natural selection (as Gouy et al. 2017 did), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier 2020, searching for significant networks of genes carrying archaic-derived variants observed in the Tibetan populations considered but not in an outgroup population of African ancestry. We thus used Signet as an independent approach to obtain a first validation of the introgressed (but not necessarily adaptive) loci pointed out by VolcanoFinder.

      In detail, regarding Reviewer #2’s question about which genomic regions were considered in the Signet analysis: to obtain the input score for each gene along the genome, as required by the algorithm, we calculated per-gene average frequencies over all the archaic-derived alleles present in the Tibetan dataset but absent from the outgroup one. Therefore, we did not consider only the loci identified as significant by VolcanoFinder, but performed an independent genome scan. We then cross-checked significant results from the VolcanoFinder and Signet approaches and shortlisted the genomic regions supported by both. This approach thus differs from that of Zhang et al. 2021, in which the input scores per gene were obtained by considering only loci previously flagged as putatively introgressed by another method. Moreover, as mentioned in the previous paragraph, our approach also differs from that of Gouy et al. 2017, in which the input score assigned to each gene was based on the variant with the smallest P-value for a selection statistic, and was thus informative about putative adaptive events but not about introgression.

      However, as both Reviewers correctly pointed out, we originally performed the Signet analysis on derived alleles shared between Tibetans and the Denisovan genome, without filtering out alleles that are also observed in other modern human populations. We agree with the Reviewers that this approach cannot rule out false positives due to ancestral polymorphisms rather than introgressed alleles. Following the Reviewers’ suggestion, we therefore repeated the Signet analysis after removing derived alleles also observed in an outgroup population of African ancestry (i.e., Yoruba), assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. Specifically, we considered only alleles that: i) were shared between Tibetans and the Denisovan (i.e., Denisovan-like alleles); ii) were inferred to be derived by comparison with the reconstructed ancestral human reference sequence; iii) were completely absent (i.e., had a frequency of zero) in the Yoruba population sequenced by the 1000 Genomes Project. Although Reviewer #1’s comment seems to propose Han Chinese as a further control population, we decided not to filter out Denisovan-like derived alleles also present in this group, because the evidence collected so far suggests that Denisovan introgression into the gene pool of East Asian ancestors predated the split between low-altitude and high-altitude populations (Lu et al. 2016; Hu et al. 2017) and, as mentioned before, we aimed to use the Signet algorithm to validate introgression events rather than adaptive ones (see the answer to comment #6 of Reviewer #1 for further details).
Moreover, we would like to note that we kept the Signet analysis as a validation method in the revised version of the manuscript because: i) comments from both Reviewers converge on how to effectively improve this approach, and ii) it goes beyond the simple identification of single putatively introgressed alleles, enabling us to point out those biological functions that might have been collectively shaped by gene flow from Denisovans.
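      The allele filter (criteria i to iii) and the per-gene input score described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline. It assumes one candidate allele per site (the allele carried by the Denisovan genome, treated as homozygous), frequencies already computed for Tibetans and Yoruba, and an ancestral state taken from the reconstructed ancestral sequence, with sites of unknown ancestral state dropped. The `Site` record, the function names and the gene labels are hypothetical. Genes with no retained allele receive no score here; how such genes should enter the Signet input is left open.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Site:
    gene: str
    denisovan_allele: str            # allele carried by the Denisovan genome (assumed homozygous)
    ancestral_allele: Optional[str]  # reconstructed ancestral state; None if unknown
    tibetan_freq: float              # frequency of denisovan_allele in Tibetans
    yoruba_freq: float               # frequency of denisovan_allele in Yoruba (outgroup)


def is_denisovan_like_derived(site: Site) -> bool:
    shared = site.tibetan_freq > 0  # (i) seen in both Tibetans and the Denisovan
    derived = (site.ancestral_allele is not None
               and site.denisovan_allele != site.ancestral_allele)  # (ii) derived state
    absent_in_outgroup = site.yoruba_freq == 0  # (iii) frequency of zero in Yoruba
    return shared and derived and absent_in_outgroup


def gene_scores(sites: Iterable[Site]) -> Dict[str, float]:
    """Average Tibetan frequency of the retained alleles, per gene."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for site in sites:
        if is_denisovan_like_derived(site):
            totals[site.gene] += site.tibetan_freq
            counts[site.gene] += 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


toy_sites = [
    Site("GENE_A", "T", "C", 0.4, 0.0),  # kept
    Site("GENE_A", "G", "A", 0.2, 0.0),  # kept
    Site("GENE_A", "A", "A", 0.9, 0.0),  # ancestral allele: dropped
    Site("GENE_B", "C", "T", 0.5, 0.1),  # present in Yoruba: dropped
    Site("GENE_B", "G", "A", 0.0, 0.0),  # not seen in Tibetans: dropped
]
print(gene_scores(toy_sites))
```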

      In addition to validating genomic regions putatively affected by archaic introgression by cross-checking results from the VolcanoFinder and Signet analyses, and following the suggestion by Reviewer #1, we implemented a further validation procedure aimed at formally testing for adaptive evolution of the identified candidate introgressed loci. For this purpose, we applied the LASSI likelihood haplotype-based method (Harris & DeGiorgio 2020) to Tibetan whole-genome data. We chose this approach mainly because: i) it can detect and distinguish genomic regions that have experienced different types of selective events (i.e., strong and weak ones); and ii) it has been shown to have greater power to identify them than other selection statistics (e.g., H12 and nSL) (Harris & DeGiorgio 2020). Again, we performed an independent genome scan with LASSI and then cross-checked the significant results with those already supported by the VolcanoFinder and Signet approaches, in order to shortlist genomic regions that have plausibly experienced both archaic introgression and adaptive evolution.

      Moreover, we kept Haplostrips analysis as a final validation step, performed specifically on chromosomal segments supported by VolcanoFinder, Signet, and LASSI results. This enabled us to assess the similarity between Denisovan haplotypes and those observed in Tibetans (i.e., the population under study, in which archaic alleles might have played an adaptive role in response to high-altitude selective pressures), Han Chinese (i.e., a sister group whose common ancestors with Tibetans experienced Denisovan admixture, but which then evolved at low altitude), and Yoruba (i.e., an outgroup assumed not to have received gene flow from Denisovans).
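      To make the distance-to-Denisovan comparison concrete, the sketch below counts, for each phased haplotype, the number of sites at which it differs from the Denisovan sequence over the same SNVs, then tallies these distance classes by population and orders haplotypes from closest to most distant. It is a simplified stand-in, not the Haplostrips implementation. It assumes haplotypes are encoded as equal-length 0/1 strings at shared sites; the population labels (TIB, CHB, YRI for Tibetans, Han Chinese and Yoruba) and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def differences(hap: str, reference: str) -> int:
    """Number of sites at which a haplotype differs from the reference (Hamming distance)."""
    if len(hap) != len(reference):
        raise ValueError("haplotype and reference must cover the same sites")
    return sum(a != b for a, b in zip(hap, reference))


def distance_classes(haplotypes: List[Tuple[str, str]], reference: str) -> Dict[str, Counter]:
    """For each population, count haplotypes per number of differences from the reference."""
    by_population: Dict[str, Counter] = defaultdict(Counter)
    for population, hap in haplotypes:
        by_population[population][differences(hap, reference)] += 1
    return dict(by_population)


def order_by_distance(haplotypes: List[Tuple[str, str]], reference: str) -> List[Tuple[str, str]]:
    """Sort haplotypes from closest to most distant from the reference, then by population label."""
    return sorted(haplotypes, key=lambda ph: (differences(ph[1], reference), ph[0]))


denisovan = "1101"  # hypothetical Denisovan alleles at four SNVs
toy_haps = [("TIB", "1101"), ("TIB", "1100"), ("CHB", "0000"), ("YRI", "0010"), ("CHB", "1101")]
print(distance_classes(toy_haps, denisovan))
print(order_by_distance(toy_haps, denisovan))
```

      Under an introgression scenario, one would expect the low-distance classes to be dominated by haplotypes from the admixed target population, which is the pattern the Haplostrips comparison is meant to reveal.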

      In conclusion, we believe that the substantial changes incorporated into the manuscript following the Reviewers’ suggestions have strongly improved the study, enabling us to focus on more solid results than those previously presented. Interestingly, although the single candidate loci supported by all the approaches now implemented to validate the results have received higher priority than the previous ones (which are supported by some but not all of the adopted methods), angiogenesis still stands out as one of the main biological functions shaped by adaptive introgression in human groups of Tibetan ancestry. This provides new evidence that introgressed Denisovan alleles other than the EPAS1 ones contributed to modulating the complex adaptive responses evolved by Himalayan populations to cope with the selective pressures imposed by high altitude.

      Responses to Recommendations For The Authors:

      Reviewer #1:

      The authors mainly relied on one method, VolcanoFinder (VF), to detect adaptive introgression signals. As one of the recently developed methods, VF indeed demonstrated statistical power at detecting mild selection on archaic variants, as well as soft sweeps on standing variation. However, compared to other commonly used methods for detecting adaptive introgression, such as the U and Q stats (Racimo et al. 2017), genomatnn (Gower et al. 2021), or MaLAdapt (Zhang et al. 2023), VF doesn't seem to have better power at capturing mild and incomplete sweeps. This makes me wonder about the justification for choosing VF over other methods here, which is not clearly explained in the manuscript. If these adaptive introgression candidates are legitimate, even if the signals are mild, at least some of the other methods should be able to recapitulate the signature (even if they don't necessarily pass the genome-wide significance thresholds). I would be more convinced about the archaic origin of these regions if the authors could validate their reported findings using some of the aforementioned methods.

      Following the Reviewer’s suggestion, in the revised version of the manuscript we have expanded our explanation of the rationale that guided the choice of the adopted methods. In particular, in the Materials and methods section (see page 12) we have specified the reasons for using the VolcanoFinder algorithm.

      First, it is one of the few approaches based on a model that can jointly test for archaic introgression and for adaptive evolution of the genomic regions affected by archaic gene flow, without needing to consider the putative source of introgression. This was relevant for us, because we planned to adopt at least two main independent (and preferably quite different) methods to validate the candidate introgressed loci, and the other algorithm we used (i.e., Signet) was explicitly based on comparing modern data with the archaic sequence. In this respect, the model tested by VolcanoFinder differs from those considered by the RD, U and Q statistics. The RD statistic aims to identify regions of the genome with low divergence from a given archaic reference, while the U/Q statistics detect chromosomal segments enriched in alleles that i) are uniquely shared between the admixed group (e.g., Tibetans) and the source population (e.g., Denisovans), and ii) have a frequency above a specific threshold in the admixed population (Racimo et al. 2016). For instance, all the loci considered likely to be involved in adaptive introgression events by Racimo et al. 2016 had high frequencies, most of them above 50%. We therefore decided not to implement these methods, because we believe they are better suited to detecting adaptive introgression events involving few variants with a strong phenotypic effect, which entail a substantial frequency increase in the population subjected to the selective pressure (i.e., cases such as that of EPAS1), whereas it appears difficult to choose an arbitrary frequency threshold appropriate for detecting weak and/or polygenic selective events.
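      The frequency thresholds at issue can be seen in a simplified sketch of the U and Q statistics for one genomic window, following the definitions of Racimo et al. U counts sites where the derived allele is rare in an outgroup (frequency below w), common in the target population (above x) and at frequency y in the archaic genome; Q reports a high quantile (here the 95th) of the target-population frequencies at sites meeting only the outgroup and archaic conditions. The defaults w = 0.01, x = 0.2 and y = 1.0, the nearest-rank quantile and the `SiteFreqs` record are illustrative assumptions, not the settings of any published scan.

```python
import math
from typing import List, NamedTuple, Optional


class SiteFreqs(NamedTuple):
    outgroup: float  # derived-allele frequency in outgroup A (e.g., Yoruba)
    target: float    # derived-allele frequency in target population B (e.g., Tibetans)
    archaic: float   # derived-allele frequency in archaic genome C (1.0 = homozygous)


def u_stat(sites: List[SiteFreqs], w: float = 0.01, x: float = 0.2, y: float = 1.0) -> int:
    """Number of sites rare in A (< w), common in B (> x) and at frequency y in C."""
    return sum(1 for s in sites if s.outgroup < w and s.target > x and s.archaic == y)


def q_stat(sites: List[SiteFreqs], w: float = 0.01, y: float = 1.0,
           q: float = 0.95) -> Optional[float]:
    """q-quantile (nearest rank) of B frequencies at sites with A < w and C == y."""
    freqs = sorted(s.target for s in sites if s.outgroup < w and s.archaic == y)
    if not freqs:
        return None  # no informative sites in this window
    rank = max(1, math.ceil(q * len(freqs)))
    return freqs[rank - 1]


window = [
    SiteFreqs(0.0, 0.6, 1.0),
    SiteFreqs(0.0, 0.1, 1.0),
    SiteFreqs(0.0, 0.3, 1.0),
    SiteFreqs(0.05, 0.9, 1.0),  # too common in the outgroup
    SiteFreqs(0.0, 0.8, 0.5),   # heterozygous in the archaic genome
]
print(u_stat(window), u_stat(window, x=0.5), q_stat(window))
```

      Raising x from 0.2 to 0.5 halves U in this toy window, which illustrates why a single frequency cut-off favours high-frequency, strong-sweep signals over weak polygenic ones.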

      As regards the possible use of MaLAdapt or genomatnn as validation methods, we believe they are more computationally demanding than the Signet algorithm and, above all, they have the disadvantage of requiring training on simulated genomic data. This makes them more prone to bias introduced by simulations that do not accurately reflect the evolutionary history of the population under study.

      Overall, we do not agree with the Reviewer’s statement that we mainly relied on a single method to detect adaptive introgression signals because, as mentioned above, the Signet algorithm was specifically used to identify genomic regions putatively affected by introgression. This method relies on assumptions very similar to those described above for the U/Q statistics (e.g., it considers alleles uniquely shared between Tibetans and Denisovans), but avoids the need to select a frequency threshold to shortlist the most likely adaptively introgressed loci. In addition, following another suggestion by the Reviewer, we have now implemented a further approach to provide evidence for the adaptive evolution of the candidate introgressed loci (see response to comment #3).

      As regards the use of Signet, the comments from both Reviewers made us realize that the rationale behind this validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for natural selection (as Gouy et al. 2017 did), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier (2020), searching for significant networks of genes carrying archaic-derived variants observable in the Tibetan populations considered. We thus used Signet as an independent approach to obtain a first validation of the VolcanoFinder results. However, following suggestions from both Reviewers, we modified the criteria used to filter archaic-derived variants by excluding alleles shared between the Denisovan and the Yoruba outgroup population (see response to comment #6 for further information on this aspect).

      To sum up, we think that combining VolcanoFinder with the Signet and LASSI approaches offers a good compromise between the computational effort required to shortlist the most robust candidate adaptively introgressed loci and the type of model tested (i.e., one that does not discard a priori genomic signatures ascribable to weak and/or polygenic selective events). Moreover, we decided to keep Signet as a validation approach in the revised version of the manuscript because: i) comments from both Reviewers converge on how to effectively improve this approach, and ii) it can be used both for single-locus validation and to search for the biological functions collectively most impacted by archaic introgression, allowing us to test a more realistic approximation of a polygenic model of adaptation involving introgressed alleles. In fact, although the single candidate loci supported by all the approaches now implemented to validate the results (see responses to comments #3 and #7 for further details) have received higher priority than the previous ones (i.e., EP300 and NOS2, which are now supported by some but not all the adopted methods), angiogenesis still stands out as one of the main biological functions shaped by adaptive introgression in the ancestors of Tibetan populations.

      Besides, I am a little surprised to see that in Supplementary Figure 2, VF didn't seem to capture more significant LR values in the EPAS1 region (positive control of adaptive introgression) than in the negative control EGLN1 region. The author explained this as the selection on EPAS1 region is "not soft enough", which I find a bit confusing. If there is no major difference in significant values between the positive and negative controls, how would the authors be convinced the significant values they detected in their two genes are true positives? I would like to see more discussion and justification of the VF results and interpretations.

      In light of this observation, and of Reviewer #2’s overall comment on the procedures implemented for filtering VolcanoFinder results, we realized that both the normalization of LR scores and the use of a sliding-window approach might introduce threshold-dependent changes in the list of genomic regions considered the most likely candidates for adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder analysis pipeline developed by Setter et al. 2020, in the revised version of the manuscript we used raw LR scores and shortlisted the most significant results by focusing on loci whose values fall in the top 5% of the genome-wide distribution of this statistic (see Materials and methods, page 13 lines 4-16 for further details).

      Following this approach, we observed a clearer pattern than the one previously described, in which the distribution of LR scores in the EPAS1 genomic region differs markedly from that obtained for the EGLN1 gene (Figure 2 – figure supplement 1). More specifically, we identified a total of 19 EPAS1 variants with scores in the top 5% of LR values, compared with only three EGLN1 SNVs. Moreover, LR values were more clustered in the EPAS1 genomic region and showed a higher average than those observed for EGLN1. LR values, as well as -log(a) scores calculated for these control genes, are reported in Supplement tables 3 and 4.

      Nevertheless, we agree with the Reviewer that VolcanoFinder results need to be confirmed by additional methods, which is what we have done to define both the new candidate adaptively introgressed loci and the positive/negative controls considered. In fact, the validation analyses performed to confirm signatures of both archaic introgression and adaptive evolution (i.e., Signet, LASSI and Haplostrips) converged in indicating that Tibetan variability at the EGLN1 gene does not seem to have been shaped by archaic introgression but only by natural selection (see Results, page 5 lines 3-9, page 6 lines 23-25, page 7 lines 29-36; Discussion page 14 lines 33-36; Figure 2 – figure supplement 1B and Figure 4 – figure supplement 1B, 3B and 3D), in agreement with what was previously proposed (Hu et al., 2017). On the other hand, results from all validation analyses confirmed adaptive introgression signatures at the EPAS1 genomic region (see Results page 4 lines 32-37, page 5 lines 1-2 and 30-34, page 6 lines 23-29; Figure 3A, 3B and Figure 4 – figure supplement 1A, 3A and 3C).

      Finally, as already reported in the former version of the manuscript, our choice of considering EPAS1 and EGLN1 respectively as positive and negative controls for adaptive introgression was guided by previous evidence suggesting these loci as targets of natural selection in high-altitude Himalayan populations (Yang et al., 2017; Liu et al., 2022), although only EPAS1 was proved to have been involved also in an adaptive introgression event (Huerta-Sanchez et al., 2014; Hu et al., 2017). 

      With that being said, I suggest the authors try to first validate the signal of positive selection in the two gene regions using methods such as H2/H1 (Garud et al. 2015), iHS (Voight et al. 2006) etc. that have demonstrated power and success at detecting mild sweeps and soft sweeps, regardless of if these are adaptive introgression.

      Following the Reviewer’s suggestion, we also validated the new candidate adaptively introgressed loci using a method that formally tests for natural selection. In particular, we used the LASSI (Likelihood-based Approach for Selective Sweep Inference) algorithm developed by Harris & DeGiorgio (2020), mainly for the following reasons: i) like other approaches, it can identify both strong and weak genomic signatures of positive selection, but it can additionally distinguish these signals by explicitly classifying genomic windows affected by hard or soft selective sweeps; ii) when applied to simulated data generated under different demographic models and a range of values for the parameters describing a selective event (e.g., the time at which the beneficial mutation arose, the selection coefficient s), it has been shown to have greater power than traditional selection scans such as nSL, H2/H1 and H12 (see Harris & DeGiorgio 2020 for further details).

      Using this approach, we were able to recapitulate the signatures of natural selection previously observed in Tibetans for both EPAS1 and EGLN1 (Figure 4 – figure supplement 1 and 3C – 3D). We also obtained comparable patterns for our previous candidate adaptively introgressed loci (i.e., EP300 and NOS2), as well as for the new ones prioritized in the revised version of the manuscript on the basis of consistent results from VolcanoFinder, Signet and Haplostrips analyses (see Results, page 6 lines 30-35; Figure 4C, 4D, Figure 4 – figure supplement 2C and 2D).

      With regard to the plausible archaic origin of the haplotypes under selection in these gene regions, my concern comes from the fact that other recent studies characterizing the archaic ancestry landscape in Tibetans and East Asians (eg. SPrime reports from Browning et al. 2018, as well as ArchaicSeeker reports from Yuan et al. 2021) didn't report archaic segments in regions overlapping with EP300 and NOS2. So how would the authors explain the discrepancy here, that adaptive introgression is detected yet there is little evidence of archaic segments in the regions? 

      We thank the Reviewer for the comment and the references provided. However, having read the suggested articles, it does not appear that either of them analysed genomes from individuals of Tibetan ancestry. Moreover, in the study by Yuan et al. 2021 we could not find any table or supplementary table reporting the genomic segments showing signatures of Denisovan-like introgression in East Asian groups; only enrichment analyses performed on significant results are described, and only for the Papuan population. In any case, as reported below in the response to comment #5, and in line with the Reviewer’s observation on the original version of the manuscript, the additional validation analyses implemented during this revision gave EP300 and NOS2 lower priority than other loci showing more robust signatures of introgression of Denisovan alleles into the gene pool of Tibetan ancestors (i.e., TBC1D1, PRKAG2, KRAS and RASGRF2). Three of these four genes are also consistent with previously published results supporting introgression of Denisovan alleles in the ancestors of present-day Han Chinese (Browning et al. 2018) or directly in Tibetan genomes (Hu et al. 2017) (see Results, page 5 lines 10-21 and Supplement table 5). Nevertheless, one reason why not all the candidate adaptive introgression regions detected by our analyses appear among the results of Browning et al. 2018 may be that in Han Chinese this archaic variation evolved neutrally after the introgression events, thus preventing the identification of chromosomal segments enriched in putative archaic introgressed variants by the VolcanoFinder and LASSI approaches (which also account for the impact of natural selection). In fact, the SPrime method implemented by Browning et al. 2018 targets introgression events in general rather than adaptive introgression. For instance, the Denisovan-like regions identified with SPrime in Han Chinese in that study do not include the EPAS1 region at all.

      Additionally, looking at Figure 4 and Supplementary Figure 4, the authors showed haplotype comparisons between Tibetans, Denisovan, and Han Chinese for EP300 and NOS2 regions. However, in both figures, roughly equal numbers of Tibetans and Han Chinese harbor the haplotype with somewhat close distance to the Denisovan genotype. And this closest haplotype is not even that similar to the Denisovan. So how would the authors rule out the possibility that instead of adaptive introgression, the selection was acting on just an ancestral modern human haplotype?

      We agree with the Reviewer that, according to the analyses presented in the original version of the manuscript, the haplotype patterns observed at the EP300 and NOS2 loci with the Haplostrips approach cannot rule out the possibility that their adaptive evolution involved ancestral modern human haplotypes. After the modifications to our analysis pipeline implemented following the Reviewers’ suggestions, the role of these loci in modulating complex adaptations to high altitude was also confirmed by the LASSI results (in addition to results from previous studies: Bigham et al., 2010; Zheng et al., 2017; Deng et al., 2019; X. Zhang et al., 2020), but their putative archaic origin received lower priority than that of other loci, as it was not confirmed by all the analyses performed.

      Furthermore, I have a question about how exactly the authors scored the genes in their network analysis using Signet. The manuscript mentioned they were looking for enrichment of archaic-like derived alleles, and in the methods section, they mentioned they used SNPs that are present in both Denisovan and Tibetan genomes but are not in the chimp ancestral allele state. But are these "derived" alleles also present in Han Chinese or Africans? If so, what are the frequencies? And if the authors didn't use derived alleles exclusively shared between Tibetans and Denisovans, that may lead to false positives of the enrichment analysis, as the result would not be able to rule out the selection on ancestral modern human variation.

      As mentioned in the response to comment #1, following the suggestions of both Reviewers we have modified the criteria used to filter archaic derived variants shared exclusively between Denisovans and Tibetans. In particular, we retained as input for the Signet analysis only those alleles that i) were shared between Tibetans and the Denisovan (i.e., Denisovan-like alleles), ii) were in their derived state, and iii) were completely absent (i.e., had a frequency of zero) in the Yoruba population sequenced by the 1000 Genomes Project, used here as an outgroup under the assumption that only Eurasian H. sapiens populations experienced Denisovan admixture. We instead decided not to filter out Denisovan-like derived alleles also present in the Han Chinese population, because multiple lines of evidence indicate that gene flow from Denisovans into the ancestral East Asian gene pool occurred around 48–46 thousand years ago (Teixeira et al. 2019; Zhang et al. 2021; Yuan et al. 2021), thus predating the split between low-altitude and high-altitude groups, which occurred approximately 15 thousand years ago (Lu et al. 2016; Hu et al. 2017). Indeed, traces of this archaic gene flow are still detectable in the genomes of several low-altitude populations of East Asian ancestry (Yuan et al. 2021).
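      To make the three filtering criteria concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical per-site record (`Site`) holding the chimpanzee-inferred ancestral allele, the alleles observed in the Denisovan and Tibetan genomes, and Yoruba (YRI) allele frequencies; the field names and the helper `denisovan_like_derived` are illustrative only. As described above, Han Chinese frequencies are intentionally not used as a filter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Site:
    """Hypothetical per-site record used only for this illustration."""
    pos: int
    ancestral: str                  # chimpanzee-inferred ancestral allele
    denisovan: Set[str]             # alleles observed in the Denisovan genome
    tibetan: Set[str]               # alleles observed in Tibetan haplotypes
    yri_freq: Dict[str, float] = field(default_factory=dict)  # YRI allele frequencies


def denisovan_like_derived(sites: List[Site]) -> List[Tuple[int, str]]:
    """Keep alleles that are (i) shared by Tibetans and the Denisovan,
    (ii) derived relative to the ancestral state, and (iii) absent in YRI."""
    kept = []
    for s in sites:
        for allele in sorted(s.denisovan & s.tibetan):   # criterion (i)
            if allele == s.ancestral:                     # criterion (ii)
                continue
            if s.yri_freq.get(allele, 0.0) > 0.0:         # criterion (iii)
                continue
            kept.append((s.pos, allele))
    return kept
```

For example, a Tibetan/Denisovan-shared derived allele seen at any nonzero frequency in YRI would be dropped, whereas the same allele at nonzero frequency in Han Chinese would be kept.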

      Concerning the above, I would also suggest the authors replot their Figure 4 and Figure S4 by adding the African population (eg. YRI) in the plot, and examine the genetic distance among the modern human haplotypes, in contrast to their distance to Denisovan.

      Following the Reviewer’s suggestion, after identifying new candidate adaptive introgressed loci with the revised analysis pipeline, we ran the Haplostrips algorithm after adding to the dataset 27 individuals (i.e., 54 haplotypes) from the Yoruba population sequenced by the 1000 Genomes Project (Figure 4A, 4B, Figure 4 - figure supplement 2A, 2B, 3A).
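      The comparison the Reviewer asks for, namely how far each modern human haplotype (Tibetan, Han Chinese, Yoruba) lies from the Denisovan, can be illustrated with a simple mismatch count. This is a simplified sketch in the spirit of the Haplostrips visualization, not its actual implementation: it assumes haplotypes coded as 0 (ancestral) / 1 (derived) and a Denisovan reference vector in which `None` marks missing calls. The function names and population labels are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def distance_to_reference(
    haplotypes: Dict[str, Sequence[int]],
    reference: Sequence[Optional[int]],
) -> Dict[str, int]:
    """Number of sites at which each haplotype differs from the reference
    alleles (0 = ancestral, 1 = derived); None marks a missing reference call."""
    distances = {}
    for name, hap in haplotypes.items():
        if len(hap) != len(reference):
            raise ValueError(f"{name}: length {len(hap)} != {len(reference)}")
        distances[name] = sum(
            1 for h, r in zip(hap, reference) if r is not None and h != r
        )
    return distances


def order_by_distance(distances: Dict[str, int]) -> List[str]:
    """Sort haplotype labels from closest to farthest (ties broken by name)."""
    return sorted(distances, key=lambda name: (distances[name], name))
```

Sorting Yoruba haplotypes alongside Tibetan and Han Chinese ones makes it visible whether the haplotypes closest to the Denisovan are also shared with an African outgroup, which would point to ancestral modern human variation rather than introgression.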

      Reviewer #2:

      In the methods the authors write "Since composite likelihood statistics are not associated with p-values, we implemented multiple procedures to filter SNVs according to the significance of their LR values." What does significance mean here?

      Following the modifications to the analysis pipeline made in response to the Reviewers’ suggestions (see responses to public reviews and to comments #1, #3, #6, #7 of Reviewer #1), new candidate adaptive introgressed loci were identified by focusing specifically on variants whose LR values fall in the top 5% of the genome-wide distribution of this statistic, in order to adhere more strictly to the VolcanoFinder approach developed by Setter et al. 2020. The corresponding sentence in the Materials and Methods section was modified accordingly.
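      Because composite likelihood statistics have no p-values, "significance" here reduces to an empirical cutoff on the genome-wide LR distribution. The sketch below shows one way to apply such a top-5% cutoff; it assumes LR values have already been computed per variant, and the helper name `top_lr_variants` and its tie-handling rule (all variants tied with the cutoff value are kept) are illustrative choices, not details taken from VolcanoFinder.

```python
import math
from typing import Dict, List


def top_lr_variants(lr_by_variant: Dict[str, float], fraction: float = 0.05) -> List[str]:
    """Return variants whose LR lies in the top `fraction` of the genome-wide
    distribution. Variants tied with the cutoff value are all retained."""
    if not lr_by_variant:
        return []
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Rank variants from highest to lowest LR.
    ranked = sorted(lr_by_variant.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = math.ceil(len(ranked) * fraction)
    cutoff = ranked[n_keep - 1][1]          # LR value at the top-fraction boundary
    return [variant for variant, lr in ranked if lr >= cutoff]
```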

      Signet should be cited the first time it appears in the manuscript. The citation in the references is wrong. It lists R. Nielsen as the last author, but R. Nielsen is not an author of this paper.

      We thank the Reviewer for the comment. We have now mentioned the article by Gouy and Excoffier (2020) in the Results section where the Signet algorithm was first described and we have corrected the related reference.

      I could not find Figure 5 which is cited in the methods in the main text. I assume the authors mean Supplementary Figure 5, but the supplementary files have Figure 4.

      We thank the Reviewer for the comment. We have checked and modified figures included in the article and in the supplementary files to fix this issue.

      I didn't see a table with the genes identified as adaptively introgressed with VolcanoFinder. This would be useful as I believe this is the first time VolcanoFinder is being used on Tibetan data?

      Following the Reviewer’s suggestion, we have reported in Supplement table 2 all variants with LR scores in the top 5% of the genome-wide distribution of this statistic, along with the associated α parameters computed by the VolcanoFinder algorithm.

      It is easier for the reviewer if lines have numbers.

      Following the Reviewer’s suggestion, we have included line numbers in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1

      Summary:

      The authors introduce a denoising-style model that incorporates both structure and primary-sequence embeddings to generate richer embeddings of peptides. My understanding is that the authors use ESM for the primary sequence embeddings, take resolved structures (or use structural predictions from AlphaFold when they're not available), and then develop an architecture to combine these two with a loss that seems reminiscent of diffusion models or masked language model approaches. The embeddings can be viewed as ensemble-style embedding of the two levels of sequence information, or with AlphaFold, an ensemble of two methods (ESM+AlphaFold). The authors also gather external datasets to evaluate their approach and compare it to previous approaches. The approach seems promising and appears to out-compete previous methods at several tasks. Nonetheless, I have strong concerns about a lack of verbosity as well as the exclusion of relevant methods and references.

      Thank you for the comprehensive summary. We respond point by point to the concerns listed in the review below, and we have modified our manuscript accordingly.

      Advances:

      I appreciate the breadth of the analysis and comparisons to other methods. The authors separate tasks, models, and sizes of models in an intuitive, easy-to-read fashion that I find valuable for selecting a method for embedding peptides. Moreover, the authors gather two datasets for evaluating embeddings' utility for predicting thermostability. Overall, the work should be helpful for the field as more groups choose methods/pretraining strategies amenable to their goals, and can do so in an evidence-guided manner.

      Thank you for recognizing the strength of our work in terms of the notable contributions, the solid analysis, and the clear presentation.

      Considerations:

      (1) Primarily, a majority of the results and conclusions (e.g., Table 3) are reached using data and methods from ProteinGym, yet the best-performing methods on ProteinGym are excluded from the paper (e.g., EVE-based models and GEMME). In the ProteinGym database, these methods outperform ProtSSN models. Moreover, these models were published over a year---or even 4 years in the case of GEMME---before ProtSSN, and I do not see justification for their exclusion in the text.

      We decided to exclude the listed methods from the primary table as they are all MSA-based methods, which are considered few-shot methods in deep learning (Rao et al., ICML, 2021). In contrast, the proposed ProtSSN is a zero-shot method that makes inferences based on less information than few-shot methods. Moreover, it is possible for MSA-based methods to query aligned sequences based on predictions. For instance, Tranception (Notin et al., ICML, 2022) selects the model with the optimal proportions of logits and retrieval results according to the average correlation score on ProteinGym (Table 10, Notin et al., 2022).

      With this in mind, we only included zero-shot deep learning methods in Table 3, which require no more than the sequence and structure of the underlying wild-type protein when scoring the mutants. In the revision, we have added the performance of SaProt to Table 3, and the performance of GEMME, TranceptEVE, and SaProt to Table 5. Furthermore, we have released the model's performance on the public leaderboard of ProteinGym v1 at proteingym.org.

      (2) Secondly, related to the comparison of other models, there is no section in the methods about how other models were used, or how their scores were computed. When comparing these models, I think it's crucial that there are explicit derivations or explanations for the exact task used for scoring each method. In other words, if the pre-training is indeed an important advance of the paper, the paper needs to show this more explicitly by explaining exactly which components of the model (and previous models) are used for evaluation. Are the authors extracting the final hidden layer representations of the model, treating these as features, and then using these features in a regression task to predict fitness/thermostability/DDG etc.? How are the model embeddings of other methods being used, since, for example, many of these methods output a k-dimensional embedding of a given sequence, rather than one single score that can be correlated with some fitness/functional metric? Summarily, I think the text lacks an explicit mention of how these embeddings are being summarized or used, as well as how this compares to the model presented.

      Thank you for the suggestion. Below we address the questions in three points. 

      (1) The task and the scoring for each method. We followed your suggestion and added a new paragraph titled “Scoring Function” on page 9 to provide a detailed explanation of the scoring functions used by other deep learning zero-shot methods.

      (2) The importance of individual pre-training modules. The complete architecture of the proposed ProtSSN model is introduced on pages 7-8. Empirically, the influence of each pre-training module on the overall performance has been examined through ablation studies on page 12. In summary, the optimal performance is achieved by combining all the individual modules and designs.

      (3) The input of fitness scoring. For a zero-shot prediction task, the final score for a mutant is calculated with widely used functions: the log-odds ratio (for encoder models, including ours) or the log-likelihood (for autoregressive models or inverse folding models). In the revision, we explicitly define these functions in the sections “Inferencing” (page 7) and “Scoring Function” (page 9).
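      As an illustration of the log-odds-ratio scoring used by encoder models, the sketch below sums, over all mutated positions, log p(mutant residue) minus log p(wild-type residue), where the probabilities come from a single forward pass on the wild-type protein. This is a minimal sketch under stated assumptions: `probs` is a hypothetical list of per-position dictionaries (residue to probability), and the mutation string format (e.g., `A2G:K3R`, 1-based positions) is assumed for illustration. It is not the ProtSSN code, and it does not cover the log-likelihood scoring used by autoregressive models.

```python
import math
from typing import Dict, List, Sequence, Tuple


def parse_mutations(mutant: str) -> List[Tuple[str, int, str]]:
    """Parse e.g. 'A2G:K3R' (1-based positions) into (wt, 0-based pos, mt) tuples."""
    muts = []
    for token in mutant.split(":"):
        wt, pos, mt = token[0], int(token[1:-1]), token[-1]
        muts.append((wt, pos - 1, mt))
    return muts


def log_odds_score(wt_seq: str, probs: Sequence[Dict[str, float]], mutant: str) -> float:
    """Sum of log p(mutant AA) - log p(wild-type AA) over mutated positions,
    using per-position probabilities predicted for the wild-type protein."""
    score = 0.0
    for wt, i, mt in parse_mutations(mutant):
        if wt_seq[i] != wt:
            raise ValueError(f"wild-type mismatch at position {i + 1}")
        score += math.log(probs[i][mt]) - math.log(probs[i][wt])
    return score
```

Because only the wild-type protein is encoded, every mutant of that protein can be scored from the same probability table, which is what makes this scheme cheap for large deep mutational scans.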

      (3) I think the above issues can mainly be addressed by considering and incorporating points from Li et al. 2024[1] and potentially Tang & Koo 2024[2]. Li et al.[1] make extremely explicit the use of pretraining for downstream prediction tasks. Moreover, they benchmark pretraining strategies explicitly on thermostability (one of the main considerations in the submitted manuscript), yet there is no mention of this work nor the dataset used (FLIP (Dallago et al., 2021)) in this current work. I think a reference and discussion of [1] is critical, and I would also like to see comparisons in line with [1], as [1] is very clear about what features from pretraining are used, and how. If the comparisons with previous methods were done in this fashion, this level of detail needs to be included in the text.

      The initial version did not include an explicit comparison with the mentioned reference due to the difference in the learning task. In particular, [1] formulates a supervised learning task on predicting the continuous scores of mutants of specific proteins. In comparison, we make zero-shot predictions, where the model is trained in a self-supervised learning manner that requires no labels from experiments. In the revision, we added discussions in “Discussion and Conclusion” (lines 476-484):

      Recommendations For The Authors:

      Comment 1

      I found the methods lacking in the sense that there is never a simple, explicit statement about what is the exact input and output of the model. What are the components of the input that are required by the user (to generate) or supply to the model? Are these inputs different at training vs inference time? The loss function seems like it's trying to de-noise a modified sequence, can you make this more explicit, i.e. exactly what values/objects are being compared in the loss?

      We have added a more detailed description in the "Model Pipeline" section (page 7), which explains the distinct input requirements for training and inference, as well as the formulation of the employed loss function. To summarize:

      (1) Both sequence and structure information are used in training and inference. Specifically, structure information is represented as a 3D graph with coordinates, while sequence information consists of AA-wise hidden representations encoded by ESM2-650M. During inference, instead of encoding each mutant individually, the model encodes the WT protein and uses the output probability scores relevant to the mutant to calculate the fitness score. This is a standard operation in many zero-shot fitness prediction models, commonly referred to as the log-odds-ratio.

      (2) The loss function measures the difference between the output (recovered) AA sequence and the original sequence before noise was added. Noise is added to the input sequences, and the model is trained to denoise them (see “Ablation Study” for the different types of noise we tested). This approach is similar to a one-step diffusion process or BERT-style token permutation. The model learns to predict the probability of each node (AA) being one of 33 tokens. A cross-entropy loss then compares this distribution with the ground-truth (unpermuted) AA sequence, and training minimizes this loss.
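      The training objective described in points (1) and (2) can be sketched in a few lines: corrupt some residues, then score the model's per-position distributions against the original (uncorrupted) tokens with cross-entropy. This is a minimal sketch, not the ProtSSN implementation; uniform random substitution is assumed here as one example of a noise type, and `corrupt`, `denoising_cross_entropy`, `noise_rate` and `candidates` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def corrupt(tokens: Sequence[int], noise_rate: float,
            candidates: Sequence[int], rng: random.Random) -> List[int]:
    """Substitution noise: each position is replaced, with probability
    `noise_rate`, by a token drawn uniformly from `candidates`."""
    return [rng.choice(candidates) if rng.random() < noise_rate else t
            for t in tokens]


def denoising_cross_entropy(pred_probs: Sequence[Sequence[float]],
                            original: Sequence[int]) -> float:
    """Mean negative log-probability assigned to the original (uncorrupted)
    token at each position. Each row of `pred_probs` is a distribution over
    the vocabulary (33 tokens in the text)."""
    if len(pred_probs) != len(original):
        raise ValueError("one distribution per position is required")
    return -sum(math.log(dist[t]) for dist, t in zip(pred_probs, original)) / len(original)
```

A model that outputs a uniform distribution over 33 tokens incurs a loss of log 33 per position; any learned signal about the original residues pushes the loss below that baseline.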

      To better present the workflow, we revised the manuscript accordingly.

      Comment 2

      Related to the above, I'm not exactly sure where the structural/tertiary structure information comes from. In the methods, they don't state exactly whether the 3D coordinates are given in the CATH repository or where exactly they come from. In the results section they mention using AlphaFold to obtain coordinates for a specific task---is the use of AlphaFold limited only to these tasks/this is to show robustness whether using AlphaFold or realized coordinates?

      The 3D coordinates of all proteins in the training set are derived from the crystal structures in CATH v4.3.0 to ensure a high-quality input dataset (see "Training Setup," Page 8). However, during the inference phase, we used predicted structures from AlphaFold2 and ESMFold as substitutes. This approach enhances the generalizability of our method, as in real-world scenarios, the crystal structure of the template protein to be engineered is not always available. The associated descriptions can be found in “Training Setup” (lines 271-272) and “Folding Methods” (lines 429-435).

      Comment 3

      Lines 142+144 missing reference "Section establishes", "provided in Section ."

      199 "see Section " missing reference

      214 missing "Section"

      Thank you for pointing this out. We have fixed all missing references in the revision.

      Comment 4

      Table 2 - seems inconsistent to mention the number of parameters in the first 2 methods, then not in the others (though I see in Table 3 this is included, so maybe should just be omitted in Table 2).

      In Table 2, we present the zero-shot methods used as baselines. Since many methods have different versions due to varying hyperparameter settings, we decided to list the number of parameters in the following tables.

      We have double-checked both Table 3 and Table 5 and confirm that there is no inconsistency in the reported number of parameters. One potential explanation for the observed difference in the comment could be due to the differences in the number of parameters between single and ensemble methods. The ensemble method averages the predictions of multiple models, and we sum the total number of parameters across all models involved. For example, RITA-ensemble has 2210M parameters, derived from the sum of four individual models with 30M, 300M, 680M, and 1200M parameters.

      Comment 5

      In general, I found using the word "type" instead of "residue" a bit unnatural. As far as I can tell, the norm in the field is to say "amino acid" or "residue" rather than "type". This somewhat confused me when trying to understand the methods section, especially when talking about injecting noise (I figured "type" may refer to evolutionarily-close, or physicochemically-close residues). Maybe it's not necessary to change this in every instance, but something to consider in terms of ease of reading.

      Thank you for your suggestion. The term "type" we used is a common expression similar to "class" in the NLP field. To avoid further confusion for biologists, we have revised the manuscript accordingly.

      Comment 6

      197 should this read "based on the kNN "algorithm"" (word missing) or maybe "based on "its" kNN"?

      We have corrected the typo accordingly. It now reads “the 𝑘-nearest neighbor algorithm (𝑘NN)” (line 198).

      Comment 7

      200 weights of dimension 93, where does this number come from?

      The edge features are defined following Zhou et al. (2024). We have updated the reference in the manuscript for clarity (lines 201-202).

      Comment 8

      210-212 "representations of the noisy AA sequence are encoded from the noisy input" what is the "noisy AA sequence?" might be helpful to exactly defined what is "noisy input" or "noisy AA sequence". This sentence could potentially be worded to make it clearer, e.g. "we take the modified input sequence and embed it using [xyz]."

      We have revised the text accordingly; see lines 211-212 of the revised manuscript:

      Comment 9

      In Table 3

      Formatting, DTm (million), (million) should be under "# Params" likely?

      Also for DDG this is reported on only a few hundred mutations, it might be worth plotting the confidence intervals over the Spearman correlation (e.g. by bootstrapping the correlation coefficient).

      Following the suggestion, we added “million” under "# Params". We have also added bootstrapped results for DDG and DTm to Table 6: for each dataset, we randomly sampled 50% of the data in ten independent runs. ProtSSN achieves the top performance with a comparatively small variance.

      Comment 10

      The paragraph in lines 319 to lines 328 I feel may lack sufficient evidence.

      "While sequence-based analysis cannot entirely replace the role of structure-based analysis, compared to a fully structure-based deep learning method, a protein language model is more likely to capture sufficient information from sequences by increasing the model scale, i.e., the number of trainable parameters."

      This claim is made without a citation, such as [1]. Increasing the scale of the model doesn't always align with improving out-of-sample/generalization performance. I don't feel fully convinced by the claim that worse prediction is ameliorated by increasing the number of parameters. In Table 3 the performance is not monotonic with (nor scales with) the number of parameters, even within a model. See ProGen2 Expression scores, or ESM-2 Stability scores, as a function of their model sizes. In [1], the authors discuss whether pretraining strategies are aligned with specific tasks. I think rewording this paragraph and mentioning this paper is important. Figure 3 shows that maybe there's some evidence for this but I don't feel entirely convinced by the plot.

      We agree that increasing the number of learnable parameters does not always result in better performance in downstream tasks. However, what we intended to convey is that language models typically need to scale up in size to capture the interactions among residues, while structure-based models can achieve this more efficiently with lower computational costs. We have rephrased this paragraph in the paper to clarify our point in lines 340-342.

      Comment 11

      Line 327 related to my major comment, " a comprehensive framework, such as ProtSSN, exhibits the best performance." Refers to performance on ProteinGym, yet the best-performing methods on ProteinGym are excluded from the comparison.

      The primary comparisons were conducted using zero-shot models for fairness, meaning that the baseline models were not trained on MSA and did not use test performance to tune their hyperparameters. It is also worth noting that SaProt (the current SOTA model) had not yet been added to the leaderboard at the time of submitting this paper. In the revised manuscript, we have included GEMME and TranceptEVE in Table 5 and SaProt in Tables 3, 5, and 6. While ProtSSN does not achieve SOTA performance in every individual task, our key argument in the analysis is to highlight the overall advantage of hybrid encoders compared to single sequence-based or structure-based models. We have stated this more clearly in the revised manuscript (line 349):

      Comment 12

      Line 347, line abruptly ends "equivariance when embedding protein geometry significantly." (?).

      We have fixed the typo (lines 372-373):

      Comment 13

      Figure 3 I think can be made clearer. Instead of using True/false maybe be more explicit. For example in 3b, say something like "One-hot encoded" or "ESM-2 embedded".

      The labels were set to True/False with the title of the subfigures so that they can be colored consistently.

      Following the suggestion, we have updated the captions in the revised manuscript for clarity.

      Comment 14

      Lines 381-382 "average sequential embedding of all other Glycines" is to say that the score is taken as the average score in which Glycine is substituted at every other position in the peptide? Somewhat confused by the language "average sequential embedding" and think rephrasing could be done to make things clearer.

      We have revised the related text for a clearer presentation (lines 406-413).

      Comment 15

      Table 5, and in mentions to VEP, if ProtSSN is leveraging AlphaFold for its structural information, I disagree that ProtSSN is not an MSA method, and I find it unfair to place ProtSSN in the "non-MSA" categories. If this isn't the case, then maybe making clearer the inputs etc. in the Methods will help.

      We respectfully disagree with classifying a protein encoding method based solely on its input structure. While AF2 leverages MSA sequences to predict protein structures, this information is not used in our model, and our model is not exclusive to AF2-predicted structures. When applicable, the model can encode structures derived from experimental data or other folding methods. For example, in the manuscript, we compared the performance of ProtSSN using proteins folded by both AF2 and ESMFold.

      However, we would like to emphasize that comparing the sensitivity of an encoding method across different structures or conformations is not the primary focus of our work. In contrast, some methods explicitly use MSA during model training. For instance, MSA-Transformer encodes MSA information directly into the protein embedding, and Tranception-retrieval utilizes different sets of MSA hyperparameters depending on the validation set's performance.

      To avoid further confusion, we have revised the terms "MSA methods" and "non-MSA methods" in the manuscript to "zero-shot methods" and "few-shot methods."

      Comment 16

      In Table 3 they're highlighted as the best, yet on ProteinGym there are several EVE models that do better, as well as GEMME, which are not referenced.

      The comparison in Table 3 focuses on zero-shot methods, whereas GEMME and EVE are few-shot models. Since these methods have different input requirements, directly comparing them could lead to unfair conclusions. For this reason, we reserved the comparisons with these few-shot models for Table 5, where we aim to provide a more comprehensive evaluation of all available methods.

      Response to Reviewer 2

      Summary:

      To design proteins and predict disease, we want to predict the effects of mutations on the function of a protein. To make these predictions, biologists have long turned to statistical models that learn patterns that are conserved across evolution. There is potential to improve our predictions however by incorporating structure. In this paper, the authors build a denoising auto-encoder model that incorporates sequence and structure to predict mutation effects. The model is trained to predict the sequence of a protein given its perturbed sequence and structure. The authors demonstrate that this model is able to predict the effects of mutations better than sequence-only models.

      Thank you for your thorough review and clear summary of our work. Below, we provide a detailed, point-by-point response to each of your questions and concerns.

      Strengths:

      The authors describe a method that makes accurate mutation effect predictions by informing its predictions with structure.

      Thank you for your clear summary of our highlights.

      Weaknesses:

      Comment 1

      It is unclear how this model compares to other methods of incorporating structure into models of biological sequences, most notably SaProt.

      (https://www.biorxiv.org/content/10.1101/2023.10.01.560349v1.full.pdf).

      In the revised manuscript, we have updated the performance results for SaProt's single models (both masked and unmasked versions with the pLDDT score) as well as the ensemble models. These updates are reflected in Tables 3, 5, and 6.

      Comment 2

      ProteinGym is largely made of deep mutational scans, which measure the effect of every mutation on a protein. These new benchmarks contain on average measurements of less than a percent of all possible point mutations of their respective proteins. It is unclear what sorts of protein regions these mutations are more likely to lie in; therefore it is challenging to make conclusions about what a model has necessarily learned based on its score on this benchmark. For example, several assays in this new benchmark seem to be similar to each other, such as four assays on ubiquitin performed at pH 2.25 to pH 3.0.

      We agree that both DTm and DDG are smaller datasets, making them less comprehensive than ProteinGym. However, we believe DTm and DDG provide valuable supplementary insights for the following reasons:

      (1) These two datasets are low-throughput and manually curated. Compared to datasets from high-throughput experiments like ProteinGym, they contain fewer errors from experimental sources and data processing, offering cleaner and more reliable data.

      (2) Environmental factors are crucial for the function and properties of enzymes, which is a significant concern for many biologists when discussing enzymatic functions. Existing benchmarks like ProteinGym tend to simplify these factors and focus more on global protein characteristics (e.g., AA sequence), overlooking the influence of environmental conditions.

      (3) While low-throughput datasets like DTm and DDG do not cover all AA positions or perform extensive saturation mutagenesis, these experiments often target mutations at sites with higher potential for positive outcomes, guided by prior knowledge. As a result, their positive-to-negative ratio is more meaningful than that of random mutagenesis datasets, making these benchmarks more relevant for evaluating model performance.

      We would like to emphasize that DTm and DDG are designed to complement existing benchmarks rather than replace ProteinGym. They address different scales and levels of detail in fitness prediction, and their inclusion allows for a more comprehensive evaluation of deep learning models.

      Recommendations For The Authors:

      Comment 1

      I recommend including SaProt in your benchmarks.

      In the revision, we added comparisons with SaProt to all the relevant tables (Tables 3, 5 and 6).

      Comment 2

      I also recommend investigating and giving a description of the bias in these new datasets.

      The bias of the new benchmarks can be seen in Table 1, where the mutants are distributed evenly across different pH levels.

      In the revision, we added a discussion regarding the new datasets in “Discussion and Conclusion” (lines 496-504 of the revised version).

      Comment 3

      I also recommend reporting the model's ability to predict disease using ClinVar -- this experiment is conspicuously absent.

      Following the suggestion, we retrieved 2,525 samples from the ClinVar dataset available on ProteinGym’s website. Since the official source did not provide corresponding structure files, we performed the following three steps:

      (1) We retrieved the UniProt IDs for the sequences from the UniProt website and downloaded the corresponding AlphaFold2 structures for 2,302 samples.

      (2) For the remaining proteins, we used ColabFold 1.5.5 to perform structure prediction.

      (3) Among these, 12 proteins were too long to be folded by ColabFold, for which we used the AlphaFold3 server for prediction.
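      The three-step fallback above can be summarized as a small decision function. This is a minimal sketch, not the pipeline actually used: the `ClinVarProtein` record, its fields, and the `colabfold_max_len` cutoff are hypothetical (the response only says that 12 proteins were "too long" for ColabFold, without giving a length), and no downloading or folding is performed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClinVarProtein:
    """One ClinVar protein awaiting a structure (all fields are illustrative)."""
    name: str
    uniprot_id: Optional[str]   # None when no UniProt accession was found
    afdb_model_available: bool  # hypothetical flag: an AlphaFold2 model can be downloaded
    length: int                 # sequence length in residues


def structure_source(p: ClinVarProtein, colabfold_max_len: int) -> str:
    """Pick a structure source following the three-step fallback.

    colabfold_max_len is a hypothetical stand-in for "too long to be folded
    by ColabFold"; the response does not state an actual cutoff.
    """
    if p.uniprot_id is not None and p.afdb_model_available:
        return "AlphaFold2 (downloaded)"  # step (1)
    if p.length <= colabfold_max_len:
        return "ColabFold 1.5.5"          # step (2)
    return "AlphaFold3 server"            # step (3)


# Toy inputs with made-up identifiers, not the real ClinVar set.
proteins = [
    ClinVarProtein("prot_a", "TOY_ACC_1", True, 350),
    ClinVarProtein("prot_b", None, False, 800),
    ClinVarProtein("prot_c", None, False, 4200),
]
print(Counter(structure_source(p, colabfold_max_len=2500) for p in proteins))
```

      Ordering the checks this way reuses precomputed AlphaFold2 models wherever a UniProt mapping exists and reserves the slower prediction services for the remainder.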

      All processed structural data can be found at https://huggingface.co/datasets/tyang816/ClinVar_PDB. Our test results are provided in the following table. ProtSSN outperforms the baseline methods.

      Author response table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. Utilizing single-nucleus RNA sequencing (snRNA-seq), the study explores how CLA supplementation alters cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles.

      Thanks!

      Strengths:

      Innovative approach: The use of snRNA-seq provides a high-resolution insight into the cellular heterogeneity of pig skeletal muscle, enhancing our understanding of the intricate cellular dynamics influenced by nutritional regulation strategy.

      Robust validation: The study utilizes multiple pig models, including Heigai and Laiwu pigs, to validate the differentiation trajectories of adipocytes and the effects of CLA on muscle fiber type transformation. The reproducibility of these findings across different (nutritional vs genetic) models enhances the reliability of the results.

      Advanced data analysis: The integration of pseudotemporal trajectory analysis and cell-cell communication analysis allows for a comprehensive understanding of the functional implications of the cellular changes observed.

      Practical relevance: The findings have significant implications for improving meat quality, which is valuable for both the agricultural and food industry.

      Thanks!

      Weaknesses:

      Model generalizability: While pigs are excellent models for human physiology, the translation of these findings to human health, especially in diverse populations, needs careful consideration.

      Thanks!

      Reviewer #2 (Public Review):

      Summary:

      This study comprehensively presents data from single-nucleus sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at a previously unobserved depth. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II (IIB) muscle fibers, myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomic data from Laiwu pigs, a model of high intramuscular fat, with those from Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, whether genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations between high intramuscular fat and fibro-adipogenic progenitors (FAPs). The authors then conclude that conjugated linoleic acid drives FAPs toward adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Thanks!

      Strengths:

      The comprehensive data analysis provides information on the effects of conjugated linoleic acid on pig skeletal muscle and organ function. The finding that linoleic acid alters skeletal muscle composition and fat accumulation is a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry in promoting muscle marbling. Additionally, these data may inform our understanding of human skeletal muscle remodeling under different dietary behaviors, such as elimination and supplementation diets and chronic overnutrition with nutrient-poor diets. However, the biggest strength resides in the thorough data collection at the single-nucleus level, which was extended to other types of Chinese pigs.

      Thanks!

      Weaknesses:

      While the authors generated a sizeable, comprehensive dataset, the cellular and molecular validation needs to be improved. For example, the single-nucleus data suggest changes in myofiber type after linoleic acid supplementation, yet these data are not validated by other methodologies. Similarly, the authors suggest that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, no cellular or molecular analysis was performed to test whether these trajectories indeed apply. Attempts to identify JNK signaling pathways appear superficial and do not delve deeper into mechanistic action or transcriptional regulation. Notably, a variety of single-cell studies have been performed on mouse and human skeletal muscle and adipose tissues, yet the authors do not discuss how the populations they have identified relate to the existing literature on cell-type populations in skeletal muscle. Moreover, the authors nicely incorporate the two pig models into their results, but they only examine one muscle group. It would be interesting to know whether other muscle groups respond similarly or differently to linoleic acid supplementation. Further, it was unclear whether Heigai and Laiwu pigs were both fed conjugated linoleic acid, or whether the comparison was between CLA-fed Heigai pigs and Laiwu pigs (as a model of high intramuscular fat). With this in mind, the authors do not discuss how their results could apply to human and pig nutrition, such as desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while the single-nucleus data are comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets. Moreover, the experimental designs do not state the duration of conjugated linoleic acid supplementation. Several of the immunostainings could be quantified to support the statements. This reviewer also found the Nile Red staining hard to interpret visually, and it did not appear to support the conclusions convincingly. Within Figure 7, several letters (presumably denoting statistical significance) appear on the graphs but are not defined in the figure legend.

      Thanks for your suggestions! We have revised our manuscript accordingly.

      For changes in myofiber type, we performed qPCR to verify the changes in muscle fiber type-related gene expression after CLA treatment (Figure 2E). For changes in adipocyte and preadipocyte populations, we also performed immunofluorescence staining, qPCR, and western blotting in LDM tissues and FAPs to verify the alterations in cell types after CLA feeding (Figure 3D, 3E, 6G, 7C, and 7D). We believe these cellular and molecular results support our conclusions.

      We selected the JNK signaling pathway based on the snRNA-seq dataset and verified its involvement in vitro using an activator. However, we did not explore the underlying mechanism, and the downstream transcriptional regulators remain to be investigated. We have added these points to the Discussion (lines 443-448).

      We have added a comparison with cell-type populations reported in skeletal muscle (lines 362-368 and 385-390).

      Changes in myofiber type in Laiwu pigs were discussed in our previous study (Wang et al., 2023). Interestingly, in this study we also found that in Laiwu pigs with high IMF content, the percentage of type IIa myofibers tended to increase (29.37% vs. 23.95%), while the percentage of type IIb myofibers tended to decrease (38.56% vs. 43.75%). We have added this to the Discussion (lines 392-395).

      We have added details of the treatment to the Materials and Methods (lines 469-478). We also added a discussion of the significance of our study for human and pig nutrition to the Discussion (lines 375-376 and 446-447).

      Our data will be made available on reasonable request (line 574-576).

      We have added the duration of CLA supplementation to the Materials and Methods (line 465).

      Porcine FAPs have few lipid droplets, and we have improved the image quality (Figure 7A). In Figure 7, the Nile Red staining could be quantified, and we provide the quantification of Oil Red O staining (Figure 7B and 7J). We have also defined the statistical significance letters in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses

      Cross-species analysis: To strengthen the generalizability of the results, it would be beneficial to include a comparative analysis with other species, such as human, bovine, or rodent models, using publicly available snRNA-seq datasets.

      Thanks! Our previous study compared the conserved and unique signatures in fatty skeletal muscles across different species (Wang, Zhou, Wang, & Shan, 2024). Here we mainly focused on the mechanism by which CLAs regulate intramuscular fat deposition. However, no snRNA-seq or scRNA-seq datasets are yet available on the effects of CLAs on fat deposition in the muscles of other species, including humans, cattle, or rodents. Hence, we only analyzed the mechanisms by which CLAs influence intramuscular fat deposition in pigs.

      Functional link: the authors should discuss in the manuscript how the muscles differ in terms of texture, flavor, aroma, etc. before and after CLA administration or between Heigai and Laiwu to provide context and help readers better understand how the observed high-resolution cellular changes relate to these functional properties of meat.

      Thanks! We have added this to the Introduction (lines 90-98).

      Improve figures: some figures, particularly those involving Oil Red O and Nile Red, could be improved by including higher-magnification images to assess the organization of lipid droplets in individual adipocytes (Figure 7A, I, and K).

      Thanks! Porcine FAPs have few lipid droplets, and we have improved the image quality (Figure 7A).

      Reviewer #2 (Recommendations For The Authors):

      All of my comments are above. However, I would recommend improving the writing as several areas throughout the results needed clarity.

      Thanks! We have carefully revised the manuscript in line with your suggestions.

      Wang, L., Zhao, X., Liu, S., You, W., Huang, Y., Zhou, Y., . . . Shan, T. (2023). Single-nucleus and bulk RNA sequencing reveal cellular and transcriptional mechanisms underlying lipid dynamics in high marbled pork. NPJ Sci Food, 7, 23. https://doi.org/10.1038/s41538-023-00203-4

      Wang, L., Zhou, Y., Wang, Y., & Shan, T. (2024). Integrative cross-species analysis reveals conserved and unique signatures in fatty skeletal muscles. Sci Data, 11, 290. https://doi.org/10.1038/s41597-024-03114-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored identifying the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, the causal link was not established. A subsequent rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

      We thank Reviewer 1 for their thoughtful review and commentary.  We appreciate the reviewer’s finding that our “claims and conclusions are supported by the presented data.”   

      We note that our findings on the temporal progression of transcriptional changes between P42 and P75 apply to both the Pol II and Pol III transcriptomes. Importantly, in the case of Pol III, only precursor and mature tRNAs are affected at P42, whereas at P75 numerous other Pol III transcripts are also changed. We therefore attribute a causal role in disease initiation to the changes in tRNA, since these are the earliest direct consequence of the Polr3a mutation.

      To expand on the evidence demonstrating the progressive nature of Polr3-related disease in our mouse model, the revised manuscript includes new immunofluorescence data showing no change in microglial cell density in the cerebral cortex or the striatum at an early stage of the disease (Supplementary Fig. S6F, G). This is in striking contrast to the findings at a later time (P75), when the number of microglia increased significantly in the Polr3a mutant and the cells exhibited an activated morphology (Fig. 4G,H).

      We agree with the reviewer that it will be interesting in the future to assess the impact of the Polr3a mutation in different neural cell types and to explore opportunities for suppressing disease phenotypes. 

      Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs. Furthermore, their study suggested that the RNA Pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, the study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour. They should have used conditional mice to delete the gene in specific brain areas to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than a specific function of altered tRNA levels. This is very evident from their bulk RNA sequencing study. This study provides some discrete pieces of information rather than a coherent story. My enthusiasm for publication of this article in eLife is dampened for the reasons listed under Weaknesses.

      Reviewer 2’s summary contains two misstatements: 

      Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs.

      Our experiments document the effect of a neurodegenerative disease-causing mutation in RNA polymerase III on the Pol III transcriptome with a particular focus on the tRNAome (i.e. the mature tRNA population). Experiments on the maturation and transport of tRNA were not performed as there was no indication that these processes might be negatively impacted at the earliest time point (P42). Additional comments about tRNA maturation and export are provided under points 8 and 9 (see below). 

      The study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour.

      This comment misstates the purpose of our study while overlooking the important results. As stated in the abstract, our goal was to develop “a postnatal whole-body mouse model expressing pathogenic Polr3a mutations to examine the molecular mechanisms by which reduced Pol III transcription results primarily in central nervous system phenotypes.”

      Accordingly, our work provides the first molecular analysis of RNA polymerase III transcription in an animal model of Polr3-related disease. The novelty and importance of the findings, as stated in the abstract, include the discovery that a global reduction in tRNA levels (and not other Pol III transcripts) at an early stage in the disease precedes the frank induction of integrated stress and innate immune responses, activation of microglia and neuronal loss at later times. These later events readily account for the observed neurobehavioral deficits that collectively include risk assessment, locomotor, exploratory and grooming behaviors. 

      Strengths:

      The study created a mouse model to investigate the role of RNA Pol III transcription. Furthermore, the study provided RNA-seq analysis of the mutant mice and highlighted specific transcripts whose expression is affected by the RNA Pol III mutation.

      Weaknesses:

      (1) The abstract is not clearly written. It is hard to tell what the objective of the study is and why it is important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative diseases?

      We have modified the abstract to more clearly frame the objective of the study and its importance as reflected in the title “Molecular basis of neurodegeneration in a mouse model of Polr3-related disease”. We hope the reviewer will agree that the fourth sentence of the abstract, unchanged from the initial submission, clearly outlines the objective of the study.  

      (2) How cerebral pathology and exocrine pancreatic atrophy are related? How altered tRNA level connects these two axes?

      It is not known how cerebral pathology and exocrine pancreatic atrophy are related beyond their shared Pol III dysfunction in our mouse model of Polr3-related disease. We anticipate that altered tRNA levels connect these two axes. Indeed, the pancreas and the brain are both known to be highly sensitive to perturbations affecting translation (Costa-Mattioli and Walter, 2020 Science doi: 10.1126/science.aat5314). Changes to the tRNA population in the cerebrum and cerebellum of Polr3a mutant mice were extensively documented in the manuscript (e.g. Figs. 3, 5 and 6).  We also found reduced tRNA levels in the pancreas of the mutant mice but did not report these findings due to the absence of a stable reference transcript in total RNA from the atrophied pancreatic tissue, even at the earliest time point examined (P42). 

      (3) The authors mention that the previously observed reduction in mature tRNA levels was also recapitulated in their study. Why, then, is this study novel?

      Our study reports the novel finding that a pathogenic Polr3a mutation causes a global reduction in the steady-state levels of mature tRNAs, i.e. the levels of all tRNA decoders were reduced, with the vast majority of these reaching statistical significance (Fig. 6D and 6F). In the introduction we refer to several studies that examined the effect of pathogenic Polr3 mutations on the levels of Pol III-derived transcripts. We noted that these studies examined only a small number of Pol III transcripts in CRISPR-Cas9 engineered cell lines, patient-derived fibroblasts and patient blood. Thus, no study until now has tested for or reported a global defect in the abundance of mature tRNAs in any model of Polr3-related disease. Moreover, no previous study of Polr3-related disease has analyzed Pol III transcript levels in the brain or in any other tissue.

      (4) It is very intuitive that a deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as other organs. Hence, the growth defect observed in Polr3a mutant mice is not very specific but rather a general phenomenon.

      While we agree with the simple assumption that a "deficit in Pol III transcription likely would affect protein synthesis in all brain areas as well as other organs", this turned out not to be the case. In fact, a novel finding of our study is that not all Polr3a mutant tissues show a translation stress response despite reduced Pol III transcription and reduced mature tRNA levels. This implies that in some tissues the reduction in tRNA levels caused by the Polr3a mutation is not sufficient to affect protein synthesis, at least not to the point where the Integrated Stress Response is induced. The underlying basis for the growth deficit has not been defined in this work. However, we noted in the discussion that a growth defect was previously seen in mice where expression of the Polr3a mutation was restricted to the Olig2 lineage. In the present postnatal whole-body inducible model, we anticipate that the diminished growth of the mice results from a combination of hormonal and nutritional deficits caused by cerebral and pancreatic dysfunction.

      (5) The authors observed a specific myelination defect in the cortex and hippocampus but not in the cerebellum. This is an interesting observation. What is the link between reduced tRNA levels and myelin depletion in the hippocampus and cortex? Why is myelination not affected in the cerebellum?

      We agree that the specific myelin defect observed in the cortex and hippocampus, but not the cerebellum, is an interesting observation. Pol III dysfunction in this model and reduced tRNA levels are common to both cerebra and cerebella, yet the pathological consequences differ between these regions.  While we do not know why this is the case, the cells that oligodendrocytes support in these regions are functionally different. We suggest in the discussion that subtle defects in oligodendrocyte function in the cerebellum may be uncovered using more sensitive or specific assays than the ones we have employed to date.  In addition, consistent with our findings in other tissues where Pol III transcription and tRNA levels are reduced but phenotypes are lacking, we suggest that oligodendrocytes in the cerebellum may have a different minimum threshold for Pol III activity than in other regions of the brain. 

      (6) How was the locomotor activity measured? A detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in terms of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      We used a behavioral spectrometer with video tracking and pattern-recognition software to quantify ~20 home cage-like behaviors, including locomotor activity, as part of our phenotypic characterization of the mice. This experimenter-unbiased approach reported several metrics of locomotion, specifically, total Track length (the total distance traveled in the instrument), Center Track length and the time spent running (Run Sum) and standing still (Still Sum) in a longitudinal study (Figs. 2A-C and Supplemental Fig. S3A-C). The Materials and Methods section on mouse behavior has been amended to provide a detailed description of these experiments. 

      locomotion is primarily cerebellum dependent

      While we agree that the cerebellum plays a critical role in balance and locomotion, regions of the cerebrum that are affected in our mice, including the primary motor cortex and the basal ganglia (Fig. 4), also have important roles in locomotor activity and control. 

      (7) The correlation between the behavioural changes and the RNA-seq data is missing. A number of transcripts are affected, and they are mostly very general factors for cellular metabolism. Most of them are RNA Pol II transcribed. How does a Pol III mutation influence RNA Pol II-driven transcription? I did not find differential expression of any specific transcripts associated with the behavioural changes. What is the motivation for the transcriptomic analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolism effect that indirectly influences myelination.

      The differentially expressed mRNAs identified in our RNAseq analysis at P75 reflect both direct and secondary consequences of dysfunctional Pol III transcription on Pol II transcription. These effects can be achieved by multiple mechanisms. Induction of the Integrated Stress Response (ISR) due to insufficient tRNA can be considered a direct consequence of diminished Pol III transcription on Pol II transcription. An example of a secondary response is the activation of microglia and the innate immune response (which is known to accompany prolonged activation of the ISR), and the loss of neurons and oligodendrocytes. These changes are documented in Figs. 3 and 4. Importantly, loss of neurons, activated microglia and reduced oligodendrocyte numbers are each readily reconciled with changes in behavior.  

      None of these transcripts are very specific for myelination 

      The RNAseq data at P75 indicate only a modest reduction in oligodendrocyte-specific gene expression (as defined by single-cell RNAseq studies of purified cell populations, Mackenzie et al., 2018 Sci. Rep. doi: 10.1038/s41598-018-27293-5). Despite this, some oligodendrocyte-specific transcripts with well-known roles in myelination were down-regulated in the Polr3a mutant (e.g. Plp1, Mog and Mobp). In addition, steroid synthesis pathway transcripts involved in the production of cholesterol, an abundant and essential component of myelin, were also down-regulated (Supplementary Fig. S4E).

      (8) Which genes identified by the transcriptomic analysis regulate the maturation of tRNA? The authors should at least perform an RNAi study to identify possible factors and analyze their importance in tRNA maturation.

      Of the many proteins involved in the maturation of tRNA (Phizicky and Hopper, 2023 RNA doi: 10.1261/rna.079620.123), RNAseq analysis at P75 identified only amino-acyl tRNA synthetases as being differentially-expressed (fold change >1.5, p adj. < 0.05, Table S1). These genes are canonical indicators of the ATF4-dependent Integrated Stress Response and their upregulation is widely interpreted as an attempt to restore efficient translation. In addition, our analysis of Pol III transcripts at P75 identified a reduction in the level of RppH1 (Fig. 3C), the RNA component of RNase P, which removes the 5’ leader of precursor tRNAs.  However, at P42, there was no effect on RppH1 abundance, or the expression of amino-acyl tRNA synthetase genes (Fig. 5C and Table S3).  Thus, an RNAi study to identify and analyze a possible factor involved in the maturation of tRNA is neither warranted nor relevant to the current body of work.
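      As a minimal sketch of the differential-expression criterion quoted above (fold change >1.5, p adj. < 0.05), the filter below assumes that the fold change is a linear mutant/WT ratio and that the cutoff applies symmetrically to up- and down-regulated genes. The function name is hypothetical and the example values are toy numbers, not entries from Table S1.

```python
import math


def is_differentially_expressed(fold_change: float, padj: float,
                                fc_cutoff: float = 1.5,
                                alpha: float = 0.05) -> bool:
    """Apply a 'fold change > 1.5 and adjusted p < 0.05' filter.

    Assumes fold_change is a linear mutant/WT ratio (> 0) and that the cutoff
    is symmetric, so a 1.5-fold decrease (ratio 1/1.5) is treated like an increase.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be a positive ratio")
    # Comparing |log2 FC| with log2(cutoff) treats up- and down-regulation alike.
    return abs(math.log2(fold_change)) > math.log2(fc_cutoff) and padj < alpha


# Toy values only.
print(is_differentially_expressed(2.0, 0.01))   # up 2-fold, significant -> True
print(is_differentially_expressed(0.5, 0.01))   # down 2-fold, significant -> True
print(is_differentially_expressed(1.2, 0.001))  # below the fold-change cutoff -> False
```

      Working on the log2 scale is one common way to make a ratio cutoff symmetric: a 2-fold increase and a 2-fold decrease both give |log2 FC| = 1.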

      (9) What factors are influencing tRNA transport to cytoplasm? It may be possible that Polr3a mutation affect cytoplasmic transport of tRNA. Authors should study this aspect using an imaging experiment.

      Our analysis of tRNA populations in this study employed total cellular RNA and thus reflects the abundance of mature tRNA from all cellular compartments. We have not assessed whether the reduction in tRNA abundance caused by the Polr3a mutation alters the dynamics of tRNA transport from the nucleus to the cytoplasm. However, we consider it highly unlikely that the Polr3a mutation would have a significant effect on cytoplasmic transport of tRNA. Imaging experiments along these lines are beyond the scope of the current study.

      (10) Does alteration of the cytoplasmic level of tRNA affect translation? The authors should perform a translation assay using bio-orthogonal amino acid (AHA) labelling.

      It is not known whether the reduced tRNA levels affect translation globally in the Polr3a mutant, but we predict that this may not be the case. Since tissues (heart and kidney) and brain regions (cerebrum and cerebellum) that share a decrease in tRNA abundance do not share activation of the Integrated Stress Response (a reporter of aberrant translation), we anticipate that effects on translation may be limited to specific regions or cell populations and to specific mRNAs within these cells. The current study provides the foundation for further work to address these questions.

      Reviewer #1 (Recommendations For The Authors):

      Below are a few comments, mostly regarding typographical errors, presentation, and clarity, that we believe would enhance this manuscript:

      On the heatmaps generated, it would be ideal to place "WT" before "KI," with "WT" on the left. This will maintain consistency with the rest of the manuscript, where "WT" conditions precede "KI" conditions, as observed in the bar graphs and dot plots.

      All heatmaps have been remade with WT on the left and KI on the right to maintain consistency throughout the manuscript. 

      The authors mention in several instances (Discussion Pg 19 Line 2, for instance) the analysis of changes in the "Pol II transcriptome." Is this a typographical error?

      The reference to the Pol II transcriptome is not a typographical error (Discussion Pg 19 Line 2). Here and elsewhere in the manuscript, we are distinguishing between changes to the Pol III transcriptome and the timing of subsequent changes to the Pol II transcriptome. The text has been edited to clarify this relationship in several places.

      (1) Introduction, Page 4, last paragraph.

      Analysis of the Pol III transcriptome reveals a common decrease in pre-tRNA and mature tRNA populations and few if any changes among other Pol III transcripts across multiple tissues. Analysis of the Pol II transcriptome reveals activation of the integrated stress response in cerebra but not in other surveyed tissues.

      (2) Results, page 8, 2nd paragraph

      To investigate the molecular changes to Pol III transcript levels caused by the Polr3a mutation and any secondary effects on the Pol II transcriptome, we initially focused on the cerebra of adult mice at P75.

      (3) Discussion, Page 19, second paragraph

      Pol III dysfunction and the reduction in the cerebral tRNA population at P42 coincide with behavioral deficits and precede substantial downstream alterations in the Pol II transcriptome, which include induction of an innate immune response (IR) and an ISR, and indicators of neurodegeneration (i.e., activation of cell death pathways and loss of mitochondrial DNA). These findings suggest a causal role for the lower tRNA abundance and/or altered tRNA profile in disease progression.

      In supplementary figure 1, authors validated the expression of their systems using flow cytometry and observed a high level of recombination frequency in different tissue types. Can the flow cytometry data distinguish between cell types within the cerebrum (neurons/microglia/astrocytes)?

      The flow cytometry experiments reported in Supplementary Fig. S1 used a dual tdTomato-EGFP reporter to assess recombination. The cerebral and cerebellar samples were gated on endogenous tdTomato (red) and EGFP (green) fluorescence and on DAPI (blue) staining. In principle, flow cytometry could be used to distinguish between cell types within the cerebrum (neurons/microglia/astrocytes). However, this would require (i) an antibody to a cell surface marker on the cell type of interest and (ii) a fluorescent probe conjugated to the primary antibody, or a fluorescent secondary antibody, that is spectrally well resolved from the emission spectra of tdTomato, EGFP and DAPI.

      Results section 1: Is there any particular reason why P28 was chosen as the commencement of tamoxifen injection?

      P28 was chosen so that any effect of the Polr3a mutation on development and differentiation would be limited in the tissues we examined. 

      Fig 1C: The number of asterisks does not match between the graph and the figure legend.

      Fig. 1C has been corrected to match the number of asterisks in the graph and figure legend.

      Results section 3:

      This section seemed a little brief, especially when compared to the depth of the succeeding sections. Authors can state in greater detail which behaviors were quantified. In S3A-C, my understanding is that the animals were placed in an open-field test. This procedure can be briefly mentioned in the methods, as well as in the main manuscript text.

      In the legends of S3, a bracket is missing for "(D-F)" on line 5. Additionally, the alignment of legends for each bar graph could be consistent for all graphs except under the condition of spatial constraint.

      Detailed methods pertaining to the measurement and calculation of home cage-like behaviors reported by the behavioral spectrometer have been added to the Methods section on Mouse Behavior. 

      In the Results, Figs. S3A-C show anxiety-like behaviors, measured as the number and duration of visits to, and the distance traveled in, a 15 cm² central area of the arena. Figs. 2A-C show locomotor behaviors including Tracklength, Run sum and Still sum. The open field-like behavior is reported as total Tracklength in the behavioral spectrometer, i.e. the total distance traveled in the arena. This is now more clearly described in the main manuscript and the Methods section: "overall locomotor activity was decreased in Polr3a-tamKI mice as indicated by the reduced track length at P42, P49, P56 and P63 (Fig. 2A)."

      The legend of Fig. S3 now includes the missing bracket in "(D-F)" on line 5.

      The legends within each bar graph are now consistent and aligned as much as spatial constraints allow.

      Results section 4:

      Similar to our earlier questions for S1, is it possible to distinguish samples derived from different cell types (neurons/glia)? In figure 4, this is mainly done post-hoc, based on the known gene expression. Maybe the authors could discuss this small limitation? In Fig S4C, the color contrast for the heatmap legend needs to be corrected.

      It is not possible to accurately distinguish different neural cell sub-types, such as different types of neurons, or different types of oligodendrocytes in bulk RNAseq. Hence, we have reported only high confidence correlations based on known gene expression signatures (Fig. 4). We discuss only the data for which we can draw confident conclusions. The heatmap and legend in Fig. S4C has been amended. 

      Results section 5:

      In figure S5A, the alignment of asterisk significance markers could be adjusted.

      Asterisks have been realigned in Fig. S5A.

      Reviewer #2 (Recommendations For The Authors):

      The Methods section should include a detailed procedure.

      A detailed description of the methods pertaining to the measurement and calculation of behaviors using the behavioral spectrometer has been added to the Methods section.

      Statistical tests should be described in detail.

      Statistical tests are detailed in the Methods section “Statistical Analysis”. Additional details pertaining to calculations of behavioral data have been added to the “Mouse behavior” section of the Methods.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Weaknesses:

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The data utilized in our analysis were collected during the first examination or test conducted after the patients' admission. We specifically excluded any patients with a history of epilepsy, ensuring that all cases of epilepsy identified in our study occurred after admission. Therefore, the features we analyzed were collected after the patients' admission but prior to the onset of post-stroke epilepsy.

      Reviewer #3 (Public review):

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. We revised our code and implemented a 5-fold cross-validation version; it did not bring much improvement, because our model had already reached an AUC of 0.99. Since we have more than 20,000 records, we consider a 7:3 train/test split sufficient for training the model. We have uploaded the code of the 5-fold cross-validation version, with the ROC curves of the 5 test folds, to GitHub at https://github.com/conanan/lasso-ml/lasso_ml_cross_validation.ipynb as an external resource. The 5-fold averaged model and its test-fold ROC curves show some improvement, but it is not substantial, and the best models are still tree-based models.
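      Because only about 4.3% of patients develop PSE, a plain random split can leave some folds with very few positives. As a minimal sketch (not the code in the authors' notebook), stratified k-fold splitting deals each class round-robin into the folds so every test fold keeps roughly the same positive rate. It assumes binary labels with 1 = PSE; the function name is hypothetical. If SMOTEENN is used, it would be fit on the training indices of each fold only, so that synthetic samples never reach a test fold.

```python
import random

def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs in which each class is spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so every fold gets its share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test

labels = [1] * 43 + [0] * 957  # about 4.3% positives
for train, test in stratified_k_fold(labels):
    print(len(train), len(test), sum(labels[i] for i in test))
```

      With 43 positives among 1,000 samples, each test fold receives 8 or 9 positives. In a real pipeline, scikit-learn's StratifiedKFold provides the same behaviour, and imbalanced-learn's Pipeline keeps resampling inside the training folds.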

      External validation results may be biased/overoptimistic, since the authors informed that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and sensitivity estimations. The specificity for the external validation set has not been disclosed.

      Thank you for your valuable feedback regarding the external validation results. We appreciate your concerns about potential bias and overoptimism in our estimations of positive predictive value (PPV) and sensitivity.

      To clarify, we have uploaded the code for external validation on GitHub at https://github.com/conanan/lasso-ml. The results indicate that the PPV is 0.95 and the specificity is 0.98.

      While we focused on collecting more positive cases because of their low occurrence rate, this approach allows us to better evaluate the model's ability to predict positive samples, which is crucial in clinical settings. We believe that emphasizing positive cases enhances the model's utility for practical applications (so a small degree of overoptimism is acceptable).
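      The reviewer's concern can be made concrete with Bayes' rule: sensitivity and specificity do not depend on how many positives a cohort contains, but PPV does. The sketch below uses the reported external specificity of 0.98; the sensitivity of 0.90 and the enriched prevalence of 50% are hypothetical placeholders, and 4.3% is the PSE incidence reported for the development cohort.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical sensitivity; specificity 0.98 as reported for the external set.
for prevalence in (0.5, 0.043):
    print(f"prevalence={prevalence:.3f}  PPV={ppv(0.90, 0.98, prevalence):.3f}")
```

      Under these assumptions PPV falls from about 0.978 in a positive-enriched cohort to about 0.669 at the 4.3% incidence, which is why reporting specificity (and PPV recomputed at the real-world prevalence) helps readers judge the external results.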


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses 1:

      The methodology needs further consideration. The Discussion needs extensive rewriting.

      Thank you for your advice; we have revised the Discussion.

      Reviewer #2 (Public Review):

      Weaknesses 2:

      There are many typos and unclear statements throughout the paper.

      There are some issues with SHAP interpretation. SHAP in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.

      Thank you for pointing out that SHAP analysis is only a means of interpreting the model. In our research, we compared the SHAP analysis with traditional statistical methods, such as regression analysis, and found the SHAP results to be consistent with the regression results for variables such as white blood cell count (see Table 1). This agreement leads us to believe that the SHAP analysis provides reliable insights in this context.

      The Data Collection section is very poorly written, and the methodology is not clear.

      Thank you for your advice; we have revised the Data Collection section.

      There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.

      Thank you for the advice on hyperparameter tuning. We built the models with the sklearn, xgboost and lightgbm packages in Python 3.10 and previously kept their default settings, which is not appropriate and may lead to less certain conclusions. We have now carried out a grid search to select and optimize the hyperparameters, which improved the models. The best model is still RF (random forest).
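      A grid search simply evaluates every combination of candidate hyperparameter values and keeps the best-scoring one. The sketch below shows that loop with a pluggable scoring function (in practice, a mean cross-validated AUC). The grid values and the toy scorer are hypothetical; the authors' actual grid is not stated here. scikit-learn's GridSearchCV performs the same search with built-in cross-validation.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every parameter combination and return the best one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. mean cross-validated AUC for these settings
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical random-forest grid and a toy scorer standing in for CV AUC.
grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

def toy_score(p):
    return -abs(p["n_estimators"] - 300) - (0 if p["max_depth"] == 10 else 1)

print(grid_search(grid, toy_score))
```

      Reporting the grid alongside the selected values would let readers judge whether the model comparison was fair.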

      The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?

      The selection procedure is shown in Figure 1. In total, the stroke database contained 42,079 records. After excluding hemorrhagic stroke (4,565), history of stroke (2,154), transient ischemic attack (TIA; 3,570), stroke of unclear cause (561) and records missing important data (6,496), 24,733 patients with new-onset ischemic or lacunar stroke remained. We then excluded patients whose seizures might be attributed to other potential causes (brain tumor, intracranial vascular malformation, traumatic brain injury, etc.; 865), patients with a history of seizures (152) or who died in hospital (1,444), and patients who were lost to follow-up (no outpatient records and could not be contacted by phone) or died within 3 months of the stroke (813). Finally, 21,459 cases were included in this research.
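      The exclusion counts above can be checked mechanically. This short script replays the flow using the numbers stated in the response (the step labels are paraphrases) and prints how many records remain after each step, which is also the information a flow diagram such as Figure 1 needs.

```python
def apply_exclusions(start, steps):
    """Subtract each exclusion in order, recording what remains after each step."""
    remaining, trail = start, []
    for reason, n in steps:
        remaining -= n
        trail.append((reason, n, remaining))
    return remaining, trail

STEPS = [
    ("hemorrhagic stroke", 4565),
    ("history of stroke", 2154),
    ("TIA", 3570),
    ("unclear cause", 561),
    ("missing important data", 6496),
    ("seizure from other causes", 865),
    ("seizure history", 152),
    ("died in hospital", 1444),
    ("lost to follow-up or died within 3 months", 813),
]

final, trail = apply_exclusions(42079, STEPS)
for reason, n, left in trail:
    print(f"-{n:>5}  {reason:<42} {left:>6}")
```

      The first five exclusions leave 24,733 records and all nine leave 21,459, matching the counts given in the text.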

      There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning. The code is:

      from imblearn.combine import SMOTEENN

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn (imbalanced-learn) library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.
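      The reviewer also asked how many synthetic points were created. As a toy illustration only (this is not imbalanced-learn's implementation, and its default neighbour counts and selection rule may differ), the sketch below shows the two stages SMOTEENN combines: SMOTE places synthetic minority points on segments between a minority sample and one of its k nearest minority neighbours, then Edited Nearest Neighbours (ENN) drops samples whose label disagrees with most of their nearest neighbours. It assumes binary labels (1 = PSE), at least two minority samples, and fully balances the classes, which is roughly what sampling_strategy='auto' does for binary data. It returns the number of synthetic points so that figure can be reported.

```python
import math
import random

def nearest(point, pool, k):
    """Indices of the k members of pool closest to point (Euclidean distance)."""
    return sorted(range(len(pool)), key=lambda j: math.dist(point, pool[j]))[:k]

def smote(minority, n_new, k=5, seed=42):
    """Create n_new synthetic points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = [j for j in nearest(x, minority, k + 1) if j != i][:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment x -> nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

def enn(X, y, k=3):
    """Keep samples whose label matches the majority of their k nearest neighbours."""
    keep = []
    for i, x in enumerate(X):
        neighbours = [j for j in nearest(x, X, k + 1) if j != i][:k]
        agree = sum(1 for j in neighbours if y[j] == y[i])
        if 2 * agree > k:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def smoteenn(X, y, k_smote=5, k_enn=3, seed=42):
    """Oversample class 1 up to the size of class 0, then clean all classes with ENN."""
    minority = [x for x, t in zip(X, y) if t == 1]
    n_new = max(0, list(y).count(0) - len(minority))
    synthetic = smote(minority, n_new, k_smote, seed)
    X_res, y_res = enn(list(X) + synthetic, list(y) + [1] * len(synthetic), k_enn)
    return X_res, y_res, len(synthetic)
```

      On well-separated toy clusters ENN removes nothing; on real data with overlapping classes it trims boundary points, so the final class counts should be reported after both stages.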

      Did the authors achieve their aims? Do the results support their conclusions?

      Yes, we have achieved some of our aims in predicting PSE, although some problems remain.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.

      The data used in our analysis is from the first examination or test conducted after the patients' admission, retrieved from a PostgreSQL database. First, we extracted the initial admission date for patients admitted due to stroke. Then, we identified the nearest subsequent examination data for each of those patients.

      The SQL code is as follows:

      SELECT TO_DATE(condition_start_date, 'DD-MM-YYYY') AS admission_date

      FROM diagnosis

      -- 梗死 / 梗塞 = infarction; 脑 = cerebral; 腔隙 = lacunar

      WHERE person_id = {} AND (condition_name LIKE '%梗死%' OR condition_name LIKE '%梗塞%') AND (condition_name LIKE '%脑%' OR condition_name LIKE '%腔隙%')

      ORDER BY admission_date LIMIT 1
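      The query above returns only the admission date; the second step ("the nearest subsequent examination") is not shown. A minimal sketch of that step, assuming each patient's exams are available as (date, values) pairs, picks the earliest exam on or after admission and, to address the temporal-leakage concern directly, rejects it if it is dated on or after PSE onset. The function names and the optional pse_onset argument are hypothetical, not the authors' code.

```python
from datetime import datetime

def parse_dmy(text):
    """Parse a 'DD-MM-YYYY' string, matching the TO_DATE format used in the query."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def first_exam_after_admission(admission, exams, pse_onset=None):
    """Return the earliest (exam_date, values) on or after admission.

    Returns None when no such exam exists, or when that exam is dated on or
    after PSE onset, because using it would leak post-outcome information.
    """
    later = sorted((e for e in exams if e[0] >= admission), key=lambda e: e[0])
    if not later:
        return None
    exam_date, values = later[0]
    if pse_onset is not None and exam_date >= pse_onset:
        return None
    return exam_date, values

admission = parse_dmy("03-05-2020")
exams = [(parse_dmy("01-05-2020"), {"wbc": 7.1}), (parse_dmy("04-05-2020"), {"wbc": 9.8})]
print(first_exam_after_admission(admission, exams))
```

      Counting how many patients are rejected by the pse_onset check would give reviewers a direct answer on whether any features postdate the outcome.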

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is certainly very important, but there have been some difficulties in reaching a perfect solution. We have tried open-source databases such as MIMIC, but their data do not fit our needs as closely as the records from our own hospital: MIMIC lacks some of the key features we require, as well as the detailed patient follow-up information that is crucial for our analysis. Given these limitations, we have decided to collect newer records from the same hospitals in Chongqing. We believe this will allow us to build a more comprehensive dataset to support robust external validation. While it may not be a perfect solution, gathering this additional data from our local healthcare system is a pragmatic step forward. Looking ahead, we plan to continue expanding this Chongqing-based dataset and to report the results of a larger external validation in the future. We are committed to overcoming the challenges around data availability to strengthen the validity and generalizability of our findings.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. Because we have more than 20,000 records, we consider a 7:3 train/test split sufficient. We revised our code and implemented a 5-fold cross-validation version; it brought little improvement (our model had already reached an AUC of 0.99), and we may use this technique in future studies with fewer cases.
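      The reviewer asks specifically for mean scores with confidence intervals across splits. A minimal sketch: compute the AUC of each test fold (here via its rank interpretation, the probability that a random positive is scored above a random negative, with ties counting one half), then report the mean and a t-based 95% interval. The fold AUC values below are hypothetical, and because folds share training data the interval is only approximate.

```python
import statistics

def auc(scores, labels):
    """ROC AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_ci(values, t_crit=2.776):
    """Mean and t-based 95% CI; 2.776 is the 0.975 quantile of t with 4 df (5 folds)."""
    m = statistics.mean(values)
    half = t_crit * statistics.stdev(values) / len(values) ** 0.5
    return m, (m - half, m + half)

fold_aucs = [0.98, 0.99, 0.97, 0.99, 0.98]  # hypothetical per-fold results
print(mean_ci(fold_aucs))
```

      For these illustrative values the result is a mean of 0.982 with a 95% interval of roughly 0.972 to 0.992.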

      Additional context that might help readers

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      Thank you for this helpful advice, which has improved our draft. We have added an explanation: we use the force plot of the first patient to show the influence of each feature on that patient's prediction. It shows that a long APTT contributes most toward PSE, followed by the AST level and other features, whereas the NIHSS score may be low and pushes the prediction in the opposite direction. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.
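      What makes force and decision plots readable is the additivity of Shapley values: the base value (the model output with no feature information) plus every feature's contribution equals the prediction, so each arrow in a force plot is one term of that sum. The sketch below computes exact Shapley values for a tiny hypothetical model against a single baseline row and checks additivity. SHAP's explainers estimate the same quantities against a background distribution rather than one baseline, so this is an illustration of the idea, not of the library.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi

# Hypothetical model with an interaction between features 0 and 2.
def toy_model(z):
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]

x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
# Additivity: base value + sum of contributions equals the prediction.
assert abs(toy_model(base) + sum(phi) - toy_model(x)) < 1e-9
print(phi)
```

      Here the contributions are about 0.65, 0.2 and 0.15: the interaction term is split equally between features 0 and 2. A decision plot draws the same cumulative sum as a line rising from the base value to the final prediction.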

      Reviewer #3 (Public Review):

      Weaknesses3:

      There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

      Thank you for your advice; we have corrected these flaws.

      The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

      Thank you for your recommendations. We have made the code available on GitHub at https://github.com/conanan/lasso-ml. The data, however, are private and belong to the hospital; access can be requested from the hospitals by contacting the corresponding author and specifying the purpose of the inquiry.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your valuable advice. Performing n-fold cross-validation is crucial for ensuring the reliability and robustness of results, especially with limited datasets. However, since we have over 20,000 records, we believe that a 70:30 split for training and testing is sufficient.

      We revised our code and implemented 5-fold cross-validation, which provided minimal improvement, as our model has already achieved an AUC of 0.99. We plan to use this technique in future studies if we encounter fewer cases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My comments include two parts:

      (1) Methodology

      a- This study was based on multiple clinical indicators to construct a model for predicting the occurrence of PSE. It involved various multi-class indicators such as the affected cortical regions, locations of vascular occlusion, NIHSS scores, etc. Only using the SHAP index to explain the impact of multi-class variables on the dependent variable seems slightly insufficient. It might be worth considering the use of dummy variables to improve the model's accuracy.

      Thank you for the detailed feedback on the study methodology. The SHAP analysis is only a means of interpreting the model; we checked it against traditional statistical analysis, so we consider the SHAP analysis reliable in this study. We did use dummy variables, especially for the affected cortical regions and locations of vascular occlusion; for example, if the frontal region is involved, the corresponding variable is 1. However, these variables had less impact in the machine learning model.
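      Because one stroke can involve several cortical regions, the dummy coding described above is multi-hot (one 0/1 column per region) rather than a single one-hot category. A minimal sketch, with a hypothetical region list and column prefix; unknown region names raise an error instead of being silently dropped.

```python
REGIONS = ["frontal", "parietal", "temporal", "occipital", "insular"]  # hypothetical list

def encode_regions(involved, regions=REGIONS):
    """Multi-hot dummy variables: 1 if the region is involved, else 0."""
    involved = set(involved)
    unknown = involved - set(regions)
    if unknown:
        raise ValueError(f"unknown regions: {sorted(unknown)}")
    return {f"region_{r}": int(r in involved) for r in regions}

print(encode_regions(["frontal", "temporal"]))
```

      The same pattern applies to locations of vascular occlusion; single-valued categories could instead use a standard one-hot encoder.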

      b-The study used Lasso regression to select 20 features to build the model. How was the optimal number of 20 features determined?

      LASSO regression is a commonly used feature-selection method. Because we extracted information from the database and tried to include as many features as possible, the LASSO cross-validation curve was optimal with 78 features, but this would lead to an overly complex model. We therefore built models with 10, 15, 20, 25 and 30 features. With 20 features, the model performed well and remained relatively concise. Increasing the number of features contributed little to model performance, whereas decreasing it reduced performance; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.
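      The selection rule described above has two parts: rank features by the absolute value of their LASSO coefficients, then pick the smallest feature count whose AUC is close to the best observed. A minimal sketch, in which the coefficients, the AUC values per count and the 0.005 tolerance are all hypothetical. Coefficient magnitudes are only comparable when the features have been standardized before fitting.

```python
def top_k_by_abs(coefs, k):
    """Names of the k features with the largest absolute LASSO coefficient."""
    ranked = sorted(coefs, key=lambda name: abs(coefs[name]), reverse=True)
    return ranked[:k]

def smallest_adequate_k(auc_by_k, tolerance=0.005):
    """Smallest feature count whose AUC is within `tolerance` of the best AUC observed."""
    best = max(auc_by_k.values())
    return min(k for k, a in auc_by_k.items() if a >= best - tolerance)

coefs = {"wbc": 0.8, "d_dimer": -1.2, "age": 0.1, "nihss": 0.5}  # hypothetical
print(top_k_by_abs(coefs, 2))
auc_by_k = {10: 0.91, 15: 0.94, 20: 0.985, 25: 0.987, 30: 0.988}  # hypothetical
print(smallest_adequate_k(auc_by_k))
```

      Stating the tolerance explicitly turns "the model parameters are good and relatively concise" into a reproducible criterion.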

      c-The study indicated that the incidence rate of PSE in the enrolled patients is 4.3%, showing a highly imbalanced dataset. If singly using the SMOTE method for oversampling, could this lead to overfitting?

      Thank you for the reminder; using SMOTE alone for oversampling is indeed inappropriate. We have therefore replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning: SMOTE first oversamples the minority class, and ENN then undersamples to remove possible noise and duplicate samples. The code is:

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      (2) Clinical aspects:

      Line 8, history of ischemic stroke, this is misexpression, could be: diagnosis of ischemic stroke.

      Line 8, several hospitals, should be more exact; how many?

      Line 74 indicates that the data are from a single centre, this should be clarified.

      Line 4 data collection: The criteria read unclear; please clarify further.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Line 110, lab parameters: Why is there no blood glucose?

      Because many patients' blood glucose fluctuates greatly and is easily affected by medication and diet, after consulting experts we used HbA1c, which is more stable, as the reference index instead.

      Line 295, The author indicated that data lost; this should be clarified in the results part, and further, the treatment of missing data should be clarified in the method part.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      I hope to see a table of the cohort's baseline characteristics. The discussion needs extensive rewriting; the author seems to be swinging between the stroke outcome and the seizure, sometimes losing the target.

      Figure 1 shows the patient selection procedure. Table 1 contains the cohort's baseline characteristics.

      Regarding the swinging between stroke outcome and seizure: few articles predict epilepsy directly from relevant indicators, whereas many more address prognosis. We therefore treat epilepsy as an important factor in prognosis and discuss it comprehensively in that context; otherwise we could not find enough articles to discuss.

      Reviewer #2 (Recommendations For The Authors):

      There are typos and examples of text that are not clear, including:

      "About the nihss score, the higher the nihss score, the more likely to be PSE, nihss score has a third effect just below white blood cell count and D-dimer."

      "and only 8 people made incorrect predictions, demonstratijmng a good predictive ability of the model."

      "female were prone to PSE"

      " Waafi's research"

      "One-heat' (should be one-hot)

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The Data Collection section is poorly written, and the methodology is not clear. It would be much more appropriate to include a table of all features used and an explanation of what these features involve. It would also be useful to see the mean values of these features to assess whether the feature values are reasonable for the dataset.

      Thank you for the reminder. All data come from the first examination or test after admission and were retrieved from a PostgreSQL database. We first extracted the date of first admission for stroke for each patient, and then extracted the information from the nearest subsequent examination. The extraction was performed with SQL code rather than manually, so we obtained as much data as possible rather than only the features reported in previous studies. Table 1 lists all features used, with their mean ± SD.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage. I would need this clarified before believing the authors achieved their claims of building a predictive model.

      All relevant index results were from the first examination after admission; their mean ± standard deviation is given in Table 1 and described in the statistical analysis section.

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is very important, but it is difficult to achieve a perfect one. We tried open-source databases such as MIMIC, but their data do not fit our requirements, because they lack many of the features available in our hospital records and lack follow-up of the relevant patients. In the end, we collected newer records from the same hospitals in Chongqing, and we will collect more and report a larger external validation in the future.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of the reported results, especially for datasets of limited size. Because we have more than 20,000 records, we consider a 7:3 train/test split sufficient. We revised our code and implemented a 5-fold cross-validation version; it brought little improvement, and we will use this technique in our next study.

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      This is a great improvement for our draft. We have added an explanation: we use the force plot of the first patient to show the influence of each feature on that patient's prediction. A long APTT contributes most toward PSE, followed by the AST level and other features, whereas the NIHSS score may be low and contributes less to the final result. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Recommendations For The Authors):

      Abbreviations should not be defined in the abstract (or only in the abstract).

      Please make explicit the purposes of the studies you are referring to in "Currently, most studies utilize clinical data to establish statistical models, survival analysis and cox regression."

      Authors affirm: "there is still a relative scarcity of research on PSE prediction, with most studies focusing on the analysis of specific or certain risk factors." This statement is especially curious since the current study uses risk factors as predictors.

      It is not clear to me what the authors mean by "No study has proposed or established a more comprehensive and scientifically accurate prediction model." The authors do not summarize the statistical parameters of previously reported models, or other relevant data to assess coverage or validity (maybe including a Table summarizing such information would be appropriate). In any case, I would try to omit statements that imply, to some extent, discrediting previous studies without sufficient foundation.

      "antiepileptic drugs" is an outdated name. Please use "antiseizure medications"

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say regarding missing data that they "filled the data of the remaining indicators with missing values of more than 1000 cases by random forest algorithm". Please clarify what you mean by "of more than 1000 cases." Also, provide details on the RF model used to fill in missing data.

      Thank you for the reminder. "of more than 1000 cases" was an error, and we have corrected it. The procedure was as follows. First, we collected the values of all laboratory indicators from the first test after stroke admission (everyone admitted for stroke undergoes a routine blood test, liver and kidney function tests, and so on), excluded indicators with more than 10% missing values, and imputed the missing values of the remaining indicators with a random forest algorithm using default parameters. We go through the features starting with the one with the fewest missing values (since filling it requires the least information). When filling a feature, missing values in the other features are temporarily replaced with 0. After each regression prediction, the predicted values are placed back into the original feature matrix and the next feature is filled. Once all features have been processed, imputation is complete.
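      The imputation procedure above can be sketched in a few functions: drop columns with more than 10% missing values, then fill the remaining columns in order of fewest missing values, treating still-missing predictors as 0 and writing each column's predictions back before moving on. To keep the sketch dependency-free, a 1-nearest-neighbour regressor stands in for the random forest; in practice something like scikit-learn's RandomForestRegressor would take its place. All names are hypothetical, missing values are represented as None, and every retained column is assumed to have at least one observed value.

```python
import math

def drop_sparse_columns(rows, max_missing=0.10):
    """Keep only columns whose fraction of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = [j for j in range(len(rows[0]))
            if sum(r[j] is None for r in rows) / n <= max_missing]
    return [[r[j] for j in keep] for r in rows], keep

def nn_regress(train_X, train_y, query):
    """Stand-in for a random forest: return the target of the nearest training row."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[best]

def predictors(row, target):
    """All columns except target, with values that are still missing set to 0."""
    return [0.0 if v is None else v for c, v in enumerate(row) if c != target]

def impute(rows):
    data = [list(r) for r in rows]  # work on a copy
    n_cols = len(data[0])
    missing = {j: [i for i, r in enumerate(data) if r[j] is None] for j in range(n_cols)}
    # Fill the column with the fewest missing values first.
    for j in sorted((c for c in range(n_cols) if missing[c]), key=lambda c: len(missing[c])):
        observed = [i for i, r in enumerate(data) if r[j] is not None]
        train_X = [predictors(data[i], j) for i in observed]
        train_y = [data[i][j] for i in observed]
        predictions = [nn_regress(train_X, train_y, predictors(data[i], j)) for i in missing[j]]
        # Write predictions back so later columns can use them.
        for i, value in zip(missing[j], predictions):
            data[i][j] = value
    return data
```

      Writing predictions back before the next column is what distinguishes this sequential scheme from imputing every column independently.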

      Please specify what you mean by negative group and positive group. Avoid tacit assumptions.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Please provide more details (and references) on the smote oversampling method. Indicate any relevant parameters/hyperparameters.

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) to resample the imbalanced dataset for machine learning. The code is:

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      The methodology is presented in an extremely succinct and non-organic manner (e.g., "(Model building) Select the 20 features with the largest absolute value of LASSO."). Please try to improve the narrative.

      LASSO regression is a commonly used feature-selection method. Because we extracted information from the database and tried to include as many features as possible, the LASSO cross-validation curve was optimal with 78 features, but this would lead to an overly complex model. We therefore built models with 10, 15, 20, 25 and 30 features. With 20 features, the model performed well and remained relatively concise. Increasing the number of features contributed little to model performance, whereas decreasing it reduced performance; for example, the AUC of the model with 15 features dropped below 0.95. We therefore selected 20 features.

      Many passages of the text need references. For example, those that refer to Levene test, Welch's t-test, Brier score, Youden index, and many others (e.g., NIHSS score). Please revise carefully.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      "Statistical details of the clinical characteristics of the patients are provided in the table." Which table? Number?

      Thank you for the reminder; we have revised the draft and corrected these errors. The table is Table 1.

      Many abbreviations are not properly presented and defined in the text, e.g., wbc count, hba1c, crp, tg, ast, alt, bilirubin, bua, aptt, tt, d_dimer, ck. Whereas I can guess the meaning, do not assume everyone will. Avoid assumptions.

      ROC is sometimes written "ROC" and others, "roc." The same happens for PPV/ppv, and many other words (SMOTE; NIHSS score, etc.).

      Please rephrase "ppv value of random forest is the highest, reaching 0.977, which is more accurate for the identification of positive patients(the most important function of our models).". PPV always refer to positive predictions that are corroborated, so the sentences seem redundant.

      Thank you for the reminder; we have revised the draft and corrected these errors.
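      The reviewer's point about PPV, and the related metrics named earlier (Youden index, number of misclassified cases), can be made concrete from a confusion matrix. In this minimal sketch the class name and the counts are hypothetical and chosen only for illustration; they are not the manuscript's results. PPV is TP / (TP + FP), so it already measures how accurate positive predictions are. The Youden index is computed here at a single operating point; the Youden-optimal threshold is the one that maximizes J along the ROC curve.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classifier evaluated on a labelled test set."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    tn: int  # predicted negative, truly negative
    fn: int  # predicted negative, truly positive

    def ppv(self) -> float:
        """Positive predictive value: share of positive predictions that are correct."""
        return self.tp / (self.tp + self.fp)

    def sensitivity(self) -> float:
        """Share of truly positive cases that the model flags as positive."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Share of truly negative cases that the model flags as negative."""
        return self.tn / (self.tn + self.fp)

    def youden_j(self) -> float:
        """Youden index J = sensitivity + specificity - 1 at this operating point."""
        return self.sensitivity() + self.specificity() - 1.0

    def misclassified(self) -> int:
        """Number of cases (patients, not people making predictions) the model got wrong."""
        return self.fp + self.fn


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(tp=90, fp=10, tn=80, fn=20)
print(counts.ppv())            # 0.9
print(counts.misclassified())  # 30
```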

      What do you mean by "Complex algorithms"? Please try to be as explicit as possible. The text looks rather cryptic or vague in many passages.

      Thank you for the reminder; we have replaced "complex algorithms" with "machine learning".

      The text needs a thorough English language-focused revision, since the sense of some sentences is really misleading. For instance "only 8 people made incorrect predictions,". I guess the authors try to say that the best algorithm only mispredicted 8 cases since no people are making predictions here. Also, regarding that quote... Are the authors still speaking of the results of the random forest model, which was said to be one of the best performances?

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say that they used, as predictors "comprehensive clinical data, imaging data, laboratory test data, and other data from stroke patients". However, the total pool of predictors is not clear to me at this point. Please make it explicit and avoid abbreviations.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Although the authors say that their code is available upon request, I think it would be better to have it published in an appropriate repository.

      Thank you for the suggestion; our code is now available at https://github.com/conanan/lasso-ml.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigated how the presence of interspecific introgressions in the genome affects the recombination landscape. This research was intended to shed light on genetic phenomena influencing the evolution of introgressed regions, although it should be noted that the research itself is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. In this work, yeast hybrids with large introgressions (from several to several dozen percent of the chromosome length) from another yeast species were crossed. Then, the products of meiosis were isolated and sequenced, and on this basis, the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs) was examined. Analyses at different levels of resolution showed that in the introgressed regions there is a very significant reduction in the frequency of COs and a simultaneous increase in the frequency of NCOs. Moreover, it was confirmed that introgressions significantly limit the local shuffling of genetic information, and that NCOs contribute only slightly to shuffling and thus do not compensate for the loss of CO recombination.

      Strengths:

      - Previously, experiments examining the impact of SNP polymorphism on meiotic recombination were conducted either on the scale of single hotspots or the entire hybrid genome, but the impact of large introgressed regions from another species was not examined. Therefore, the strength of this work is its interesting research setup, which allows for providing data from a different perspective.

      - Good quality genome-wide data on the distribution of CO and NCO were obtained, which could be related to local changes in the level of polymorphism.

      Weaknesses:

      (1) The research is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. Moreover, meiosis is stimulated in hybrids in which introgressions occur in a heterozygous state, which is a very unlikely situation in nature. Therefore, I see the main value of the work in providing information on the CO/NCO decision in regions with high sequence diversification, but not in the context of evolution.

      While we are indeed only examining recombination in a single generation, we respectfully disagree that our results are not relevant to evolutionary processes. The broad goals of our study are to compare recombination landscapes between closely related strains, and we highlight dramatic differences between recombination landscapes. These results add to a body of literature that seeks to understand the existence of variation in traits like recombination rate, and how recombination rate can evolve between populations and species. We show here that the presence of introgression can contribute to changes in recombination rate measured in different individuals or populations, which has not been previously appreciated. We furthermore show that introgression can reduce shuffling between alleles on a chromosome, which is recognized as one of the most important determinants for the existence and persistence of sexual reproduction across all organisms. As we describe in our introduction and conclusion, we see our experimental exploration of the impacts of introgression on the recombination landscape as complementary to studies inferring recombination and introgression from population sequencing data and simulations. There are benefits and challenges to each approach, but both can help us better understand these processes. With regard to the utility of exploring heterozygous introgression, we point out that introgression is often found in a heterozygous state (including in modern humans with Neanderthal and/or Denisovan ancestry). Introgression is always heterozygous immediately after hybridization and, depending on the frequency of gene flow into the population, the level of inbreeding, selection against introgression, and other factors, will typically be found in a heterozygous state.

      - The work requires greater care in preparing informative figures and, more importantly, re-analysis of some of the data (see comments below).

      More specific comments:

      (1) The authors themselves admit that the detection of NCO, due to the short size of conversion tracts, depends on the density of SNPs in a given region. Consequently, more NCOs will be detected in introgressed regions with a high density of polymorphisms compared to the rest of the genome. To investigate what impact this has on the analysis, the authors should demonstrate that the efficiency of detecting NCOs in introgressed regions is not significantly higher than the efficiency of detecting NCOs in the rest of the genome. If it turns out that this impact is significant, analyses should be presented proving that it does not entirely explain the increase in the frequency of NCOs in introgressed regions.

      We conducted a deeper exploration of the effect of marker resolution on NCO detection by randomly removing different proportions of markers from introgressed regions of the fermentation cross in order to simulate different marker resolutions from non-introgressed regions. We chose proportions of markers that would simulate different quantiles of the resolution of non-introgressed regions and repeated our standard pipeline in order to compare our NCO detection at the chosen marker densities. More details of this analysis have been added to the manuscript (lines 188-199, 525-538). We confirmed the effect of marker resolution on NCO detection (as reported in the updated manuscript and new supplementary figures S2-S10, new Table S10) and decided to repeat our analyses on the original data with a more stringent correction. For this we chose our observed average tract size for NCOs in introgressed regions (550bp), which leads to a far more conservative estimate of NCO counts (As seen in the updated Figure 2 and Table 2). This better accounts for the increased resolution in introgressed regions, and while it's possible to be more stringent with our corrections, we believe that further stringency would be unreasonable. We also see promising signs that the correction is sufficient when counting our CO and NCO events in both crosses, as described in our response to comment 39 (response to reviewer #3).
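      The marker-thinning simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline (which re-ran their full CO/NCO calling): markers are integer positions, each is retained independently with probability `keep_fraction`, and an NCO tract counts as detected if it still spans at least three retained markers, mirroring the three-marker filter mentioned elsewhere in this response. All function names and example coordinates are hypothetical.

```python
import bisect
import random


def thin_markers(positions, keep_fraction, seed=0):
    """Randomly retain roughly keep_fraction of marker positions (returned sorted)."""
    rng = random.Random(seed)
    return sorted(p for p in positions if rng.random() < keep_fraction)


def markers_in_tract(markers, start, end):
    """Count sorted marker positions inside the closed interval [start, end]."""
    return bisect.bisect_right(markers, end) - bisect.bisect_left(markers, start)


def detected_ncos(markers, tracts, min_markers=3):
    """Tracts (start, end) that would still be called with the available markers."""
    return [t for t in tracts if markers_in_tract(markers, *t) >= min_markers]


def detection_rate_after_thinning(positions, tracts, keep_fraction,
                                  n_reps=100, min_markers=3):
    """Average fraction of tracts still detectable over n_reps random thinnings."""
    total = 0
    for rep in range(n_reps):
        kept = thin_markers(positions, keep_fraction, seed=rep)
        total += len(detected_ncos(kept, tracts, min_markers))
    return total / (n_reps * len(tracts))


# Hypothetical introgressed region: one marker every 20 bp.
positions = list(range(0, 10001, 20))
tracts = [(100, 160), (200, 230)]
print(detected_ncos(positions, tracts))  # [(100, 160)]
```

      Choosing `keep_fraction` so that the thinned density matches a given quantile of marker density in non-introgressed regions reproduces the comparison described above in spirit.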

      (2) CO and NCO analyses performed separately for individual regions rarely show statistical significance (Figures 3 and 4). I think that the authors, after dividing the introgressed regions into non-overlapping windows of 100 bp (I suggest also trying 200 bp, 500 bp, and 1kb windows), should combine the data for all regions and perform correlations to SNP density in each window for the whole set of data. Such an analysis has a greater chance of demonstrating statistically significant relationships. This could replace the analysis presented in Figure 3 (which can be moved to Supplement). Moreover, the analysis should also take into account indels.

      We're uncertain of what is being requested here. If the comment refers to the effect of marker density on NCO detection, we hope the response to comment 2 will help resolve this comment as well. Otherwise, we ask for some clarification so that we may correct or revise as appropriate.
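      One possible reading of the reviewer's suggestion is to bin every introgressed region into non-overlapping windows, count SNPs and NCO events per window, pool the windows from all regions, and correlate the two counts across the pooled set. The sketch below illustrates that reading only; it assumes integer base-pair positions, assigns each NCO to the window containing its midpoint, keeps a final partial window, and uses a Pearson correlation (a rank-based Spearman correlation may suit sparse counts better). Function names and data are hypothetical. Rerunning with `window` set to 100, 200, 500, or 1000 covers the window sizes the reviewer mentions.

```python
import math


def window_counts(positions, region_start, region_end, window):
    """Count integer positions in consecutive non-overlapping windows of
    `window` bp covering [region_start, region_end); the last window may be partial."""
    n_windows = math.ceil((region_end - region_start) / window)
    counts = [0] * n_windows
    for p in positions:
        if region_start <= p < region_end:
            counts[(p - region_start) // window] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def pooled_window_correlation(regions, snp_positions, nco_midpoints, window):
    """Bin SNPs and NCO midpoints in every region, pool all windows, then correlate."""
    snp_counts, nco_counts = [], []
    for start, end in regions:
        snp_counts += window_counts(snp_positions, start, end, window)
        nco_counts += window_counts(nco_midpoints, start, end, window)
    return pearson(snp_counts, nco_counts)


# Hypothetical single region of 300 bp.
print(window_counts([10, 20, 30, 150, 250, 260], 0, 300, 100))  # [3, 1, 2]
```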

      (3) In Arabidopsis, it has been shown that crossover is stimulated in heterozygous regions that are adjacent to homozygous regions on the same chromosome (http://dx.doi.org/10.7554/eLife.03708.001, https://doi.org/10.1038/s41467-022-35722-3).

      This effect applies only to class I crossovers, and is reversed for class II crossovers (https://doi.org/10.15252/embj.2020104858, https://doi.org/10.1038/s41467-023-42511-z). This research system is very similar to the system used by the authors, although it likely differs in the level of DNA sequence divergence. The authors could discuss their work in this context.

      We thank the reviewer for sharing these references. We have added a discussion of our work in the context of these findings in the Discussion, lines 367-376.

      Reviewer #2 (Public Review):

      Summary:

      Schwartzkopf et al characterized the impact on meiotic recombination of highly heterozygous introgressed regions within the budding yeast Saccharomyces uvarum, a close relative of the canonical model Saccharomyces cerevisiae. To do so, they took advantage of the naturally occurring Saccharomyces bayanus introgressions specifically within fermentation isolates of S. uvarum and compared their behavior to the syntenic regions of a cross between natural isolates that do not contain such introgressions. Analysis of crossover (CO) and noncrossover (NCO) recombination events shows both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency. These results strongly support the hypothesis that DNA sequence polymorphism inhibits CO formation and has no effect, or a much weaker one, on NCO formation. Finally, the authors show that the presence of introgressions negatively impacts "r", the parameter that reflects the probability that a randomly chosen pair of loci shuffles their alleles in a gamete.
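      The shuffling parameter r described above can be illustrated for one chromosome of one gamete. In this minimal sketch (an assumption about how r is computed, not the authors' code), the chromosome is split at the points where parental origin switches, p is the fraction of its length inherited from one parent, and r = 2p(1 - p) is the probability that two loci drawn independently and uniformly along the chromosome came from different parents. NCO gene conversion tracts (which would each add two switch points) and the genome-wide weighting across chromosomes are left out; the function names are hypothetical.

```python
def parent_a_fraction(chrom_length, switch_positions, starts_with_a=True):
    """Fraction of a chromosome inherited from parent A, given the positions
    where its parental origin switches (e.g. crossover breakpoints)."""
    breaks = [0.0] + sorted(switch_positions) + [float(chrom_length)]
    from_a = 0.0
    is_a = starts_with_a
    for left, right in zip(breaks, breaks[1:]):
        if is_a:
            from_a += right - left
        is_a = not is_a  # origin flips at every breakpoint
    return from_a / chrom_length


def intra_chromosomal_r(chrom_length, switch_positions, starts_with_a=True):
    """P(two uniformly drawn loci on this chromosome differ in parental origin) = 2p(1 - p)."""
    p = parent_a_fraction(chrom_length, switch_positions, starts_with_a)
    return 2 * p * (1 - p)


print(intra_chromosomal_r(100, []))    # 0.0: no crossover, no shuffling
print(intra_chromosomal_r(100, [25]))  # 0.375
print(intra_chromosomal_r(100, [50]))  # 0.5: a central crossover shuffles most
```

      The example shows why crossover position matters: a single crossover near a chromosome end shuffles fewer pairs of loci than one near the middle, so suppressing crossovers in an introgressed block can reduce r even when the total crossover count changes little.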

      The authors chose a sound experimental setup that allowed them to directly compare recombination properties of orthologous syntenic regions in an otherwise intra-specific genetic background. The way the analyses have been performed looks right, although this reviewer is unable to judge the relevance of the statistical tests used. Overall, most of their results, which are elegant and of interest to the community, are presented in Figure 2.

      Strengths:

      Analysis of crossover (CO) and noncrossover (NCO) recombination events is compelling in showing both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency.

      Weaknesses:

      The main weaknesses refer to a few text issues and a lack of discussion about the mechanistic implications of the present findings.

      - Introduction

      (1) The introduction is rather long. I suggest specifically referring to "meiotic" recombination (line 71) and to "meiotic" DSBs (line 73), since recombination can also occur outside of meiosis (i.e., in somatic cells).

      We agree and have condensed the introduction to be more focused. We also made the suggested edits to include “meiotic” when referring to recombination and DSBs.

      (2) From lines 79 to 87: the description of recombination is unnecessarily complex and confusing. I suggest the authors simply remind readers that DSB repair through homologous recombination is inherently associated with a gene conversion tract (primarily as a result of the repair of heteroduplex DNA by the mismatch repair (MMR) machinery) that may or may not be associated with a crossover. The former recombination product is a crossover (CO); the latter is a noncrossover (NCO) or gene conversion. Sparse markers may prevent the detection of gene conversions, which removes NCOs from the data but does not affect CO detection.

      We changed the language in this section to reflect the reviewer’s suggestions.

      (3) In addition, "resolution" in the recombination field refers to the processing of double Holliday junction-containing intermediates by structure-specific nucleases. To avoid any confusion, I suggest avoiding "resolution" and simply using "DSB repair" throughout the text.

      We made the suggested correction throughout the paper.

      (4) Note that there are several studies of S. cerevisiae meiotic recombination landscapes using different hybrids that show different CO counts. In the introduction, the authors refer to Mancera et al 2008, a reference paper in the field. In this paper, the hybrid used showed ca. 90 COs per meiosis, while their reference to Liu et al 2018 in Figure 2 shows fewer than 80 COs per meiosis for S. cerevisiae. This shows that it is not easy to come up with a definitive CO count per meiosis in a given species. This needs to be taken into account in the Results section (lines 315-321).

      This is an excellent point. We added this context in the results (lines 180-187).

      (5) In line 104, the authors refer to S. paradoxus and mention that its recombination rate is significantly different from that of S. cerevisiae. This is inaccurate since this paper claims that the CO landscape is even more conserved than the DSB landscape between these two species, and they even identify a strong role played by the subtelomeric regions. So, the discussion about this paper cannot stand as it is.

      We agree with the reviewer's point. We also found that the entire paragraph was unnecessary, so it and the sentence in question have been removed.

      (6) Line 150, when the authors refer to the anti-recombinogenic activity of the MMR, I suggest referring to the published work from Martini et al 2011 rather than the not-yet-published work from Copper et al 2021, or both, if needed.

      Added the suggested citation.

      Results

      (7) The clear depletion in COs and the concomitant increase in NCOs within the introgressed regions strongly suggest that DNA sequence polymorphism triggers CO inhibition but does not affect NCO formation, or affects it to a much lower extent. Because most COs likely arise from the ZMM pathway (the CO interference pathway mainly relying on Zip1, 2, 3, 4, Spo16, Msh4, 5, and Mer3) in S. uvarum as in S. cerevisiae, and because the effect of sequence polymorphism is likely mediated by the MMR machinery, this would imply that MMR specifically inhibits the ZMM pathway at some point in S. uvarum. The weak effect, or potential absence of an effect, of sequence polymorphism on NCO formation suggests that heteroduplex DNA tracts, at least the way they form during NCO formation, escape the anti-recombinogenic effect of MMR in S. uvarum. A few comments about this could be added.

      We have added discussion and citations regarding the biased repair of DSBs as NCOs in introgressed regions (lines 380-386).

      (8) The same applies to the fact that the CO number is lower in the natural cross compared to the fermentation cross, while the NCO number is the same. This suggests that under similar initiating Spo11-DSB numbers in both crosses, the decrease in CO is likely compensated by a similar increase in inter-sister recombination.

      Thank you to the reviewer for this observation. We agree that this could explain some differences between the crosses.

      (9) Introgressions represent only 10% of the genome, while the decrease in CO is at least 20%. This is a bit surprising especially in light of CO regulation mechanisms such as CO homeostasis that tends to keep CO constant. Could the authors comment on that?

      We interpret these results to reflect two underlying mechanisms. First, the presence of heterozygous introgression does reduce the number of COs. Second, we believe the difference in COs reflects variation in recombination rate between strains. We note that CO homeostasis need not apply across different genetic backgrounds. Indeed, recombination rate is appreciated to significantly differ between strains of S. cerevisiae (Raffoux et al. 2018), and recombination rate variation has been observed between strains/lines/populations in many different species including Drosophila, mice, humans, Arabidopsis, maize, etc. We reference S. cerevisiae strain variability in the Introduction lines 128-130, and have added context in the Results lines 180-187, and Discussion lines 343-350.

      (10) Finally, the frequency of NCOs in introgressed regions is about twice the frequency of COs in non-introgressed regions. Both COs and NCOs result from Spo11-initiated DSBs.

      This suggests that more Spo11-DSBs are formed within introgressed regions and that such DSBs specifically give rise to NCO. Could this be related to the lack of homolog engagement which in turn shuts down Spo11-DSB formation as observed in ZMM mutants by the Keeney lab? Could this simply result from better detection of NCO in introgressed regions related to the increased marker density, although the authors claim that NCO counts are corrected for marker resolution?

      The effect noted by the reviewer remains despite the more conservative correction for marker density applied to NCO counts (as described in the response to Reviewer 1, comment #2). Given that CO+NCO counts in introgressed regions are not statistically different between crosses, it is likely that these regions are simply predisposed to a higher rate of DSBs than the rest of the genome. This is an interesting observation, however, and one that we would like to further explore in future work.

      (11) What could be the explanation for chromosome 12 to have more shuffling in the natural cross compared to the fermentation cross which is deprived of the introgressed region?

      We added this text to the Results, lines 323-327, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      Technical points:

      (12) In line 248, the authors removed NCO with fewer than three associated markers.

      What is the rationale for this? Is the genotyping strategy not reliable enough to consider events with only one or two markers? NCO events can be rather small and even escape detection due to low local marker density.

      We trust the genotyping strategy we used, but chose to be conservative in our detection of NCOs to account for potential sequencing biases.

      (13) Line 270: The way homology is calculated looks odd to this reviewer, especially the meaning of 0.5 homology. A site is either identical (1 homology) or not (0 homology).

      We've changed the language to better reflect what we are calculating (diploid sequence similarity; see comment #28). Essentially, the metric is the probability that two randomly selected chromatids, one from each parent, will share the same nucleotide at a given locus (akin to calculating the probability of homozygous offspring at a single locus). We average it along a segment of the genome to establish an expected sequence similarity if/when recombination occurs in that segment.
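      The similarity metric described in this response can be sketched as follows, which also shows how a value of 0.5 arises. In this minimal illustration (one reading of the description, not the authors' code), each parent's genotype at a site is written as a string of alleles, the per-site score is the probability that one chromatid drawn at random from each parent carries the same base, and the segment score is the mean over aligned sites. The genotype encoding and function names are assumptions.

```python
from collections import Counter


def site_similarity(genotype_a: str, genotype_b: str) -> float:
    """Probability that a chromatid drawn at random from parent A and one drawn
    at random from parent B carry the same base at this site.

    Assumed encoding: allele strings such as "A" (haploid), "AA" (homozygous)
    or "AG" (heterozygous).
    """
    freq_a = Counter(genotype_a)
    freq_b = Counter(genotype_b)
    return sum(
        (freq_a[base] / len(genotype_a)) * (freq_b[base] / len(genotype_b))
        for base in freq_a
    )


def segment_similarity(genotypes_a, genotypes_b) -> float:
    """Mean per-site similarity across the aligned sites of one segment."""
    if len(genotypes_a) != len(genotypes_b) or not genotypes_a:
        raise ValueError("need two equally long, non-empty genotype lists")
    scores = [site_similarity(a, b) for a, b in zip(genotypes_a, genotypes_b)]
    return sum(scores) / len(scores)


print(site_similarity("AA", "AA"))  # 1.0
print(site_similarity("AG", "AA"))  # 0.5: half of parent A's chromatids match
print(site_similarity("AA", "GG"))  # 0.0
```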

      (14) Line 365: beware that the estimates are for mitotic mismatch repair (MMR). Meiotic MMR may work differently.

      We removed the citation that refers exclusively to mitotic recombination. The statement regarding meiotic recombination otherwise still reflects the results from Chen & Jinks-Robertson.

      (15) Figure 1: there is no mention of potential 4:0 segregations. Did the authors find no such pattern? If not, how did they consider them?

      The program we used to call COs and NCOs (ReCombine's CrossOver program) can detect such patterns, but none were detected in our data.

      Reviewer #3 (Public Review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      (1) While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors used a relatively simple correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, counter to expectations noncrossovers are substantially increased. They then show that, despite this, genetic shuffling overall is substantially reduced in regions of heterozygous introgression. However, it is likely that the correction used to compensate for the effect of increased sequence density is defective, and has not fully compensated for the ascertainment bias due to greater marker density. The simplest indication of this potential artifact is that, when crossover frequencies and "corrected" noncrossover frequencies are taken together, regions of introgression often appear to have greater levels of total recombination than flanking regions with much lower levels of heterology. This concern seriously undercuts virtually all of the novel conclusions of the study. Until this methodological concern is addressed, the work will not be a useful contribution to the field.

      We appreciate this concern. Please see our responses to comments #2 and #38. We further note that the results depicted in Figures 3 and 4 do not rely on any correction or comparison with non-introgressed regions, and thus we consider our results regarding sequence similarity, its effect on the repair of DSBs, and the amount of genetic shuffling with/without introgression to be novel and important observations for the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 149 - this sentence refers to a mixture of papers reporting somatic or meiotic recombination and as these processes are based on different crossover pathways, this should not be mixed. For example, it is known that in Arabidopsis MSH2 has a pro-crossover function during meiotic recombination.

      Corrected

      (2) What is unclear to me is how the crosses are planned. Line 308 shows that there were only two crosses (one "natural" and one "fermentation"), but I understand that this is a shorthand and in fact several (four?) different strains were used for the "fermentation cross". At least that's what I concluded from Fig. 1B and its figure caption. This needs to be further explained. Were different strains used for each fermentation cross, or was one strain repeated in several crosses? In Figure 1, it would be worth showing, next to the panel showing "fermentation cross", a diagram of how "natural cross" was performed, because as I understand it, panel A illustrates the procedure common to both types of crosses, and not for "natural cross".

      We thank the reviewer for drawing our attention to confusion about how our crosses were created. We performed two crosses, as depicted in Figure 1A. The fermentation cross is a single cross from two strains isolated from fermentation environments. The natural cross is a single cross from two strains isolated from a tree and insect. Table S1 and the methods section "Strain and library construction" describe the strains used in more detail. We modified Figure 1 and the figure legend to help clarify this. See also response to comment #37.

      (3) The authors should provide a more detailed characterization of the genetic differences between chromosomes in their hybrids. What is the level of polymorphism along the S. uvarum chromosomes used in the experiments? Is this polymorphism evenly distributed? What are the differences in the level of polymorphism for individual introgressions? Theoretically, this data should be visible in Figure 2D, but this figure is practically illegible in the present form (see next comment).

      As suggested, we remade Figure 2D to include only chromosomes with an introgression present, and moved the remaining chromosomes to the supplement (Figure S11). The patterns of markers (which are fixed differences between the strains in the focal cross) should be clearer now. As we detail in the Methods (lines 507-508), we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression).

      (4) Figure 2D should be prepared more clearly, I would suggest stretching the chromosomes, otherwise, it is difficult to see what is happening in the introgression regions for CO and NCO (data for SNPs are more readable). Maybe leave only the chromosomes with introgressions and transfer the rest to the supplement?

      See previous comment.

      (5) How are the Y scales defined for Figure 2D?

      Figure 2D now includes units for the y-axis.

      (6) Are increases in CO levels observed in the fermentation cross at the borders of introgressions? This would indicate local compensation for recombination loss in the introgressed regions, similar to that often observed for chromosomal inversions.

      We see no evidence of an increase in CO levels at the borders of introgressions, either through visual inspection or by comparing the average CO rate in all fermentation windows to that of windows at the edges of introgressions. This is included in the Discussion lines 360-366, "While we are limited in our interpretations by only comparing two crosses (one cross with heterozygous introgression and one without introgression), these results are in line with findings in inversions, where heterozygotes show sharp decreases in COs, but the presence of NCOs in the inverted region (Crown et al., 2018; Korunes & Noor, 2019). However, unlike heterozygous inversions where an increase in COs is observed on freely recombining chromosomes (the inter-chromosomal effect), we do not see an increase in COs on the borders flanking introgression or on chromosomes without introgression."

      (7) Line 336 - "We find positive correlations between CO counts..." - you should indicate here that the correlations are between the fermentation and natural crosses; it was quite hard for me to understand what you calculated.

      We corrected the language as suggested.

      (8) The term "homology" usually means "having a common evolutionary origin" and does not specify the level of similarity between sequences, thus it cannot be measured. It is used incorrectly throughout the manuscript (also in the intro). I would use the term "similarity" to indicate the degree of similarity between two sequences.

      We corrected the language as suggested throughout the document.

      (9) Paragraph 360 and Figure 3 - was the "sliding window" overlapping or non-overlapping?

      We added clarifying language to the text in both places. We use a 101bp sliding window with 50bp overlaps.
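      The 101 bp sliding window with 50 bp overlaps described above can be sketched as follows. This minimal illustration assumes that consecutive windows share 50 bp, i.e. the window advances by 51 bp; if windows instead advanced by 50 bp, `overlap` would be 51. It also assumes that trailing bases which do not fill a whole window are dropped. The per-site track could be, for example, per-site sequence similarity; function names are hypothetical.

```python
def sliding_windows(seq_length: int, window: int = 101, overlap: int = 50):
    """Yield half-open (start, end) windows of `window` bp in which each window
    shares `overlap` bp with the previous one (step = window - overlap).
    Trailing bases that do not fill a whole window are not covered."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    start = 0
    while start + window <= seq_length:
        yield start, start + window
        start += step


def windowed_means(values, window: int = 101, overlap: int = 50):
    """Mean of a per-site track (e.g. per-site sequence similarity) in each window."""
    return [
        sum(values[start:end]) / (end - start)
        for start, end in sliding_windows(len(values), window, overlap)
    ]


print(list(sliding_windows(300)))  # [(0, 101), (51, 152), (102, 203), (153, 254)]
```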

      (10) Line 369 - what is "...the proportion of bases that are expected to match between the two parent strains..."?

      We clarified the language in this location, and hopefully changes associated with the comment about sequence similarity will make the comment even clearer in context.

      (11) Line 378 - should it refer to Figure S1 and not Figure 4?

      Corrected.

      (12) Line 399 - should refer to Figure 4, not Figure 5.

      Corrected

      (13) Line 444-449 - the analysis of loss of shuffling in the context of the location of introgression on the chromosome should be presented in the result section.

      We shifted the core of the analysis to the results, while leaving a brief summary in the discussion.

      (14) The authors should also take into account the presence of indels in their analyses, and they should be marked in the figures, if possible.

      We filtered out indels in our variant calling. However, we did analyze our crosses for the presence of large insertions and deletions (Table S2), which can obscure true recombination rates, and found that they were not an issue in our dataset.

      Reviewer #2 (Recommendations For The Authors):

      This reviewer suggests that the authors address the different points raised in the public review.

      (1) This reviewer would like to challenge the relevance of the r-parameter in light of chromosome 12, which has no introgression yet still shows a strong depletion in r in the fermentation cross.

      We added this text to the Results, lines 377-381, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      (2) This reviewer insists on making sure that NCO detection is unaffected by the marker density, notably in the highly polymorphic regions, to unambiguously support Figure 1C.

      We've changed our correction for resolution to be more aggressive (see response to comment #2), and believe we have now adequately adjusted for marker density (see response to comment #38).

      Reviewer #3 (Recommendations For The Authors):

      I regret using such harsh language in the public review, but in my opinion, there has been a serious error in how marker densities are corrected for, and, since the manuscript is now public, it seems important to make it clear in public that I think that the conclusions of the paper are likely to be incorrect. I regret the distress that the public airing of this may cause. Below are my major concerns:

      (1) The paper is written in a way that makes it difficult to figure out just what the sequence differences are within the crosses. Part of this is, to be frank, the unusual way that the crosses were done, between more than one segregant each from two diploids in both natural and fermentation cases. I gather, from the homology calculations description, that each of these four diploids, while largely homozygous, contained a substantial number of heterozygosities, so individual diploids had different patterns of heterology. Is this correct? And if so, why was this strategy chosen? Why not start with a single diploid where all of the heterologies are known? Why choose to insert this additional complication into the mix? It seems to me that this strategy might have the perverse effect of having the heterology due to the polymorphisms present in one diploid affect (by correction) the impact of a noncrossover that occurs in a diploid that lacks the additional heterology. If polymorphic markers are a small fraction of total markers, then this isn't such a great concern, but I could not find the information anywhere in the manuscript. As a courtesy to the reader, please consider providing at the beginning some basic details about the starting strains-what is the average level of heterology between natural A and natural B, and what fraction of markers are polymorphic; what is the average level of heterology between fermentation A and fermentation B in non-introgressed regions, in introgressed regions, and what fraction of markers are polymorphic? How do these levels of heterology compare to what has been examined before in whole-genome hybrid strains? It also might be worth looking at some of the old literature describing S. cerevisiae/S. carlsbergensis hybrids.

      We thank the reviewer for drawing our attention to confusion about the cross construction. These crosses were conducted as is typical for yeast genetic crosses: we crossed 2 genetically distinct haploid parents to create a heterozygous diploid, then collected the haploid products of meiosis from the same F1 diploid. Because the crosses were made with haploid parents, no other genetic differences can be segregating in the crosses. We have revised Figure 1 and its caption to clarify this. Further details regarding the crosses are in the Methods section "Strain and library construction" and in Supplemental Table S1. We only used genetic markers that are fixed differences between our parental strains to call COs and NCOs. As we detail in the Methods (lines 507-508), we used a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression). We additionally revised Figure 2D (and Figure S11) to help readers better visualize differences between the crosses.

      (2) There are serious concerns about the methods used to identify noncrossovers and to normalize their levels, which are probably resulting in an artifactually high level of calculated crossovers in Figure 2. As a primary indication of this, it appears in Figure 2 that the total frequency of events (crossovers + noncrossovers) in heterozygous introgressed regions is substantially greater than that in the same regions in non-introgressed strains, whereas a simple shift of crossovers to noncrossovers would result in no net increase. The simplest explanation for this is that noncrossovers are being undercounted in non-introgressed relative to introgressed heterozygous regions. There are two possible reasons for this: i. Excluding all noncrossover events that span fewer than three markers means that many more noncrossovers are detected in introgressed heterozygous regions than in non-introgressed ones. Assuming that average non-homology is 5% in the former and 1% in the latter, the average 3-marker event will span 60 nt in introgressed regions and 300 nt in non-introgressed regions, so many more noncrossovers will be counted in introgressed regions. One way to check this is to look at the number of crossover-associated markers that undergo gene conversion and use the fraction that involves fewer than 3 markers to adjust noncrossover levels (this is the strategy used by Mancera et al.). ii. The distance used for noncrossover level adjustment (2 kb) is considerably greater than the average noncrossover lengths measured in other studies. Using too long a distance differentially under-corrects for noncrossovers in non-introgressed regions, while virtually all noncrossovers in heterozygous introgressed regions will be detected. This can be illustrated by simulations that reduce the density of scored markers in heterozygous introgressed regions to the density seen in non-introgressed regions.
      Because these concerns go to the heart of the conclusions of the paper, they must be addressed quantitatively; if they are not, the main conclusions of the paper are invalid.
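      The reviewer's marker-density arithmetic in point (2)i can be made concrete with a short sketch. It reproduces the rough estimate that a 3-marker event spans about 3 x (mean marker spacing) nucleotides, and adds one assumption not in the text: that markers fall at random along the sequence (a Poisson model with density equal to the divergence). Under that assumption it computes the probability that a noncrossover tract of a given length covers at least three markers. The 200-nt tract length and the function names are hypothetical, chosen only for illustration.

```python
import math


def approx_span_nt(divergence: float, n_markers: int = 3) -> float:
    """Rough length (nt) covered by n_markers at a given per-site divergence,
    following the reviewer's estimate: n_markers x mean inter-marker spacing."""
    mean_spacing = 1.0 / divergence  # e.g. 5% divergence -> one marker per ~20 nt
    return n_markers * mean_spacing


def p_detect(tract_nt: float, divergence: float, min_markers: int = 3) -> float:
    """Probability that a tract of tract_nt nucleotides covers >= min_markers
    markers, ASSUMING markers are placed at random (Poisson, rate = divergence)."""
    lam = divergence * tract_nt  # expected number of markers inside the tract
    p_fewer = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(min_markers))
    return 1.0 - p_fewer


# Hypothetical 200-nt noncrossover tract, scored under the >= 3-marker filter.
for div in (0.05, 0.01):
    print(f"divergence {div:.0%}: ~{approx_span_nt(div):.0f} nt for 3 markers, "
          f"P(detect 200-nt tract) = {p_detect(200, div):.3f}")
```

      Under these assumptions, the same short tract is almost always detected at 5% divergence but detected only about a third of the time at 1% divergence, which is the differential undercounting the reviewer describes.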

      We adjusted the correction factor (see also response to comment #2) and compared the average number of CO and NCO events in introgressed and non-introgressed regions between crosses (two comparisons: introgression CO+NCO in the natural cross vs introgression CO+NCO in the fermentation cross; non-introgression CO+NCO in the natural cross vs non-introgression CO+NCO in the fermentation cross). We found no significant differences between the crosses in either comparison. This indicates that the distribution of total events is replicated in both crosses once we correct for resolution.
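      For reference, the Mancera-style check suggested by the reviewer can be sketched as below. This is a hypothetical illustration, not the authors' revised correction: it assumes that CO-associated gene-conversion tracts have the same length distribution as noncrossover tracts, estimates the fraction f of those tracts that span fewer than 3 markers, and inflates the observed noncrossover count by 1 / (1 - f). The exact formula used by Mancera et al. or in the revised manuscript is not stated here, and the correction would need to be computed separately for introgressed and non-introgressed regions.

```python
def corrected_nco_count(observed_nco: int, co_tract_marker_counts: list[int],
                        min_markers: int = 3) -> float:
    """Inflate an observed noncrossover count to account for tracts missed by
    the >= min_markers filter, using CO-associated conversion tracts as a proxy.

    ASSUMPTION: proxy tracts and noncrossover tracts share a length distribution.
    """
    if not co_tract_marker_counts:
        raise ValueError("need at least one CO-associated conversion tract")
    n_short = sum(1 for m in co_tract_marker_counts if m < min_markers)
    f_missed = n_short / len(co_tract_marker_counts)  # fraction below threshold
    if f_missed >= 1.0:
        raise ValueError("every proxy tract is below the detection threshold")
    return observed_nco / (1.0 - f_missed)
```

      With hypothetical inputs, `corrected_nco_count(100, [1, 2, 3, 4, 5, 6, 2, 8])` finds 3 of 8 proxy tracts below the threshold (f = 0.375) and returns 160.0.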

      (3) It is important to distinguish the landscape of double-strand breaks from the landscape of recombination frequencies. Double-strand breaks, as measured by uncalibrated levels of Spo11-linked oligos, are a relative measure, not an absolute frequency. So it is possible that two species could have a similar break landscape in terms of topography but have absolute levels higher in one species than in the other.

      We agree with this statement; however, we have removed the relevant text to streamline our introduction.

      (4) Lines 123-125. Meiosis alone will produce mosaic genomes in the progeny of the F1; further backcrossing will reduce mosaicism to the level of isolated regions of introgression.

      We adjusted the language to be more specific.

      (5) Please provide actual units for the Y axes in Figure 2D.

      We have corrected the units on the axes.

      (6) Tables (general). Are the significance measures corrected for multiple comparisons?

      In Table 3, the cutoff was chosen to be more conservative than a Bonferroni-corrected alpha of 0.01 across 9 comparisons (0.01/9 ≈ 0.0011). In the text, any result referred to as significant has an associated hypothesis test with a p-value below its corresponding Bonferroni-corrected alpha of 0.05. This has been clarified in the caption for Table 3 and in the text where relevant.
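      The Bonferroni arithmetic behind the Table 3 cutoff can be sketched as follows: the per-test threshold is the family-wise alpha divided by the number of comparisons. The function names are illustrative only; the family-wise alpha of 0.01 and the 9 comparisons come from the response above.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-test threshold that keeps the family-wise error rate <= family_alpha."""
    return family_alpha / n_comparisons


def is_significant(p_value: float, family_alpha: float, n_comparisons: int) -> bool:
    """True if p_value passes the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(family_alpha, n_comparisons)


print(round(bonferroni_alpha(0.01, 9), 4))  # 0.0011, the Table 3 reference value
```

      A cutoff stricter than 0.0011 is therefore at least as conservative as Bonferroni correction at alpha = 0.01 for those 9 tests.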

    1. Author response:

      The following is the authors’ response to the previous reviews.

      I have added a paragraph that addresses the issue of how landmarks might be used and why they are not used. The suggestions made in the "Weaknesses" paragraph were concise and excellent, and I have incorporated them directly into my revised manuscript. This text appears on page 21 and is shown below. I hope that this is what the editors and reviewers were looking for.

      The requested revision is the second paragraph.

      The first paragraph was not written in response to reviews but was inspired by a recent paper by Mahdev et al. (2024; https://doi.org/10.1038/s41593-024-01681-9). I had already requested to add this reference and was encouraged to do so by the Editors. The Mahdev et al. paper was very surprising in that it showed that path integration is not constant but that its "gain" can be recalibrated by self-motion signals. I wondered whether this unexpected capacity extends to path integration also recalibrating the cognitive map, thereby generating the shortcutting behavior we observe. I suggested that, at an abstract level, this would correspond to a "coordinate transformation" of the cognitive map. I realize that this is entirely speculative. If the Editors feel that it does not add much to the manuscript and that the speculation goes too far, I will remove the first paragraph and re-submit.

      Added text (page 21, just before the heading "Implications for theories of hippocampal representations of spatial maps"). No other changes were made to the paper.

      "Path integration uses self-motion signals to update the animal's estimated location on its internal cognitive map. Path integration gain has been shown to be plastic and regulated by landmarks (52). Remarkably, a recent study has revealed that path integration gain can also be directly recalibrated by self-motion signals alone (53), albeit not as effectively as by landmarks (52, 53). An interesting question for future research is whether self-motion signals can also recalibrate the coordinates of a cognitive map. From this perspective, the Target B to Target A shortcut requires a transformation of the cognitive map coordinates so that the start point is now Target B.

      Extensive research has shown that external cues can control hippocampal neuron place fields (11, 12, 54) and the gain of the path integrator (52), making the failure of mice in our study to use such cues puzzling. The failure to use landmarks may be related to our task being low stakes and our pretraining procedure teaching the mouse that such cues are not necessary. Our results may not generalize to more natural conditions where many reliable prominent cues are available, and where there is urgency to find food or water while avoiding predation (55). Under these more naturalistic conditions the use of distal cues to rapidly find a food reward is more likely to be observed."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation in modulating the function of a therapeutic antibody. As a follow-up to their previous demonstration of how ADCC is heavily affected by the glycans on the Fc gamma receptor IIIa (FcγRIIIa), they now dissect the contributions of the different glycans that decorate its diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements with extensive use of NMR (using both standard and newly developed methodologies), they demonstrate that one specific site, N162, is heavily involved in the stabilization of FcγRIIIa and that the concomitant NK cell function is regulated by the glycan at this site.

      Strengths:

      The methods are executed to the highest standard.

      Weaknesses:

      The exact glycan composition at the N162 site (or the best possible assessment of it) is not defined.

      We revised the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.” 

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling, resulting in antibody-dependent cellular cytotoxicity (ADCC). The work builds on prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We revised Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We restructured the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data, which were not previously available, highlight hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We added the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and removed the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We added an example of the digestions as Figure S2.  

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text.

      Please correct this. 

      Thank you for identifying this typo. We corrected Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated.

      Thank you for drawing our attention to this rounding error. We revised the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/- fucose and +/- kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values across all cell lines because some cell lines showed a significant change and others did not. However, we added the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that some points overlap between the different treatments in this figure. However, our use of the two-tailed Student's t-test on three experiments collected on three different days (each with three technical replicates) provides enough resolution to determine whether the means of the different treatments differ significantly. This is, by our estimation, a highly rigorous way to collect and analyze the data.
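      As an illustration of the replicate structure described above, the sketch below computes a two-sample Student's t statistic (pooled variance) using only the Python standard library. It rests on one assumption not stated in the response: that the three technical wells are averaged first, so each day contributes one biological value (n = 3 per group, df = 4). The %ADCC numbers are hypothetical, not the authors' data; 2.776 is the standard two-tailed critical value of t for alpha = 0.05 at df = 4.

```python
import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2


# Hypothetical %ADCC values: 3 days (biological replicates) x 3 technical wells.
untreated = [[20.1, 21.0, 19.5], [23.4, 22.8, 24.0], [18.9, 19.7, 20.2]]
kifunensine = [[28.3, 29.1, 27.6], [31.0, 30.2, 31.8], [26.5, 27.9, 27.0]]

# ASSUMPTION: average technical wells so each day contributes one value.
t, df = student_t([mean(day) for day in kifunensine], [mean(day) for day in untreated])
T_CRIT_DF4 = 2.776  # two-tailed critical t for alpha = 0.05, df = 4
print(f"t = {t:.2f}, df = {df}, significant at 0.05: {abs(t) > T_CRIT_DF4}")
```

      Averaging technical wells before testing keeps the day-to-day (biological) variation as the error term; treating all nine wells as independent would inflate the degrees of freedom.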

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similar large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high mannose and possibly a few hybrid glycans. It is difficult to understand what glycoforms are truly important outside of stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which consists mostly of Man8 and Man9 structures. In our previous work, these structures likewise increased binding affinity (see PubMed ID 30016589). We believe the most important message is that the composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we cannot modulate N162 glycan composition without potentially affecting every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, the increased ADCC following kifunensine treatment is consistent with the increases in binding affinity we observed previously.

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know from previous work in our lab that N162 glycan composition varies substantially from donor to donor. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, different NK cell N162 glycan compositions could potentially coincide with different indications. This is an area we are quite interested in pursuing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) A major issue throughout the paper is that Hox expression analysis is done exclusively through quantitative PCR, with values ranging from 2-fold to several thousand-fold upregulation, with no antibody validation for any Hox protein (presumably they are all upregulated).

      Thank you for your comment.

      We tried to verify the stimulated Hox expression pattern by in situ hybridization. Although we could clearly detect Hox expression patterns (e.g., Hox8 and Hox9; Author response image 1) in the neural tube of early (E9.5) embryos by whole-mount in situ hybridization, we failed to detect a clear pattern in the brainstem at E18.5, either in whole-mount tissue or on sections. This is one reason that we turned to single-nucleus RNA-seq instead.

      This failure is likely due to the low expression levels of Hox genes at late developmental stages, which require a more sensitive detection method. However, we estimated that the stimulated expression levels of the representative Hox genes are at least comparable to the physiological levels in the posterior spinal cord, i.e., high enough to evoke a functional effect.

      Author response image 1.

      Hox8 and Hox9 expression patterns in E9.5 embryos.

      (2) In Figure 1, massive upregulation of most Hox genes in the brainstem is shown after e16.5, but the paper quickly focuses on analysis of PN nuclei. What are the other consequences of this broad upregulation of Hox genes in the brainstem? There is no discussion of the overall phenotype of the mice, the structure of the brainstem, the migration of neurons, etc. The very narrow focus on motor cortex projections to PN nuclei seems bizarre without broad characterization of the mice, and the brainstem in particular. There is only a mention of "severe motor deficits" from previous studies, but given the broad expression of Rnf220, the fact that it is a global knockout, and the effects on spinal cord populations shown previously, the justification for focusing on PN nuclei does not seem strong.

      Thank you for your comment.

      Although RNF220 is important for the dorsal-ventral patterning of the spinal cord as well as the hindbrain during embryonic development, early neural patterning and differentiation are normal in Rnf220+/- mice (Wang et al., 2022). However, these mice show reduced survival and motility to varying degrees postnatally (Ma et al., 2019; Ma et al., 2021), suggesting a dosage-dependent role of RNF220 in maintaining late neural development. As our microarray assay showed deregulation of Hox genes in the brain, we followed this direction in this study and narrowed down the affected region to the pons. Our single-nucleus RNA-seq (snRNA-seq) data further show that Hox deregulation mainly occurs in 3 clusters of neurons. However, the pons is complex and contains tens of nuclei, and the current resolution of our data does not allow us to assign a clear identity to each cluster. Although more nuclei are likely affected, the PN (cluster 7) is the only cluster we could identify and follow in the current study.

      As to the general effect of RNF220 haploinsufficiency on the brainstem, we carried out Nissl staining assays and found no clear difference in neuronal cell organization between WT and Rnf220+/- pons (revised Figure 2-figure supplement 2).

      (3) It is stated that cluster 7 in scRNA-seq corresponds to the PN nuclei. The modest effect shown on Hox3-5 expression in that data in Figure 1 is inconsistent with the larger effect shown in Figure 2.

      Thank you for your comment.

      Because of the low efficiency of snRNA-seq and the limited sequencing depth, quantification of Hox expression based on the snRNA-seq data is likely less accurate than qRT-PCR. In addition, snRNA-seq captures only nuclear mRNAs, whereas the qRT-PCR assays in Figure 2A used reverse-transcribed mRNAs from both the nucleus and the cytoplasm.

      (4) Presumably, Hox genes are not the only targets of Rnf220 as shown in the microarray/RNA-sequencing data. There is no definitive evidence that any phenotypes observed (which are also not clear) are specifically due to Hox upregulation. The only assay the authors use to look at a Hox-dependent phenotype in the brainstem is the targeting of PN nuclei by motor cortex axons. This is only done in 2 animals and there are no details as to how the data was analyzed and quantified. The only 2 images shown are not convincing of a strong phenotype, they could be taken at slightly different levels or angles. At the very least, serial sections should be shown and the experiment repeated in more animals. There is also no discussion of how these phenotypes, if real, would relate to previous work by the Rijli group which showed very precise mechanisms of synaptic specificity in this system.

      Thank you for your comments and suggestions.

      The deregulation of Hox genes is the most obvious phenomenon observed in the RNA-seq data, and we tried to assign a specific phenotypic effect to it in this study. As the roles of Hox genes in PN patterning and circuit formation are well established, we focused on the PN in the subsequent experiments. Based on the literature, we carried out circuit analysis to examine the targeting of PN neurons by motor cortex axons. An additional cohort of animals (n=10 for WT and n=9 for Rnf220+/-) was used to repeat the experiment, and we reached the same conclusion. More detailed information on data analysis, together with serial images, was included in the revised manuscript and figure legends.

      (5) The temporal aspect of this regulation in vivo is not clear. The authors show some expression changes begin at e16.5 but are also present at 2 months. Is the presumed effect on neural circuits a result of developmental upregulation at late embryonic stages, or does the continuous overexpression in adult mice have additional influence? Are any of the upregulated Hox genes normally expressed in the brainstem, or PN specifically, at 2 months? Why perform single-cell sequencing experiments at 2 months if this is thought to be mostly a developmental effect? Similarly, the significance of the upregulated WDR5 in the pons and pontine nuclei at 2 months in Figure 3 is not clear.

      Thank you for your comment.

      The spatial and temporal expression pattern of Hox genes is established at early embryonic stages and then maintained throughout development in mammals. As we have shown, the de-repression of Hox genes in Rnf220+/- mice is a long-lasting defect beginning at late embryonic stages. Since neuronal circuits are established after birth in mice, we speculated that the circuit defects from motor cortex to PN neurons were due to the long-lasting up-regulation of Hox genes in PN neurons. We could not distinguish whether the effect on the neural circuit results from developmental upregulation of Hox genes or from their continuous overexpression in adult mice; an inducible knockout mouse model may help to answer this question in the future. A discussion of this point was included in the revised manuscript.

      We carried out snRNA-seq analysis using pons tissue from adult mice, aiming to identify the specific cell population with Hox up-regulation, which we had failed to identify by in situ hybridization.

      We repeated the related experiments in the original Figure 3 and some of the blot images were replaced and quantified.

      (6) In Figure 3C, the levels of RNF220 in wt and het don't seem to be that different.

      We repeated the experiments and changed the related image in the revised Figure 3C.

      (7) Based on the single-cell experiments, and the PN nuclei focus, the rescue experiments are confusing. If the Rnf220 deletion has a sustained effect for up to 2 months, why do the injections in utero? If the focus is the PN nuclei why look at Hox9 expression and not Hox3-5 which are the only Hox genes upregulated in PN based on sc-sequencing? No rescue of behavior or any phenotype other than Hox expression by qPCR is shown and it is unclear whether upregulation of Hox9 paralogs leads to any defects in the first place. The switch to the Nes-cre driver is not explained. Also, it seems that wdr5 mRNA levels are not so relevant and protein levels should be shown instead (same for rescue experiments in P19 cells).

      Thank you for your comments.

      Since our data suggest that the upregulation of Hox gene expression is a long-lasting effect beginning at the late embryonic stage of E16.5, we conducted the rescue experiments by in utero injection of a WDR5 inhibitor at E15.5 and examined the expression of Hox genes at E18.5. Although it would also be necessary to examine whether the rescue effect of WDR5 inhibitor injection persists into adult stages, it is difficult to distinguish the injected embryos once the pups are born. As a complement, rescue assays with genetic ablation of the Wdr5 gene were conducted, and the results showed that ablation of a single Wdr5 allele could reverse the upregulation of Hox genes caused by RNF220 haploinsufficiency in the hindbrain at P15.

      Most of the upregulated Hox genes, including both Hox9 and Hox3-5, were examined in our rescue experiments. Since this study focuses on the PN nuclei, the results for Hox3-5 genes are shown in the revised main Figure 6.

      We conducted rescue experiments by deleting Wdr5 in neural tissue using Nestin-Cre mice because Wdr5+/- mice are embryonic lethal. The up-regulation of Hox genes could also be observed in the hindbrains of Rnf220fl/wt; Nestin-Cre mice. Although Rnf220fl/wt; Wdr5fl/wt; Nestin-Cre mice are viable and can survive to adulthood, developmental defects in the forebrain, including the cerebral cortex and hippocampus, were observed in these mice. Therefore, no behavioral rescue tests were conducted in this study. We believe that the role of WDR5 in forebrain development is beyond the scope of this study.

      The potential defects due to the up-regulation of Hox9 paralogs awaits further investigations.

      Wdr5 mRNA levels were first examined to confirm the genetic deletion or siRNA-mediated knockdown of Wdr5. We have also carried out western blots to examine WDR5 protein levels, and the results were included in the revised Figure 3.

      (8) What is the relationship between retinoic acid and WDR5? In Figure 3E there is no change in WDR5 levels without RA treatment in Rnf KO but an increase in expression with RA treatment and Rnf KO. However, the levels of WDR5 do not seem to change with RA treatment alone. Does Rnf220 only mediate WDR5 degradation in the presence of RA? This does not seem to be the case in experiments in 293 cells in Figure 4.

      Thank you for your comment.

      We believe that the regulation of WDR5 and Hox expression by RNF220 is context-dependent and precisely controlled in vivo, depending on the molecular and epigenetic status of the cell, a condition that RA treatment provides in P19 cells. The experiment in Figure 4 is based on exogenous overexpression, which might not fully reflect the situation in vivo.

      (9) Why are the levels of Hox upregulation after RA treatment so different in Figure 5 and Figure Supplement 5?

      In Figure 5C, the Hox expression levels were normalized to the control group in the presence of RA, whereas in Figure Supplement 5 they were normalized to the control group without RA treatment.
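      Why the same measurement yields very different fold changes under the two baselines can be shown with a small sketch. All expression values and group labels below are hypothetical (arbitrary units), chosen only to illustrate the arithmetic of dividing by a different reference group; they are not the data in Figure 5C or Figure Supplement 5.

```python
def normalize_to(values: dict[str, float], reference: str) -> dict[str, float]:
    """Express every group's mean expression as a fold change over `reference`."""
    ref = values[reference]
    return {group: v / ref for group, v in values.items()}


# Hypothetical mean relative Hox expression in P19 cells (arbitrary units).
expr = {"control -RA": 1.0, "control +RA": 50.0, "Rnf220 KD +RA": 150.0}

print(normalize_to(expr, "control +RA")["Rnf220 KD +RA"])  # 3.0   (RA-treated baseline)
print(normalize_to(expr, "control -RA")["Rnf220 KD +RA"])  # 150.0 (untreated baseline)
```

      Against an RA-treated baseline, the fold change reflects only the knockdown effect; against an untreated baseline, it also includes the large RA-induced increase.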

      (10) In Figures 4B+C, which lanes are input and which are IP? There is no quantitation of Figure 4D; from the blot, it does look like there is a reduction in the last 2 columns as well. The band in the WT flag lane seems to have a bubble. Band intensities need to be quantified. The same applies to 4E: the effect does not seem to be completely reversed with MG132.

      Thanks for pointing this out. The labels were included in the revised Figure 4B and 4C.

      We repeated the experiments for Figure 4D and 4E. Some of the blot images were replaced and quantified in the revised Figure 4D and 4E.

      Reviewer 2:

      (1) Figure 1E shows that Rnf220 knockdown alone could not induce an increase in Hox expression without RA, which indicates that Rnf220 might endogenously upregulate Retinoic acid signaling. The authors should test if RA signaling is downstream of Rnf220 by looking at differences in the expression of Retinaldehyde dehydrogenase genes (as a proxy for RA synthesis) upon Rnf220 knockdown.

      Thank you for your comment and suggestion.

      Two sequential reactions are required for RA synthesis from retinol; they are catalyzed by alcohol dehydrogenases (ADHs)/retinol dehydrogenases (RDHs) and by retinaldehyde dehydrogenases (RALDHs, also known as ALDHs), respectively. When RA is no longer needed, it is catabolized by CYP26 cytochrome P450 enzymes (Niederreither et al., 2008; Kedishvili et al., 2016). Here, we tested the expression of ADHs, ALDHs, and CYP26 enzymes in E16.5 WT and Rnf220-/- embryos.

      The results are as follows. ADH7 and ADH10 are slightly upregulated. ALDH1 and ALDH3 are upregulated and downregulated in Rnf220-/- embryos, respectively, but there is no significant change in the expression of ALDH2, which plays a key role in RA synthesis during embryonic development (Niederreither et al., 2008). Furthermore, Cyp26a1, which is responsible for RA catabolism, was upregulated in Rnf220-/- embryos. Collectively, these data do not support a clear effect of RNF220 on RA signaling.

      Author response image 2.

      The effect of Rnf220 on RA synthesis and degradation pathways

      (2) In Figure 2C-D further explanation is required to describe what criteria were used to segment the tissue into Rostral, middle, and caudal regions. Additionally, it is unclear whether the observed change in axonal projection pattern is caused due to physical deformation and rearrangement of the entire Pons tissue or due to disruption of Hox3-5 expression levels. Labeling of the tissue with DAPI or brightfield image to show the structural differences and similarities between the brain regions of WT and Rnf220 +/- will be helpful.

      Thank you for your comment and suggestion.

      More information on the quantification of the results shown in Figure 2C-D was included in our revised manuscript. We carried out Nissl staining assays using coronal sections of the brainstem and found that there is no significant difference in neuronal cell organization between WT and Rnf220+/- (revised Figure 2-figure supplement 2).

      (3) Line 192-195. These roles of PcG and trxG complexes are inconsistent with their initial descriptions in the text - lines 73-74.

      We apologize for this error and have carefully revised the related descriptions. Thank you.

      (4) In Figure 4D, the band in the gel seems unclear and erased. Please provide a different example for the same data which shows a full band with lower intensity. These data show that neither Rnf220 nor Wdr5 directly regulates Hox gene expression. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target. This point should be addressed in the text and discussion section of the paper.

      Thank you for your suggestion.

      We repeated the experiment of Figure 4D and some of the blot images were replaced in the revised Figure 4D.

      Indeed, in the presence of RA, knockdown of Rnf220 alone can upregulate the expression of Hox genes (Figure 5C). Knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, suggesting that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither Rnf220 knockdown, Wdr5 knockdown, nor Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. RA signaling therefore appears to play a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point has been included in the revised manuscript.

      (5) In Figure 4G the authors could provide some form of quantitation for changes in ubiquitination levels to make it easier for the reader. They should also describe the experimental procedures and conditions used for each of the pull-down and ubiquitination assays in greater detail in the methods section.

      Thank you for your suggestion.

      The quantitation and statistics for the original Figure 4G were included in the revised Figure 4. More information on the biochemical assays was included in the “Methods and Materials” section of our revised manuscript.

      (6) Figure 5 shows that neither Rnf220 nor Wdr5 directly regulates Hox gene expression. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target.

      Thank you for your comment.

      In fact, knockdown of Rnf220 alone can upregulate the expression of Hox genes in the presence of RA (Figure 5C). Furthermore, knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, which suggests that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither Rnf220 knockdown, Wdr5 knockdown, nor Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. RA signaling therefore appears to play a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point has been included in the revised manuscript.

      (7) In Figure 6, while the reversal of changes in Hox gene expression upon concurrent Rnf220; Wdr5 inhibition highlights the importance of Wdr5 in this regulatory process, the mechanistic role of wdr5 and its functional consequences are unclear. To answer these questions, the authors need to: (i) Assay for activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 similar to that shown in Figure 3- supplement 1. This will reveal if wdr5 functions according to its intended role as part of the TrxG complex. (ii) The authors need to assay for changes in axon projection patterns in the double knockdown condition to see if Wdr5 inhibition rescues the neural circuit defects in Rnf220 +/- mice.

      Thank you for your suggestion.

      Although it would also be informative to examine whether the rescue effect of in utero WDR5 inhibitor injection on the neuronal circuit persists to adult stages, it is difficult to identify the treated embryos or pups after birth. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and survive to adult stages, developmental defects in the forebrain, including the cerebral cortex and hippocampus, were observed in these mice. Therefore, rescue of the behavioral and neuronal circuit defects was not examined in this study. A PN-specific inducible Cre mouse line may help address this question in the future.

      We carried out ChIP-qPCR to test active and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 in the P19 cell line and found that Rnf220 and Wdr5 double knockdown rescued the Hox epigenetic modifications to a certain degree (Figure 6-figure supplement 1).

      References

      Kedishvili, N.Y. 2016. Retinoic acid synthesis and degradation. Subcell Biochem, 81:127-161. DOI: 10.1007/978-94-024-0945-1_5, PMID: 2783050

      Ma, P., Li, Y., Wang, H., Mao, B., Luo, Z.-G. 2021. Haploinsufficiency of the TDP43 ubiquitin E3 ligase RNF220 leads to ALS-like motor neuron defects in the mouse. Journal of Molecular Cell Biology, 13: 374-382. DOI: 10.1093/jmcb/mjaa072, PMID: 33386850

      Ma, P., Song, N.-N., Li, Y., Zhang, Q., Zhang, L., Zhang, L., Kong, Q., Ma, L., Yang, X., Ren, B., Li, C., Zhao, X., Li, Y., Xu, Y., Gao, X., Ding, Y.-Q., Mao, B. 2019. Fine-Tuning of Shh/Gli Signaling Gradient by Non-proteolytic Ubiquitination during Neural Patterning. Cell Rep, 28: 541-553.e544. DOI: 10.1016/j.celrep.2019.06.017, PMID: 31291587

      Niederreither, K., Dollé, P. 2008. Retinoic acid in development: towards an integrated view. Nat Rev Genet, 9: 541-53. DOI: 10.1038/nrg2340, PMID: 18542081

      Wang, Y.-B., Song, N.-N., Zhang, L., Ma, P., Chen, J.-Y., Huang, Y., Hu, L., Mao, B., Ding, Y.-Q. 2022. Rnf220 is Implicated in the Dorsoventral Patterning of the Hindbrain Neural Tube in Mice. Front Cell Dev Biol, 10. DOI: 10.3389/fcell.2022.831365, PMID: 35399523

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors study the variability of patient response of NSCLC patients on immune checkpoint inhibitors using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to understand the variability of patient response based on immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction, clonal expansion differences, and tumor expression differences between responders and non-responders. Integrating immune and tumor sources of signal the authors claim to improve prediction of response markedly, albeit in a small cohort.

      Strengths:

      The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features is important and interesting and needed to explain the heterogeneity of patient response and be able to predict it.

      Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state-of-the-art methods.

      The authors provide an interesting scRNAseq data set linked to outcomes data.

      Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis.

      Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof.

      Weaknesses:

      Generally, a very heterogeneous and small cohort where adjustments for confounding are hard. Additionally, there are many tests for association with outcome, where necessary multiple testing adjustments would negate signal and confirmation bias likely, so biological takeaways have to be questioned.

      Thank you for your comment. We made multiple testing adjustments as suggested in “Recommendations for Authors.”

      RNAseq is heavily influenced by the tissue of origin (both cell type and expression), so the association with the outcome can be confounded. The authors try to argue that lymph node T-cell and NK content are similar, but a quantitative test on that would be helpful.

      Following the reviewer’s suggestion, we performed principal component analysis (PCA) to assess the influence of tissue of origin on immune and stromal cell populations. In the revised Figure S1g, we quantified the similarity using Euclidean distances of centroids between sample groups based on their tissue of origin in the PC1 and PC3 plot.
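      To illustrate the kind of quantification described above, the following is a minimal sketch, assuming each sample has already been projected onto a two-dimensional PC plane (for example PC1 and PC3) and labeled by tissue of origin. The function name and input layout are hypothetical and are not taken from the authors' pipeline; the sketch only shows how group centroids and the Euclidean distances between them can be computed.

```python
import math
from collections import defaultdict


def centroid_distances(pc_coords, tissue_labels):
    """Euclidean distances between per-tissue centroids in a PC plane.

    pc_coords: list of coordinate tuples, one per sample (hypothetical input,
               e.g. (PC1, PC3) scores).
    tissue_labels: tissue-of-origin label for each sample, same order.
    Returns (centroids, distances), where distances maps sorted label pairs
    to the Euclidean distance between their centroids.
    """
    groups = defaultdict(list)
    for coord, label in zip(pc_coords, tissue_labels):
        groups[label].append(coord)

    # Centroid = mean coordinate of all samples from one tissue of origin
    centroids = {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in groups.items()
    }

    labels = sorted(centroids)
    distances = {}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            distances[(a, b)] = math.dist(centroids[a], centroids[b])
    return centroids, distances
```

      Smaller centroid distances indicate that samples from two tissues occupy similar regions of the PC plane; the sketch does not by itself provide a significance test.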

      The authors claim a very high "accuracy" performance, however, given the small cohort and lack of information on the exact evaluation it is not clear if this just amounts to overfitting the data.

      We acknowledge the concern about the high “accuracy” potentially indicating overfitting. To address this, we revised the manuscript to clarify the use of 'accuracy,' 'AUC,' and 'performance' with clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.

      Especially for tumor cell program/state analysis the specificity to the setting of ICIs is not clear and could be prognostic.

      Thank you for your comments. As outlined in Table 2 of the revised manuscript, we conducted a multivariate survival analysis of tumor signature candidates using the TCGA lung adenocarcinoma (LUAD, n = 533) and squamous cell carcinoma (LUSC, n = 502) cohorts to evaluate their prognostic potential. No tumor cell programs or states were found to be associated with overall survival in either LUAD or LUSC. We added descriptions related to Table 2 in the Results (Lines 249-251) and Methods (Lines 530-542) sections.

      Due to the small cohort with a lot of variability, more external validation is needed to be convincingly reproducible, especially when talking about AUC/accuracy of a predictor.

      Expanding the cohort size was difficult due to limited resources. We recognize the challenges posed by the small and heterogeneous cohort. We have acknowledged these limitations and applied statistical corrections to address them.

      Reviewer #2 (Public Review):

      Summary:

      The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.

      Strengths:

      The main strengths of this work lie in the methodology of integrating single-cell sequencing, genetic data, and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.

      Some of the findings in this study are not surprising and well precedented eg. association of Treg, STAT3, and NFkB with ICI resistance and CD8+ activation in ICI responders and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (eg. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606 ) these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.

      Weaknesses:

      Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate the robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts, no functional characterization of the findings, and the discussion section does not include discussion around the relevance/interpretation of key findings that were highlighted in the abstract (eg. role of Th17, TRM, STAT3, and NFKb). Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.

      We acknowledge the challenges posed by the small and heterogeneous cohort. To address this, we tempered our claims related to accuracy by applying statistical testing corrections. We also appreciate the feedback on functional characterization and have expanded the discussion in the revised manuscript to include an overview of specific cell populations and genes.

      Related to the absence of discussion around prior TRM findings, the association between TRM involvement in response to IO therapy in this manuscript is counter to what has been previously demonstrated (Cell Rep Med. 2020;1(7):100127, Nat Immunol. 2017;18(8):940-950., J Immunol. 2015;194(7):3475-3486.). However, it should be noted that the authors in this manuscript chose to employ alternative markers of TRM characterisation when defining their clusters and this could indicate a potential rationale for differences in these findings. TRM population is generally characterised through the inclusion of the classical TRM markers CD69 (tissue retention marker) and CD103 (TCR experienced integrin that supports epithelial adhesion), which are both absent from the TRM definition in this study. Additional markers often used are CD44, CXCR6, and CD49a, of which only CXCR6 has been included by the authors. Conversely, the majority of markers used by the authors in the cell type clustering are not specific to TRM (eg. CD6, which is included in the TRM cluster but is expressed at its lowest in cluster 3 which the authors have highlighted as the CD8+ TRM population). Therefore, whilst there is an interesting finding of this particular cell cluster being associated with resistance to ICI, its annotation as a TRM cluster should be interpreted with caution.

      Single-cell RNA sequencing (scRNA-seq) can sometimes fail to detect the expression of classical cell type markers due to incomplete capture of a cell’s transcriptome. To determine cell identity, we utilized cell type markers established in previous scRNA-seq studies. In response to your comments, we have added the expression levels of classical TRM markers, including CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Although these markers were not exclusively expressed in TRM clusters, TRM clusters exhibited relatively high levels of these genes while lacking other clusters’ specific marker genes.

      Reviewer #1 (Recommendations For The Authors):

      General suggestions:

      When analyzing the association of cell type proportions with outcomes, some adjustment for multiple testing should be considered (either sampling-based, e.g. permutation test, or adjustment based on assumptions of independence of tests, e.g. Bonferroni).

      Thank you for your comments. As suggested, we calculated adjusted p-values using the false discovery rate (FDR) for the association of cell type proportions with outcomes in Figure 3a. The heatmap in Reviewer's ONLY Figure 1, based on the adjusted p-values, consistently showed the expected grouping of cell types and outcomes. However, the adjusted p-values did not meet conventional statistical significance thresholds. We acknowledge this limitation, which results from statistical testing based on ratio values.
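      The response does not state which FDR procedure was used. As a hedged illustration, the sketch below implements the Benjamini-Hochberg step-up adjustment, the most common FDR correction; treating it as the authors' procedure is an assumption. Each p-value is scaled by m/rank, and a running minimum from the largest p-value downward keeps the adjusted values monotone and capped at 1.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted
```

      For example, raw p-values of 0.01, 0.04, 0.03 and 0.20 become roughly 0.04, 0.053, 0.053 and 0.20, which illustrates how a nominally significant result can fall above a 0.05 cutoff after adjustment.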

      Author response image 1.

      Heat map with unsupervised hierarchical clustering of proportional changes in cell subtypes within total immune cells. Proportional changes were compared across multiple ICI response groups. The color represents -log(adjusted p-value), with p-values adjusted using the false discovery rate.

      A formal test of clonotype differences (normalized to cell type fraction) would be great as the shown plot 2e could be confounded by cell number and type differences between responders and non-responders.

      Thank you for your suggestion. We have revised Figure 2e to display the relative clonotype differences versus CD4+ and CD8+ T cell fractions in each sample. The relative clone size of each cell was calculated by dividing the size of each clone by the total number of CD4+ or CD8+ T cells, respectively.
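      The normalization described above can be sketched as follows, assuming one clonotype ID is available per CD4+ (or CD8+) T cell within a single sample. The function and input format are hypothetical; the sketch only shows that each cell's clone size is divided by the total number of T cells in that compartment, so that samples with different T cell counts become comparable.

```python
from collections import Counter


def relative_clone_sizes(clonotypes):
    """Relative clone size for each cell in one sample and one T cell compartment.

    clonotypes: list with one clonotype ID per CD4+ (or CD8+) T cell
                in a single sample (hypothetical input format).
    Returns a list, aligned with the input, of clone size / total cells.
    """
    total = len(clonotypes)
    if total == 0:
        return []
    counts = Counter(clonotypes)  # cells per clonotype
    return [counts[c] / total for c in clonotypes]
```

      Running the function separately on the CD4+ and CD8+ cells of each sample keeps the two denominators distinct, as described in the response.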

      It could be made a bit more clear when the core group of patients was used (only when associating with outcomes?) and when all other patients were used as well (only cell type annotation?).

      As the reviewer correctly noted, we performed scRNA-seq analysis on all specimens, but only the core group of patients was used for the comparative analysis between the responder and non-responder groups. This information has been detailed in the manuscript (Lines 103-105).

      For immune cells, it would be interesting to look at expression patterns (NMF, scINSIGHT) as well, not just immune cell fractions and expansion.

      In contrast to tumor signatures, immune cell programs are more directly tied to their functional characteristics. Therefore, we focused on annotating immune cells based on their functional properties and conducted comparative analyses between responders and non-responders.

      Multiple testing is necessary for the univariate association analysis. Some adjustments for confounders in a multivariate model (despite the size) could be informative.

      As shown in ‘Reviewer's ONLY Table 1’, we conducted a multivariate regression analysis of immune and tumor signatures for ICI response, adjusting for clinical variables such as tissue origin, cancer subtype, pathological stage, and smoking status. However, the results were not significant, likely due to the heterogeneity and small size of the cohort.

      Author response table 1.

      P-values from univariate and multivariate regression analysis of immune and tumor signatures for ICI response.

      It is not clear from the manuscript how "accuracy" is measured. The terms "accuracy" and "AUC", as well as "performance" are used interchangeably, a section in the methods with the precise definition is needed.

      We have revised the manuscript to clarify the terms 'accuracy,' 'AUC,' and 'performance' by using clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.

      Furthermore, it has to be clear if this is in-sample performance or if there was some train/test split or cross-validation used. Given the small cohort size and wealth of features finding some combination of predictors that could overfit on responders/non-responders would not be surprising.

      As the reviewer has noted, we acknowledge the statistical limitations due to the small cohort size. We have revised the sentence on Lines 545-547 “Classification models of responders and non-responders for PC signatures and combinatorial indexes between tumor and/or immune cells were generated based on in-sample performance…”.
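      To make the distinction concrete, the sketch below computes an AUC as the probability that a randomly chosen responder scores higher than a randomly chosen non-responder, with ties counted as one half. The function is a generic, hypothetical illustration, not the authors' classifier. "In-sample" means the same patients are used both to choose the signature or index and to compute this AUC, which is why the value can be optimistic in a small cohort compared with a held-out or cross-validated estimate.

```python
def auc_from_scores(scores, labels):
    """AUC as P(responder score > non-responder score); ties count 0.5.

    scores: numeric predictor values, one per patient.
    labels: 1 for responder, 0 for non-responder.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = 0.0
    # Compare every responder with every non-responder
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```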

      Suggestions to improve readability:

      Line 84: The sentence should be reformulated to improve understanding.

      We have revised sentences in lines 81-93.

      Line 86: missing a "the".

      We have revised the sentences in lines 81-93.

      Reviewer #2 (Recommendations For The Authors):

      "Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells" Please look to rephrase this sentence as this is not entirely accurate: PD-1 is upregulated in tumor-experienced T cells as a consequence of antigen recognition ie those cells that recognise tumor will increase PD-1, whereas the sentence as it's currently written indicates that PD1+ cells have an intrinsically increased capacity to kill tumors, which is incorrect.

      We have revised the sentence “Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells” in lines 86-88 as “More specifically, PD-1 expression is upregulated upon antigen recognition (PMID29296515), indicating that certain T cells in the tumor microenvironment are actively engaged as tumor-specific T cells.” in the revised manuscript.

      Cancer subtype abbreviations (eg. SQ, ADC, NUT) are used in figures in the main article and so should be defined in the main text (they are currently only explained in the legend for the supplementary table).

      As per the reviewer’s suggestion, the manuscript has been revised to include definitions of cancer type abbreviations in lines 108-110.

      Figure S1d-f does not appear to corroborate the statement that "Although there were differences in tissue-specific resident populations, we found that the immune cell profiles, especially T/NK cells of mLN were similar to those of primary tumor tissues indicating the activation of immune responses were consistently observed at metastatic sites (Figure S1d-f)." The diagrams are complex (please explain all abbreviations) and it is not clear how the authors have come to this conclusion. Additionally, cell quantity does not indicate that the 'activation of immune responses' is consistently observed at metastatic sites as these cells could be dysfunctional/bystander.

      In the revision, we have quantified the diagrams (Figure S1f) to more clearly highlight the differences in tissue-specific resident populations. We performed principal component analysis (PCA) to evaluate the impact of tissue origin on immune and stromal cell populations. In the revised Figure S1g, we illustrated the quantitative similarity between sample groups using Euclidean distances in the PC plot based on their tissue of origin. Additionally, the legends for Figures S1d and S1e have been updated to include definitions for all abbreviations.

      We agree with the reviewer's comment that cell quantity alone may not fully reflect activation of antigen-specific immune responses, even though we annotated the functional T cell subtypes. To better focus on the comparisons of cellular profiles between metastatic sites (mLN) and primary tumors (tLung and tL/B), we removed the sentence “…indicating the activation of immune responses were consistently observed at metastatic sites (Fig. S1d-f).” from the revised manuscript.

      In Figure 2c, classical markers for TRM (CD103, CD69) should be included in the description for the definition of the TRM clusters, or their exclusion appropriately explained. The findings regarding the negative correlation between follicular B cells and ICI response are surprising. Figure S3, the cluster identified as Follicular B cells contains MS4A1 (CD20) and HLA-DRA. Classical markers are CD20 (pan-B cell), CD21 (CR2), CD23, and IgD/IgM (double positive), and as such it is not clear if the authors have appropriately annotated this cluster as representing follicular B cells. These classical markers should be included in the interpretation of the cell clustering or their exclusion appropriately explained.

      We appreciate your comments. In response, we have added the expression levels of classical TRM markers such as CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Additionally, we revised the dot plot showing the mean expression of marker genes in each cell cluster for B/Plasma cells (revised Figure S3b) by incorporating classical markers for Follicular B cells, such as CD21 (CR2), CD23 (FCER2), IgD (IGHD), IgM (IGHM).

      Figure 2f is rather confusing for the reader. I would recommend changing to an alternative plot that shows logP and response in a different way. If keeping to this plot type please clarify why plotting response vs PD, and whether the lower left quadrant indicates patients with progressive disease and the top right indicates responders as the interpretation is not clear currently.

      Thank you for your feedback. To address the concerns raised, we have updated the figure legend for Figure 2f to clarify the interpretation of the quadrants: “The lower left quadrant shows cell types overrepresented in the poor responder groups, while the upper right quadrant indicates cell types overrepresented in the better responder groups”. This clarification aims to help readers understand that the lower left quadrant reflects cell types associated with worse treatment outcomes, while the upper right quadrant reflects cell types associated with improved therapeutic responses.

      The terms "PC7.neg, INT.down, and UNION.down" are included in the results with no explanation to the reader of what they are or how to interpret them. The methods description "We constructed DEGs with intersections (INT) and union (UNION) of up- or down-regulated genes for comparisons" does not sufficiently describe how they were generated/calculated and, therefore, this is difficult for the reader to interpret in the final results section. Please add an additional explanation for the reader in the final section of the results/Figure 5 and in the methods.

      Following the reviewer’s suggestion, we added additional explanation in the Results section (lines 258-261): “PC7.neg denotes genes negatively correlated with PC7, a principal component extracted from PCA that distinguishes tumor cells in poor response groups. INT.down and UNION.down represent the intersection and union of down-regulated genes in the responder group, respectively.” We also explained the details in the Methods section (lines 489-495): “We reconstructed DEGs as four groups: INT.up, INT.down, UNION.up, and UNION.down, based on the intersection (INT) and union (UNION) of up- or down-regulated genes for pairwise comparisons between responder versus non-responder, PR versus PD, and PR versus SD. INT.up and INT.down represent the intersection of up- and down-regulated genes in the responder group, respectively. UNION.up and UNION.down represent the union of up- and down-regulated genes in the responder group, respectively.”
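      The INT and UNION definitions above are plain set operations across the three pairwise comparisons. A minimal sketch, assuming the up- and down-regulated gene lists for each comparison are already available; the comparison keys and gene names used with it are hypothetical placeholders.

```python
def combine_degs(comparisons):
    """Build INT/UNION DEG groups across pairwise comparisons.

    comparisons: dict mapping a comparison name (e.g. responder vs
                 non-responder, PR vs PD, PR vs SD) to a pair
                 (up_genes, down_genes) for the responder side.
    """
    ups = [set(up) for up, _ in comparisons.values()]
    downs = [set(down) for _, down in comparisons.values()]
    return {
        # Genes changed in the same direction in every comparison
        "INT.up": set.intersection(*ups),
        "INT.down": set.intersection(*downs),
        # Genes changed in that direction in at least one comparison
        "UNION.up": set.union(*ups),
        "UNION.down": set.union(*downs),
    }
```

      INT groups are stricter (every comparison must agree), whereas UNION groups are more inclusive, so INT.down is always a subset of UNION.down.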

      The TRM and Th17+ T cell populations are highlighted in the abstract as being related to ICI resistance, but these populations of cells are not even mentioned in the discussion. Likewise, STAT3 and NFkb pathways are also highlighted in the abstract but absent in the discussion section. Please discuss the relevance of these findings, particularly given the prior studies demonstrating the opposite impact of TRM populations in NSCLC.

      We have expanded the discussion in the revised manuscript (Lines 295-313) to address the roles of TRM and Th17+ T cell, as well as the STAT3 and NF-κB pathways, in association with ICI resistance in NSCLC.

      “The identification of an abundance of CD4+ TRM cells as a negative predictor of ICI response is an unexpected finding, considering that higher frequencies of TRM cells in lung tumor tissues are generally associated with better clinical outcomes in NSCLC (PMID28628092). This is largely due to their role in sustaining high densities of tumor-infiltrating lymphocytes and promoting anti-tumor responses. Additionally, previous studies have demonstrated that TRM cell subsets coexpressing PD-1 and TIM-3 are relatively enriched in patients who respond to PD-1 inhibitors (PMID31227543). However, recent findings suggest that pre-existing TRM-like cells in lung cancer may promote immune evasion mechanisms, contributing to resistance to immune checkpoint blockade therapies (PMID37086716). These observations suggest that the roles of TRM subsets in tumor immunity are highly context-dependent.

      Similarly, CD4+ TH17 cells, which were overrepresented in the non-responder groups, exhibit context-dependent roles in tumor immunity and may be associated with both unfavorable and favorable outcomes (PMID34733609; PMID30941641). In exploring tumor cell signatures linked to ICI response, non-responder attributes were regulated by STAT3 and NFKB1. The STAT3 and NF-κB pathways are crucial for Th17 cell differentiation and T cell activation (PMID24605076; PMID32697822). Notably, STAT3 activation in lung cancer orchestrates immunosuppressive characteristics by inhibiting T-cell mediated cytotoxicity (PMID31848193). The combined influence of the Th17/STAT3 axis and TRM cell activity in predicting ICI response underscores the complexity of these pathways and suggests that their roles in tumor immunity and therapy response warrant further investigation.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for reviewing the revised manuscript.

      We are very grateful to the Reviewers for their time and for the appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete agreement between our observations and earlier data reported in cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) might reflect properties of cultured astrocytes that differ from those of their counterparts in slices or in vivo, as discussed in the manuscript.

      The study by T.R. Murphy et al. (Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and noted that '... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.' Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux under basal conditions.

      We have discussed the study by Igarashi et al. (NeuroReport, 2013); our current data should help clarify the cellular mechanisms underlying their findings.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular end feet which all have different AQP4 expressions).

      Following the suggestion, we provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As now shown in Fig. S2, acute application of TGN-020 induced Ca2+ oscillations in astrocyte end-feet regions where the GCaMP6 labeling lines the profile of the blood vessel. On average, the strength of basal Ca2+ signals in the end-feet is higher than that observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as is the effect of TGN-020 (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig. S2 vs. Fig. 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe the data in Fig. S2 and on page 8, lines 20-23.

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains, whereas the basal GCaMP6 fluorescence in the soma is weak. Under the present conditions, it is difficult to unambiguously discriminate the astrocyte soma from the highly intermingled processes. 

      The authors show that inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling (i.e., the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of CSD swelling, we have performed new analyses to quantify the duration of swelling, the speed of swelling and the recovery time from swelling in the control condition and in the presence of TGN-020. The new analysis is now summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling and slows down the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, line 19-21. 

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging [2], genetic models [3], and physiological circadian variation [4] have revealed that it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We have now highlighted the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, line 9-10; page 5, line 1-3; and page 19, line 3-10). 

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Through various experiments, including epifluorescence and light sheet microscopy in mouse brain slices, and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, mainly influencing the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and point to a main role of AQP4 in this mechanism. However, although the authors highlight that the report sheds light on the mechanisms by which astrocytic aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and was not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. Conceptually, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques, from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockade of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach employed, the quality of the manuscript raises doubts regarding the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not well developed in the introduction and discussion sections. Important and recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works are missing, which would clarify certain conclusions. This especially refers to the basis of the glymphatic system concept and recently published reports of similar content. The choice of TGN-020 over, e.g., the available AQP4 blocker AER-270(271) is not explained. While employing various experimental techniques adds depth to the findings, some of the reasoning behind the employed techniques, especially regarding MRI, is unclear or seemingly inaccurate. In most cases, the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, two different statistical tests seem to have been used for the same or linked types of data, so that the results appear contradictory, although this seems unlikely based on the figures. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, supplementary experiments also seem to be required to improve the report.

      The current data hint at a tonic water efflux through astrocytic AQP4 under physiological conditions, which helps to understand brain water homeostasis and its functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 – page 15, line 3). We agree that a comprehensive exploration would further advance our understanding.

      The introduction and discussion are now strengthened by incorporating important advances in glymphatic system research and highlighting the relevant studies. 

      The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies, including the use of heterologous expression systems and AQP4 KO mice. The validation of AER-270 (and AER-271, its water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to impact brain water ADC (apparent diffusion coefficient evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after drug application (Giannetto et al., 2024). This likely reflects that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB), whose inhibition could reduce CNS water content independently of AQP4 targeting (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) seems lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now supplemented this information in the manuscript (page 7, line 1-6 and page 15, line 7-17).

      The description on the DW-MRI is now updated (page 4, line 10-14). 

      We also performed new experiments and data analyses, as described point by point below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two- to three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited/no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate AQP4 as a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over a comparable time, and also slowed down cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov's study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, differing from our data with acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or may reflect differences in volume responses between cultured astrocytes and astrocytes in brain slices or in vivo, as suggested previously (Risher et al., 2009). 

      Soma diameter might be an indicator of cell volume change, yet it is challenging to measure with our current fluorescence imaging method, which is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficiently high Z-axis resolution appears difficult with our present tools. 

      We have now updated the discussion accordingly, citing the relevant literature (page 17, line 14 – page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides not knowing TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      It is an important point that TGN-020 only partially blocks AQP4, implying that the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged. We mention this now on page 15, line 7-9 and 14-17.

      We agree that local delivery of an alternative blocker would provide additional information. However, local delivery requires the stereotaxic implantation of a cannula, which would cause inflammation in surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has been reported to influence brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024), recalling that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use a pharmacological compound to achieve acute blocking of AQP4, thereby avoiding chronic, genetically caused alterations in brain structure, function and water homeostasis. Multiple lines of evidence, including a recent study (Gomolka et al., 2023), have shown that AQP4 deletion alters brain water content, extracellular space and cellular structures, which raises concerns about using the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel. 

      We now mention the concerns regarding AQP4 pharmacology, citing additional literature in the field (page 15, line 8-18). 

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces, rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting the DW-MRI signal deserve strong consideration.

      A previous study showed that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: TGN-020 blocking of astrocyte AQP4 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have now updated the discussion on page 15, line 3-7.

      We agree with the reviewer regarding the structural deviations observed in AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, line 3-5. Following the reviewer's suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to astrocyte swelling, the ADC signal changes may be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, line 1-8). 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My broader experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP has been used to tag specific cell types. However, as a relatively large protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). Also, IP injection of SRB enables genetics-free labeling of brain astrocytes, so as to avoid the influence of protein overexpression on astrocyte volume and water transport responses. We have now stated this point in the manuscript (page 13, line 21 – page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains for astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, in contrast, we performed a global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, line 18-20).

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects the relative changes in astrocyte volume (i.e., transmembrane water transport), which nevertheless is blind to the absolute volume value. We have no readout on baseline volumes.  

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5]

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, when we applied hypotonic solution to induce astrocyte swelling, our protocol did not reveal a rapid regulatory volume decrease (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), in which the peak amplitude and the degree of recovery were both reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory astrocyte volume decrease may occur under specific conditions, although, intriguingly, it has been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon, and now briefly discuss this point on page 18, line 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now, page 9 line 17-18). 

      (8) During the discussion on pg 13 line 11, it may be clearer to describe this as the cotransport of water into the cells with ions/metabolites, as reviewed by MacAulay 2021 [6].

      We agree; the text is modified following this suggestion (now page 14, line 12-13).  

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references are now cited in the manuscript, together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide remarks on the following sections/pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that under basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key literatures (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. It is the mobility of water protons that DWI is based on, not the diffusion of molecules in the brain. This should be clarified based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, line 10-14).

      - Similarly, I suggest correcting or removing the citations and the part of the sentence regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what the ADC actually reflects as a computational score, and what previous studies assessing the glymphatic system using DWI found. This is especially important when considering the mislocalization of the AQP4 channel.

      We now cite recent studies using DW-MRI to evaluate the glymphatic system (page 4, line 16-17).  
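      For reference, the ADC discussed in this exchange is conventionally derived from the mono-exponential model of the diffusion-weighted signal. The relation below is the textbook formulation and is given as an assumption; the exact fitting procedure used in the study (for example, the number of b-values) is not stated here.

```latex
S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}
\quad\Longrightarrow\quad
\mathrm{ADC} = \frac{1}{b} \ln\!\frac{S_0}{S(b)}
```

      Here $S_0$ is the signal without diffusion weighting ($b = 0$), $S(b)$ is the signal at diffusion weighting $b$ (in s/mm²), and the ADC (in mm²/s) summarizes the mobility of water protons within a voxel; a higher ADC indicates less restricted water motion, without by itself identifying which compartment the change comes from.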

      - 'In the brain, AQP4 is predominantly expressed in astrocytes' - please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To the best of my knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been described that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). ‘Predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion to stick to the experimental observations (page 5, line 16-18).

      - The introduction lacks important information on existing AQP4 blockers, their effects, and the pros and cons of using TGN-020. Among others, I would refer to recent work by Giannetto et al. 2024, as well as previous work by Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, line 1-6). We have also updated the discussion on recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, line 7-17).  

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - This term was never introduced in the methods, so it is difficult for the reader to understand at first sight. - 'To this end,' - it is not clear which action 'this' refers to (is it about previous works or the moment when the brain samples were ready for imaging?). Please clarify, as it only becomes clear after fully reading the methods.

      We have refined the description to first give the principle of our imaging method and then explain the technical steps. To avoid ambiguity, the phrase ‘To this end’ is removed. The updated text is now on page 6, line 1-3.  

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. Indications of the relevant panel position are now added to figure citations where applicable.  

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest also consulting the works by Igarashi 2011 (doi: 10.1007/s10072-010-0431-1) and by Sun 2022 (doi: 10.3389/fimmu.2022.870029).

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - ' astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - The tissue slice experiment is a very valuable method. However, the experiment does not address cell swelling that may occur simply due to tissue deterioration, or as a superposition of tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, so the influx of water into astrocytes should also be blocked. Thus, could the swelling also be part of another mechanism, as it was also observed in the control group? I suggest this be addressed thoroughly.

      We performed this experiment in acute brain slices to control the pharmacological environment well and gain spatiotemporal information. After slicing, the brain slices recovered for > 1 h prior to recording, so that they were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, whose trend did not change in response to vehicle. TGN-020, in contrast, caused a downward change, suggesting intracellular water accumulation and swelling. 

      The experiment was performed under basal conditions without active water influx; a decrease in SRB fluorescence points to intracellular water buildup in astrocytes. This result shows that under basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux; its blocking causes intracellular water accumulation and swelling. 

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence tends to increase upon water efflux and volume shrinking, and vice versa. For panel C, the number of mice is now indicated. Also, the downward change in SRB fluorescence is now calculated separately for the phases before and after TGN-020 (and vehicle) application, and this panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which was validated by a t-test performed in MATLAB. For clarity, we now add connecting lines to indicate statistical significance between the corresponding groups (Fig 1C, middle). For panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching trend illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.  
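      As an illustration of the panel D analysis described above (fluorescence change measured relative to a photobleaching trend), the calculation can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: it assumes the bleaching trend is linear, fits it only to pre-application baseline frames, and extrapolates it across the recording. The helper name `bleach_corrected_change`, the frame indices and the synthetic trace are hypothetical.

```python
import numpy as np


def bleach_corrected_change(trace, baseline_end):
    """Percent deviation of a fluorescence trace from a linear bleaching trend.

    Hypothetical helper: the trend is fitted to frames [0, baseline_end) only,
    i.e. before drug (or vehicle) application, then extrapolated forward.
    """
    trace = np.asarray(trace, dtype=float)
    t = np.arange(trace.size)
    # Least-squares line through the baseline frames (assumed linear bleaching)
    slope, intercept = np.polyfit(t[:baseline_end], trace[:baseline_end], 1)
    trend = slope * t + intercept
    # Negative values: fluorescence below the bleaching trend; for a
    # cytosolic dye such as SRB this is read as dilution, i.e. water gain
    return 100.0 * (trace - trend) / trend


# Synthetic example: linear bleaching plus a 5 % drop from frame 50 onward
frames = np.arange(100)
trace = 1000.0 - 0.5 * frames
trace[50:] *= 0.95
change = bleach_corrected_change(trace, baseline_end=50)
print(change[:50].mean(), change[50:].mean())
```

      With this synthetic trace, the baseline frames deviate from the trend by essentially zero, while frames after the drop sit about 5 % below it. A linear trend is the simplest choice; an exponential bleaching model may fit longer recordings better, and per-slice values obtained this way would then be compared statistically (the text reports t-tests).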

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised image quality. We have updated and aligned the figure panels.

      - Page 12: 'TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence has now been updated (page 12, line 12 – page 13, line 2). It now reads: ‘The representative images show sufficient image quality to calculate the ADC, which allows us to examine the effect of TGN-020 on the water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogeneous. Please familiarize yourself with the works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A ROI-wise correlation between ADC score and AQP4 expression would be useful, but conducting this experiment is not essential.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier.

      - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not reflect the actual impact of the findings. DWI only allows verification of the changes in ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      We have now refined this sentence (page 19, lines 1-2) and added comments on recent studies employing DW-MRI to evaluate brain fluid transport, including the work of Giannetto et al. (2024) (page 19, lines 3-10).

      METHODS:

      - Page 18: neither the total number of mice included in all experiments nor the number of mice used in each experiment is clearly stated. Please correct.

      We have now double-checked the number of mice for the data presented and updated the figure legends accordingly (e.g., the legends of Fig. 1 and Fig. 5).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize for this error. We did not use external writing help and have now removed ‘Axcience’. The isoflurane was of the brand ‘ISOVET’ from ‘Piramal’. This information has now been moved up (page 21, line 11).

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentrations of depolarizing ions (Na+, Ca2+) were reduced to lower potential excitotoxicity during tissue dissection (a procedure that injures the brain tissue) for brain slice preparation. Extra sucrose was added to balance the solution osmolarity. This solution has been used previously for the dissection and slicing steps in adult mice (Jiang et al., 2016). We have now added this justification to the text and cited the relevant reference (page 21, lines 14-16).

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      Glast-CreERT2 is an inducible conditional mouse line that expresses a tamoxifen-inducible Cre recombinase (CreERT2) selectively in astrocytes, so that recombination occurs only after tamoxifen injection. We have now added this information to the text (page 22, lines 10-11).

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear whether ethanol was used as a 10% solution or whether the ratio of ethanol+tamoxifen to oil was 1:9. The reasoning for each step performed is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from crossing the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We have now changed it to ‘Glast-CreERT2::Ai95GCaMP6f//WT’, for consistency with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: compliance with legislation was already stated at the beginning of the Methods, so I suggest combining these statements for clarity.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We have now added this information to the text (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signal but also background signals consisting of, for instance, tissue autofluorescence, camera background noise, and stray light from the environment. The background signal was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and subtracted from all frames of the time-lapse image stack. The traces shown in the figures reflect the full length of the time-lapse recordings. F0 was defined as the mean of the 10 data points immediately preceding the detected fluorescence change. The text has now been updated (page 24, line 21 - page 25, line 5).
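      The background subtraction and F0 definition described in this response can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code: the image stack is assumed to be a NumPy array of shape (frames, height, width), the cell and cell-free background ROIs are assumed to be boolean masks, and the onset frame of the fluorescence change is assumed to be already detected. The function names are hypothetical.

```python
import numpy as np


def background_subtracted_trace(stack, cell_mask, bg_mask):
    """Mean ROI intensity per frame after per-frame background subtraction.

    stack:     array of shape (n_frames, height, width)
    cell_mask: boolean array (height, width) marking the delineated astrocyte
    bg_mask:   boolean array (height, width) marking cell-free subregions
    """
    stack = np.asarray(stack, dtype=float)
    # Background estimate for each frame: mean of the cell-free pixels
    bg = stack[:, bg_mask].mean(axis=1)
    # Mean astrocyte intensity for each frame
    cell = stack[:, cell_mask].mean(axis=1)
    return cell - bg


def delta_f_over_f0(trace, onset, n_baseline=10):
    """dF/F0, with F0 = mean of the n_baseline points just before onset."""
    trace = np.asarray(trace, dtype=float)
    if onset < n_baseline:
        raise ValueError("need n_baseline points before the onset frame")
    f0 = trace[onset - n_baseline:onset].mean()
    return (trace - f0) / f0
```

      Because the background is a single value per frame, subtracting it from the ROI mean is equivalent to subtracting it from every pixel of that frame before averaging.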

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually, with the soma taken as the center of the region of interest. An example of the delineated region is now shown in Fig. 1A (bottom).

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      A relevant reference on the barrel cortex (Erzurumlu and Gaspar, 2020) has now been added (page 25, lines 19-20).

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed on the day of acquisition? If so, it is difficult to call it habituation; it is rather acute imaging. I suggest stating instead that mice were allowed to familiarize themselves with the setup for 30 minutes before the start of imaging.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiarized with the setup only on the day of acquisition is difficult to avoid. This is a major concern, especially considering norepinephrine as a master driver of neuronal and vascular activity throughout the brain, and the strong activation of the hypothalamic-pituitary-adrenal axis in response to acute stress. It is well known that the monoamine response is reduced in animals subjected to chronic vs. acute stress, but it is still larger than when the stressor is absent.

      Major remark: the animals should preferably be imaged after at least 3 days of habituation, based on existing knowledge. I suggest exploring the importance of habituation. It is difficult, though, to review these findings objectively without considering stress and the associated changes in vascular dynamics.

      We thank the reviewer for helping us to make this information more precise. The text has been updated accordingly to describe the experiment (now page 26, line 14).

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals has been added to the Methods (page 27, line 12) and is indicated in the legend of Figure 5.

      - Line 18/19: were the respiratory effects observed after injection of saline or of TGN-020? Since DWI was performed, the contribution of perfusive flow to the ADC cannot be excluded.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      We recorded the respiratory rate and have added the average respiratory rate before and after injection of TGN-020 or saline (now Fig. S6; page 13, lines 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added information on the MRI system to the Methods section (page 27, lines 17-21).

      - Page 24, lines 3/4: although the achieved spatial resolution of DWI was good, if slightly lower than desired and achievable owing to limitations of the method itself and of the cryoprobe, it is acceptable for EPI in mice.

      Still, no direct explanation is provided for using a surface coil instead of a volume coil, or for assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful given that such a long echo time was used alongside a lower-than-possible spatial resolution. A longer echo time would lower the SNR of the depicted signal but would also favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, whereas a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between b-values of 250 and 1800 s/mm². What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Moreover, the gradient duration was optimally chosen; however, I have concerns about the choice of such long gradient separation times.

      With a better-optimized protocol, the assessment could also have been performed in respiratory-gated mode, minimizing the effects of one of the driving forces of the glymphatic system.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with a long echo time and high b-value. A volume coil provides more homogeneous SNR across the whole brain than the cryoprobe, but its SNR is lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images obtained with the cryoprobe was sufficient to investigate the ADC (Fig. 5B-C). This is now stated in the Methods (page 27, lines 17-21).

      We performed DW-MRI scanning for 5 min at each time point, using anisotropic resolution and 3 b-values, to investigate the time course of ADC change following the injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after injection (Igarashi et al., 2011), fast DW-MRI scanning is required. If isotropic DW-MRI with a shorter echo time and more directions were used, a longer scan time would be required at each time point, possibly more than 1 h. We agree that three b-values are the minimum for calculating the ADC and that more b-values would increase accuracy. However, to achieve the temporal resolution needed to capture changes in water diffusion, we decided to use the minimum number of b-values. A previous study also validated the accuracy of DW-MRI with three b-values (Ashoor et al., 2019). Furthermore, a previous study that used a long diffusion time (> 20 ms) and a long echo time (40 ms) obtained good mean diffusivity measurements (Aggarwal et al., 2020), supporting the adequacy of our protocol for investigating the ADC. We have now updated the description (page 28, lines 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² seemed too high to obtain good image quality. In a previous study, we confirmed through optimization that ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020).
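      For readers less familiar with ADC estimation: the mono-exponential model S(b) = S0·exp(−b·ADC) becomes a straight line in b after a log transform, so with the three b-values used here (0, 250, and 1800 s/mm²) the line is fitted to one more point than strictly necessary. The Python sketch below is a generic, unweighted log-linear least-squares fit under that assumption; it is not necessarily how DSI Studio computes the ADC, and `fit_adc` is a hypothetical name.

```python
import numpy as np


def fit_adc(bvals, signals):
    """Unweighted log-linear fit of S(b) = S0 * exp(-b * ADC).

    bvals:   b-values in s/mm^2, e.g. [0, 250, 1800]
    signals: signal intensities at those b-values (must be > 0)
    Returns (S0, ADC), with ADC in mm^2/s.
    """
    b = np.asarray(bvals, dtype=float)
    s = np.asarray(signals, dtype=float)
    if np.any(s <= 0):
        raise ValueError("signals must be positive for the log transform")
    # ln S = ln S0 - b * ADC, i.e. a straight line in b
    slope, intercept = np.polyfit(b, np.log(s), 1)
    return float(np.exp(intercept)), float(-slope)
```

      Applied voxel-wise or to ROI-averaged signals at each 5-min time point, such a fit yields one ADC value per time point, which can then be compared with the baseline.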

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      Motion correction and co-registration are described in the Methods (page 28, lines 12-14).

      Also, were trace-weighted images or magnitude images acquired, given that DTI software was used for processing, whereas ADC fitting could be reliably done in MATLAB, Python, or other software? Did the DSI software consider all 3 b-values, or only b = 0 and 1800 for the calculation of mean diffusivity (as ADC) for tractography? The details should be explained.

      DSI Studio was used with all three b-values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We have added this description to the Methods (page 28, lines 16-18).

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      Thanks for this suggestion. We have performed new experiments and added control measurements with three phantoms: water, undecane, and dodecane. These new data are now summarized in Fig. S7, showing the stability of ADC throughout the 70-min scan. We have updated the description in the Methods (page 28, lines 9-11) and in the Results (page 13, lines 6-8).

      - Line 13: were the ROIs defined manually or derived from a previously co-registered Allen Brain Atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were delineated with reference to the Allen Mouse Brain Atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in the Methods (page 28, lines 14-16).

      - Line 10: why was the average of the 1st and 2nd ADC not used, given that it would reduce the influence of noise on the estimation of the baseline ADC?

      We apologize; this was a typo. The baseline was the average of the 1st and 2nd ADC. We have corrected the description (page 28, line 20).
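      As an illustration of the corrected baseline definition, the sketch below takes the baseline as the mean of the first two ADC time points and expresses each time point as a percent change from it. The percent-change representation and the function name `adc_percent_change` are assumptions for illustration; the response only specifies how the baseline is defined.

```python
import numpy as np


def adc_percent_change(adc_series):
    """Percent change of each ADC time point relative to a baseline.

    Baseline = mean of the first two time points (1st and 2nd ADC).
    """
    adc = np.asarray(adc_series, dtype=float)
    if adc.size < 2:
        raise ValueError("need at least two baseline time points")
    baseline = adc[:2].mean()
    return 100.0 * (adc - baseline) / baseline
```

      Averaging two baseline scans halves the variance contributed by independent noise in a single scan, which is the rationale behind the reviewer's suggestion.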

      STATISTICS:

      Which type of t-test (paired, unpaired, or two-sample) was used and why? Mann-Whitney U-tests are used as a substitute for parametric t-tests when the data are non-normally distributed or normality cannot be assumed. In which cases was the Bonferroni-Holm correction used? I could not find any mention of a multiple-group analysis followed by multiple comparisons. Each section of the manuscript should describe how the quantitative data were treated and for what purpose. I suggest carefully correcting all figures accordingly, following the remarks given for Figure 1.

      We used unpaired t-tests for data obtained from samples under different conditions. Indeed, the Mann-Whitney U-test was used when the data deviated from a normal distribution. The Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
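      For reference, the Bonferroni-Holm (Holm) step-down correction mentioned above can be written in a few lines of Python. This is a generic textbook implementation for illustration, not the authors' MATLAB code; `holm_bonferroni` is a hypothetical name.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans in the input order: True where the null
    hypothesis is rejected while controlling the family-wise error rate
    at level alpha.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1)
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained
            break
    return reject
```

      For example, with p-values [0.01, 0.04, 0.03, 0.005] and alpha = 0.05, only the first and last comparisons survive correction. In Python, `statsmodels.stats.multitest.multipletests(pvals, alpha=0.05, method='holm')` provides the same procedure.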

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is that eLife expects all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends in neurosciences 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension versus static culture and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales, in addition to a number of unbiased outcomes, to determine how suspension impacts the behaviour of the cells and, in turn, how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.  

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of what the authors observed (e.g., increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance, were the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. In accordance with your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and several stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). We found that seeding densities of 1-2 × 10⁵ cells/mL and stirring speeds of 50-150 rpm were suitable for the proliferation of these cells. In addition, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and the Eppendorf bioreactor for the 320 mL scale, each of which has been designed for and used in human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253)). We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.
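      For orientation, the seeding-density range in this response converts to total seeded cell numbers as shown below. This is simple arithmetic that assumes each vessel is filled to its nominal 30 mL or 320 mL working volume, which the response does not state explicitly.

```latex
\begin{aligned}
N &= \rho V, \qquad \rho = (1\text{--}2)\times 10^{5}\ \text{cells/mL} \\
N_{30\,\text{mL}} &= (1\text{--}2)\times 10^{5} \times 30 = (3\text{--}6)\times 10^{6}\ \text{cells} \\
N_{320\,\text{mL}} &= (1\text{--}2)\times 10^{5} \times 320 = (3.2\text{--}6.4)\times 10^{7}\ \text{cells}
\end{aligned}
```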

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways represses spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining pluripotency in suspension culture. This is a solid study with substantial data demonstrating the quality of hiPSCs maintained in the suspension culture system, including long-term maintenance over >10 passages, a robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical-grade hiPSC line using a bioreactor was also demonstrated, which highlights the translational value of these findings. In addition, the authors demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications for simplifying the current culture method of iPSCs and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is that the reported suspension culture system does not require additional ECM to provide biophysical support for iPSCs, which distinguishes it from previous studies using hydrogels; this should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, their compositions remain proprietary, so the signaling that represses differentiation and maintains pluripotency in hiPSC suspension culture has remained unclear. This study provides clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address. In particular, it is important to benchmark the reported suspension system against the current conventional culture system (e.g., adherent feeder-free culture), which will be important for evaluating the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. In this revised manuscript, we have performed additional experiments using a conventional medium, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), comparing suspension with the adherent feeder-free culture system in four different hiPSC lines in parallel. Compared with the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions, reaching levels comparable to those of the adherent culture conditions. These results indicate that these chemical treatments in suspension culture are beneficial even when using a conventional culture medium.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance in five hiPSC lines cultured in parallel in suspension using StemFit AK02N medium (Figure 3C-E). Overall, treatment with LY333531 and IWR-1-endo in StemFit AK02N medium reversed the decreased expression of undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. In addition, as noted above, we have added results using a conventional medium, mTeSR1, in comparison with the adherent feeder-free culture system in four different hiPSC lines in parallel. These results show that this chemical treatment consistently produced robust effects on hiPSC maintenance across multiple cell lines and culture media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In this revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared with the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA1-60/SSEA4-positive cells and OCT4-positive cells and the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines in parallel (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers. These direct comparisons support our claim that our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In this revised manuscript, we have added results on 5 different hiPSC lines cultured in parallel (Figure 3C-E). Unfortunately, it is difficult for us to use human embryonic stem cell lines in this study because of ethical restrictions under Japanese governmental regulations. Treatment with LY333531 and IWR-1-endo generally increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that the effects of these chemical treatments in suspension culture are generally robust, and they also allow the variability among iPSC lines to be assessed.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In this revised manuscript, we have added data and discussion regarding the specific role of the two small molecules in these conditions in the Results and Discussion sections. Regarding the WNT signaling inhibitor, we hypothesized that adding Wnt signaling inhibitors might inhibit the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). In addition, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in regulating mesendoderm differentiation potential (Dziedzicka et al, 2021). Regarding the PKC inhibitor, we write in the revised manuscript: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) or in pluripotency safeguards (reviewed in Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)."

      We also found that the expression of naïve pluripotency markers, KLF2, KLF4, KLF5, and DPPA3, were up-regulated in the suspension conditions treated with LY333531 and IWR-1-endo while the expression of OCT4 and NANOG was at the same levels (Figure 5—figure supplement 2). Combined with RT-qPCR analysis data on 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may drive hiPSCs in suspension conditions to shift toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion to the revised manuscript. For Wnt activation in human pluripotent stem cells, several studies have reported that WNT agonists are expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation results in tight cell-cell interactions, so the paracrine effect of WNT agonists within an aggregate may strongly affect neighboring cells and induce their spontaneous differentiation into mesendodermal cells. We therefore think that WNT signaling inhibition effectively suppresses spontaneous differentiation into mesendodermal lineages in suspension culture.

      Regarding PKCβ activation in human pluripotent stem cells, we have shown by western blotting that phosphorylated PKCβ protein is more abundant in suspension culture than in adherent culture (Figure 3 - figure supplement 1). Treatment with a PKCβ inhibitor effectively suppresses spontaneous differentiation into neuroectodermal lineages. In future work, it will be interesting to examine (1) how and why PKCβ is activated (or phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition suppresses neuroectodermal differentiation. Conversely, it will also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.  

      Sorry for this confusion. In this study, we basically used StemFit AK02N medium (Figures 1-5 and 7-9). For the bioreactor experiments (Figure 6), we used StemFit AK03N medium, which is GMP grade and free of human- and animal-derived components. To confirm the effect of the IWRLY chemical treatment, we used StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we have clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is outside the primary focus of this study), we have observed differences among them in maintaining self-renewal, preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and stability across experiments in suspension culture conditions. Despite this media-dependent heterogeneity, the IWRLY chemical treatment generally maintained hiPSC self-renewal stably. We have added a discussion of this issue to the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added more results on the simultaneous comparison using StemFit AK02N medium in 5 different hiPSC lines (Figure 3 C-E) and using mTeSR1 medium in 4 different hiPSC lines (Figure 4 - figure supplement 2). From both results, we have shown that the treatment of LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      Sorry for this confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematics (Figure 7A). We have also added data on cloning efficiency, compared with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. Whereas the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, IWR-1-endo treatment in suspension culture increased the efficiency to more than 20%. However, LY333531 treatment decreased the efficiency. These results indicate that IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures  

      Thank you for this insightful comment. In the revised manuscript, we have added RT-qPCR data on naïve pluripotency markers in 5 different hiPSC lines (Figure 3E) and the expression of these markers from RNA-seq (Figure 5 - figure supplement 2). Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggest that IWRLY suspension conditions drive hiPSCs toward a naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. Following your comments, we have performed several experiments to optimize the bioreactor conditions in the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6—figure supplement 1). From these optimization experiments, we found that seeding densities of 1-2 × 10^5 cells/mL and stirring speeds of 50-150 rpm were suitable for the proliferation of these cells. In addition, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions at all acceptable stirring speeds. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and Eppendorf bioreactors for the 320 mL scale, which had been designed for and used in human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613); Kropp et al., 2016 (doi:10.5966/sctm.2015-0253)). We cited these previous studies in the Results section. We believe that these additional data and explanations address your concerns regarding the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following this suggestion, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we have specified that the medium used for this experiment was StemFit AK02N.

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved the resolution and clarity of these figures. We have also added the x-axis label "enrichment distribution" for the gene set enrichment analysis (GSEA) in Figures 1H and 5F and Figure 5 - figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      Sorry for this unclear description. We have changed it to: "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope this revised description clarifies the difference between the two points.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      Sorry for this inaccurate description. We have corrected this phrase as "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition" as recommended.

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines cultured simultaneously in suspension with StemFit AK02N medium (Figure 3C - E) and across 4 hiPSC lines cultured simultaneously in suspension with mTeSR1 medium (Figure 4 - figure supplement 2). In both media, treatment with LY333531 and IWR-1-endo reversed the decreased expression of undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. These results show that this chemical treatment had a consistent, robust effect on hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      Sorry for the incorrectness of the gene names. We have corrected them.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid misinterpretation, we use 'suspension culture' only for cultures in the conventional medium and 'LYIWR suspension culture' for cultures in medium supplemented with LY333531 and IWR-1-endo throughout this manuscript.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system in this cell line in the Results as "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large scale suspension culture is comparable to adherent culture conditions."
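      The comparison above rests on population doubling time, which is conventionally computed from cell counts at two time points as PDT = t / log2(N_t / N_0). The Python sketch below applies that standard formula, assuming exponential growth between the two counts; the helper name and the example counts are hypothetical illustrations, not data from this study.

```python
import math


def population_doubling_time(hours, n_start, n_end):
    """Return the population doubling time in hours (hypothetical helper).

    Assumes exponential growth between the two counts:
    PDT = t / log2(N_t / N_0).
    """
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("cell numbers must be positive and increasing")
    doublings = math.log2(n_end / n_start)  # number of population doublings
    return hours / doublings


# Hypothetical example: 1 x 10^5 cells/mL grow to 8 x 10^5 cells/mL in 72 h.
pdt = population_doubling_time(72, 1e5, 8e5)
print(f"PDT = {pdt:.1f} h")  # three doublings in 72 h -> "PDT = 24.0 h"

# Check against the adherent-culture range quoted above (21.8-32.9 h).
print(21.8 <= pdt <= 32.9)  # True
```

      Working from fold expansion rather than absolute counts keeps the comparison independent of culture volume, which differs between adherent plates and a bioreactor.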

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the author described the cell number and TRA160+/OCT4+ cell %. How is this compared to conventional media (eg E8)? It would be nice to see a head-to-head comparison with conventional media, quantification of cell count or survival would be helpful to determine this.  

      To address this issue, we attempted a direct freeze-and-thaw comparison using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, because hiPSCs cultured in suspension without IWR1+LY quickly lost their self-renewal ability, these frozen cells could be neither recovered nor counted under these conditions. Our results indicate that the addition of IWR1+LY during the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells at every passage in suspension culture conditions. We have added the following explanation to the Materials and Methods section.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."
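      The re-seeding step described above is simple arithmetic on the counter's readout. The sketch below is a hypothetical helper, not part of the authors' protocol: it computes viability from Trypan Blue live/dead counts and the volume of counted cell suspension that supplies enough live cells for a target seeding density. The example counts are illustrative only.

```python
def viability(live_count, dead_count):
    """Fraction of live cells from Trypan Blue live/dead counts."""
    total = live_count + dead_count
    if total == 0:
        raise ValueError("no cells counted")
    return live_count / total


def resuspension_volume_ml(live_cells_per_ml, target_cells_per_ml, culture_volume_ml):
    """Volume (mL) of counted suspension containing the live cells needed
    to seed culture_volume_ml at target_cells_per_ml (hypothetical helper)."""
    if live_cells_per_ml <= 0:
        raise ValueError("live cell concentration must be positive")
    cells_needed = target_cells_per_ml * culture_volume_ml
    return cells_needed / live_cells_per_ml


# Hypothetical count: 2.0 x 10^6 live and 1.0 x 10^5 dead cells per mL.
print(round(viability(2.0e6, 1.0e5), 3))          # 0.952
# Seed 1 x 10^5 live cells/mL in a 4 mL culture.
print(resuspension_volume_ml(2.0e6, 1.0e5, 4.0))  # 0.2 (mL holding 4 x 10^5 live cells)
```

      Basing the calculation on live cells only means that a poorly viable harvest is compensated for by a larger volume, rather than silently lowering the effective seeding density.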

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal in our method. Stirring speed is key to producing aggregates of the proper size in suspension culture, and the culture period between passages is another key factor in keeping aggregates from exceeding that size. Thus, we basically keep the stirring speed at 90 rpm (135 rpm in bioreactor conditions) and passage the cells every 3-5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture system using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition which further simplifies the culture system for iPSC. It would be good to add more discussion on this topic in the manuscript, such as the potential role of the E-cadherin in mediating this effect - as RNAseq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment, and the associated labor and cost, are concerns for industrial mass culture. In contrast, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used."

      "On the other hand, it is interesting to ask whether and how the properties of hiPSCs cultured in IWRLY suspension conditions differ from those in adherent conditions. Our transcriptome results, compared with adherent conditions, show that the expression of genes associated with cell-to-cell attachment, including E-cadherin (CDH1), is up-regulated. This may be because these hiPSCs depend more on cell-to-cell adhesion when there is no exogenous cell-to-substrate attachment in three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, human pluripotent stem cells can be cultured on an artificial substrate consisting of recombinant E-cadherin protein alone, without any ECM proteins (Nagaoka et al, 2010). Cell-to-cell communication through gap junctions also regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that cell-to-cell adhesion via E-cadherin and gap junctions is compensatorily activated and supports hiPSC self-renewal when there are no exogenous ECM components and the downstream integrin and focal adhesion signals are therefore not forcibly activated in suspension culture. It will be interesting to elucidate the molecular mechanisms by which E-cadherin contributes to hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2500 cells/cm2. My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.  

      Sorry about this confusion. We routinely use an automatic cell counter (model R1, Olympus), which can accurately count cells even in small clumps. We therefore report cell numbers for the passaging of adherent hiPSCs.  

      (2) Figure 2D- The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17, a simultaneous comparison would be an interesting data.  

      As recommended, we have added data directly comparing IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, consistent with the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      Sorry for not mentioning the oxygen levels used in this study. We use normal atmospheric oxygen levels (i.e., 21% O2) in suspension culture conditions. We have added this information to the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green (PAX6 and SOX17) overlaps with tdT tomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.  

      Sorry for these unclear images. We have provided separate images for the green and red colors, as well as an overlay in Figure 1- figure supplement 1 G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added the quantitative data of the expression of markers of undifferentiated hiPSCs including NANOG, OCT4, SSEA4, and TRA-1-60 on 5 different hiPSC lines in Figure 3 C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules are chosen as putatively affecting neuroectodermal induction from the pluripotent state.

      We have added the rationale, with appropriate references, to the Results section as follows:

      "We have chosen these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024)) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)) (Figure 2A; listed in Supplementary Table 1)."

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain the reasons for switching from Go6983 to LY333531 more clearly, we reorganized the order of the results and figures. In short, we found that suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possess PKCβ inhibitory activity (Figure 2—figure supplement 2B-D). In addition, elevated PKCβ expression in suspension-cultured hiPSCs could contribute to spontaneous differentiation (Figure 3—figure supplement 1A-C). To further explore the possibility that PKCβ inhibition is critical for maintaining the self-renewal of hiPSCs in suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2—figure supplement 1). Finally, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in terms of proliferation rate and maintenance of OCT4 protein expression in the long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after treatment with LY+IWR-1-endo and found that the combination had little or no effect on cell death. We have added these data in Figure 3—figure supplement 2 and the following description in the Results section: "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect, as a positive control for cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effect on apoptosis, whereas the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3—figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has little or no effect on cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. These differences were intended to reflect how long the reprogramming factors are expressed from each type of vector. Expression of the reprogramming factors should suppress spontaneous differentiation in reprogramming cells, and Sendai viral vectors should persist longer than episomal plasmid vectors. Thus, we added these chemical inhibitors from the early phase of reprogramming for the episomal plasmid vectors and from the late phase of reprogramming for the Sendai viral vectors. In the future, the timing of adding these molecules may need further optimization.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      Sorry for this unclear description. We have added the explanation on the cell number and culture conditions before moving the cultures to the differentiation media in the Materials and Methods section as below.

      "As in the maintenance conditions, 4 × 10^5 hiPSCs were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed on the plate shaker in the CO2 incubator. The next day, the medium was changed to the germ layer-specific differentiation medium."

    1. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure related to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender- or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups. 

      We appreciate the reviewer's suggestion and recognize the value of using more sophisticated statistical methods. However, we believe that the choice of method should not be guided solely by perceived complexity, and we think that the chosen ANOVA-based approach provides reliable and valid results. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects and site as a random effect. These analyses yielded comparable results, demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity, in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement that account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as a covariate in analyses of CS discrimination during acquisition training attenuated the p-value of the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
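For readers less familiar with the VIF, the following minimal Python sketch (not the analysis code used in the study) shows how it is computed: each predictor is regressed, with an intercept, on all remaining predictors, and VIF_j = 1 / (1 - R_j^2). The function name is hypothetical; statsmodels also offers `variance_inflation_factor` in `statsmodels.stats.outliers_influence`.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained from an OLS regression
    (with intercept) of column j on all remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Design matrix: intercept plus all other predictors
        design = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ beta
        # R^2 = 1 - SS_residual / SS_total (residuals have mean 0 with an intercept)
        r_squared = 1.0 - residuals.var() / y.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)
```

With only two predictors, both VIFs equal 1 / (1 - r^2), so VIF > 10 corresponds to |r| > about 0.95 and VIF > 5 to |r| > about 0.89; this illustrates why the choice between the two thresholds can matter.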

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared trait anxiety and depression scores between individuals unexposed and exposed to childhood adversity using Welch tests, owing to unequal variances.

      At the request of a reviewer, we additionally investigated the association between childhood adversity, as operationalized by the different models used in our exploratory analyses (i.e., cumulative risk, specificity, and dimensional model), and trait anxiety as well as depression scores (see Supplementary Figure 7). Using STAI-T and ADS-K scores as independent variables, we calculated a) a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs, and the association with the number of subscales meeting at least the moderate cut-off in simple linear regression models, to implement the cumulative risk model, and b) the association with the CTQ abuse and neglect composite scores in separate linear regression models, to implement the specificity/dimensional models. At the request of the reviewer, we also calculated the Pearson correlations between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores), and conditioned responding in SCRs (see Supplementary Table 8).”
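The Welch test mentioned above does not assume equal variances in the two groups. A minimal Python sketch of the test statistic and the Welch-Satterthwaite degrees of freedom follows; the function name is hypothetical and this is not the analysis code used in the study. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    a, b: sequences of scores (e.g., trait anxiety) for two independent groups.
    """
    n_a, n_b = len(a), len(b)
    # Squared standard errors of each group mean (sample variance with n - 1)
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df
```

For example, `welch_t_test([1, 2, 3, 4], [2, 4, 6, 8])` gives t = -sqrt(3) (about -1.732) with df = 75/17 (about 4.41); the degrees of freedom fall below the pooled value of 6 because the second group's variance is four times larger.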

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently, following the cumulative risk approach, the specificity model, and the dimensional model (see Methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores, irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample, with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation indicated that higher levels of depression were associated with reduced CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety and CS discrimination during fear acquisition training was not statistically significant, but on a descriptive level, higher anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06), although we stress that this should not be overinterpreted given the large sample. However, the two correlations (i.e., of CS discrimination during fear acquisition training with trait anxiety and with depression, respectively) did not differ significantly from each other (z = 0.303, p = 0.762; Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS-, during acquisition training (see Supplementary Table 8).”
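The comparison of two dependent correlations reported above (Dunn & Clark, 1969) can be illustrated with a minimal Python sketch of the Dunn and Clark z test for two overlapping correlations that share one variable (here, CS discrimination during acquisition training). The formula follows the variant implemented, for example, in the R package cocor; that this is the exact variant used is our assumption, and the function name is hypothetical. Because the correlation between STAI-T and ADS-K scores and the exact N entering this test are not reported in this passage, the sketch does not attempt to reproduce z = 0.303.

```python
import math


def dunn_clark_overlapping(r_jk, r_jh, r_kh, n):
    """Dunn & Clark (1969) z test for two dependent, overlapping correlations.

    Compares r_jk with r_jh, which share variable j (e.g., CS discrimination),
    given the correlation r_kh between the two non-shared variables
    (e.g., trait anxiety and depression) and the sample size n.
    Returns (z, two-sided p).
    """
    # Fisher r-to-z transformation of the two correlations being compared
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)
    # Estimated correlation between the two Fisher-transformed coefficients
    psi = (r_kh * (1 - r_jk**2 - r_jh**2)
           - 0.5 * r_jk * r_jh * (1 - r_jk**2 - r_jh**2 - r_kh**2))
    c = psi / ((1 - r_jk**2) * (1 - r_jh**2))
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    # Two-sided p-value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided
```

The more strongly the two questionnaires correlate with each other (larger r_kh), the larger c becomes and the more sensitive the test is to a given difference between the two correlations.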

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs (CS discrimination, generalization gradients, and general SCR reactivity), with the exception of a significant, albeit minute, relationship between CS discrimination during acquisition training and depression scores (see above). Although associations reported in the literature are heterogeneous (Lonsdorf et al., 2017), we speculated that such associations might be mediated by childhood adversity. Additional mediation analyses (data not shown), however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate these further, ideally in prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning - using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations used in the literature to classify individuals as exposed or unexposed to childhood adversity (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hamper comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or any cut-off in fact) is in principle an arbitrary decision - despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests at a descriptive level that the decline in CS discrimination, LDS and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off as it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (page 31-32) and discussion (page 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition as well as extinction are considered experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that in the literature typically a specific cut-off (e.g. moderate) is used and the number of individuals that would meet different cut-offs (e.g., low and high) are not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches: Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review) - dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
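The classification rule described above, together with a count of subscales at or above the cut-off (as used for a cumulative risk operationalization), can be sketched in Python as follows. This is a minimal illustration, not the study's analysis code; the dictionary keys and function names are hypothetical, and the cut-offs are the moderate-exposure values listed above.

```python
# CTQ subscale cut-offs for at least moderate exposure (values from the text above;
# key names are hypothetical)
MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def n_subscales_at_cutoff(subscale_scores, cutoffs=MODERATE_CUTOFFS):
    """Count the CTQ subscales whose score meets or exceeds its cut-off."""
    return sum(subscale_scores[name] >= cutoff for name, cutoff in cutoffs.items())


def is_exposed(subscale_scores, cutoffs=MODERATE_CUTOFFS):
    """Dichotomous classification: exposed if at least one subscale meets its cut-off."""
    return n_subscales_at_cutoff(subscale_scores, cutoffs) >= 1
```

A participant scoring exactly 15 on emotional neglect and below the cut-off on all other subscales would thus be classified as exposed, with a count of one subscale.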

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this section and carefully proofread our manuscript by paying attention to this (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions.”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and their impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for bringing to our attention that this was not sufficiently clear. In the main manuscript, we state that the participants were recruited within a large multicentric study and now point to additional information in the supplement (page 11):

      “In total, 1678 healthy participants (age: M = 25.26 years, SD = 5.58 years; female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). However, these previous reports, including those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), addressed research questions different from those investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center “Fear, anxiety, anxiety disorders” joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013-2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims, with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question of the association between exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and was hence a research question of primary interest also for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have included information on how exposed and unexposed was calculated as a table caption. We have considered moving the table to the results section but find it more suitable here. 

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for notifying us that we did not make this point clear enough. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al. 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution, to have a lower error variance resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear to have - at least descriptively - higher reliability compared to raw data (Klingelhöfer-Jens et al., 2022). We added a sentence on this to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) had used square-root transformations, but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used these transformations because they are standard practice in our laboratory and we strive for methodological consistency across projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Moreover, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and to exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Finally, on a descriptive level, this combination of transformations appears to offer greater reliability than raw data (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).
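The log-transformation and range-correction described above can be sketched in Python as follows. This is a minimal illustration under two assumptions not stated in the text: that the log-transformation takes the form log(1 + SCR), so that zero responses remain defined, and that range-correction is applied after log-transformation. The function name and the input format (a dictionary of per-participant SCR amplitudes) are hypothetical.

```python
import numpy as np


def log_range_correct(scr_by_participant):
    """Log-transform SCR amplitudes, then range-correct within each participant.

    scr_by_participant: dict mapping a participant ID to a 1-D sequence of raw,
    non-negative SCR amplitudes. Returns a dict with the same keys whose arrays
    are scaled to [0, 1] by each participant's maximum (log-transformed) SCR.
    """
    corrected = {}
    for pid, scr in scr_by_participant.items():
        # Assumed form: log(1 + SCR), so that zero responses map to 0
        logged = np.log1p(np.asarray(scr, dtype=float))
        max_response = logged.max()
        # Participants with only zero responses cannot be range-corrected; keep zeros
        corrected[pid] = logged / max_response if max_response > 0 else logged
    return corrected
```

Because range-correction is done per participant, each participant's largest response becomes 1, which removes between-person differences in overall SCR amplitude while preserving the relative differences between CS types within a person.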

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that it may take a considerable amount of space to list all the references of the R packages. However, we think it is important to prominently credit the respective authors of the R packages. Yet, if this is an important concern of the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We would like to thank the reviewer for highlighting that the factor time may be interesting as well. However, we think that in our case the time factor is less interesting, as the acquisition effect itself is rather strong. Nevertheless, for transparency, we have included a figure in the supplement that shows the time course of the SCR by displaying trial-by-trial data across the acquisition and generalization phases. This figure (Supplementary Figure 4) shows that the trajectories appear to differ only marginally between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that the analysis approach we have chosen is unlikely to obscure central time-dependent effects. However, if the reviewer and editor have strong feelings about this point, we will consider integrating additional analyses including the time factor in the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for attentive reading and have revised this part.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for attentive reading and have corrected this reference. The provided doi was functioning at our end and we hope that this now also applies to the reviewers.

    1. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and to provide helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and reviewer #2’s) main concerns was that the confocal images provided in some cases did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, in the P1a control condition (Figure 2A), we analyzed 135 neurons from 8 individuals, and the number of z-planes ranged from 3 to 8 per hemisphere. It is generally not possible to find a single confocal section that quantitatively captures the statistics presented in the graphs. Presenting the data as an MIP (Maximum Intensity Projection, i.e., collapsed z-stack) in a single panel would generate an image too cluttered to show any detail. We have now included, for the reader’s benefit, additional example confocal sections, both from a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Another suggestion from Reviewer #1 is that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones", based on the presented pictures, in which more non-P1a neurons (which the reviewer speculates may be pC1-type neurons) are activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not looking at pC1 neurons and did not try to determine which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and addressed whether P1a neurons that induce both aggression and courtship behavior when they are artificially activated (Hoopfer et al. 2015) are also naturally activated during spontaneous performance of these two social behaviors. That earlier result did not exclude the possibility that P1a neurons were inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence in support of the idea that P1a neurons as a population play a role in both of these behaviors. Moreover, we provided data identifying P1a neurons activated only during aggression or only during courtship (or during both). However, this does not exclude that pC1 or other neighboring populations are activated during aggression as well (see also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. That control was the purpose of that supplemental experiment.

      Based on those imaging data, the reviewer asked a further question: which [natural] behavioral context induces Hr38 expression in these populations (i.e., mating or aggression)? This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after stimulation offset (Hoopfer et al., 2015). Comparing those behavioral data with the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated second-messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging but that nevertheless play a role in mediating the behavioral effects of P1a activation. We still do not understand how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit inhibits aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior due to release of that downstream inhibition (see Models in Anderson (2016) Fig 2d, e). In that case, there would be no apparent inconsistency between the imaging data and the behavioral data. We agree that the reviewer's question is interesting and important, but we feel that answering it with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate this suggestion, but our evaluation of the Hr38 signals was essentially the same as the approach the reviewer suggested. We apologize for the confusion caused by the lack of detailed descriptions in the original manuscript. We have now revised the Methods section to explain more clearly how we define cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons]” (Fig. 2 and 4).

      Our focus in this paper was simply to ask whether P1a neurons are active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control these behaviors. It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker that identifies those cells using a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch, with new genotypes permitting dual labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while interesting in principle, are beyond the scope of the present manuscript. However, we have inserted text acknowledging the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes: Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that some tiny dots labeled by the Hr38 INT probe are more likely to be background signals. We counted INT probe signals as positive only when a cell had a clearly visible dot that also co-localized with the exonic probe's signal, as primary (un-spliced) Hr38 transcripts in the nucleus should be positive for both EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results, and that is how we interpreted them originally. We measured the intensity of Hr38EXN expression and defined Hr38EXN-labeled cells as “positive” when their relative intensity exceeded the average by more than 3σ, a stringent criterion. In the revised manuscript, we have added more detailed information to the Methods section regarding our criteria for defining cells as positive.
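      The 3σ positivity rule above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the function name `positive_mask` is hypothetical, and the sketch assumes that "average" and σ are the mean and sample standard deviation of a reference set of relative-intensity values (for example, background measurements), which the response above does not specify.

```python
from statistics import mean, stdev


def positive_mask(values, reference, k=3.0):
    """Flag each value that exceeds mean(reference) + k * SD(reference).

    `reference` is an assumed set of background/baseline relative
    intensities; k = 3 corresponds to the 3-sigma criterion.
    """
    threshold = mean(reference) + k * stdev(reference)
    return [v > threshold for v in values]


if __name__ == "__main__":
    background = [1.0, 1.2, 0.8, 1.0, 1.0]  # made-up illustrative values
    cells = [1.3, 1.5, 3.0]
    print(positive_mask(cells, background))  # [False, True, True]
```

      With these illustrative numbers the threshold is about 1.42, so only the two brighter cells are called positive; a larger k makes the call more stringent.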

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we answered in 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method, and in that respect this control experiment was successful. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells, as downstream targets of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data set for Figure 2 and Figure 2 supplement are from the same experiment. Because of the limited space, we just presented the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and put the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      “Fig. 1: Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We appreciate the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020]

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions, most of which we have addressed in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38INT signals accompanying different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue arose because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than on the percentage of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the percentage of positive cells. We believe this change will help readers better understand the experimental design, since we used the percentage of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Reviewer #2 also suggested adding controls, such as quantifying the intronic and exonic Hr38 probes after exposure to only the first or only the second social context. In response, we have now added the data from the first social context only (Figure 4C and 4D, right column). These new data provide evidence that there are essentially no detectable Hr38INT signals 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of analysis. Unfortunately, we are not able to provide the converse dataset, with the second behavioral context only, showing that Hr38INT signals are detected.
      On this point, we call the reviewer’s attention to Figure 4-Supplement A-C, which shows that INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes, because the primary (non-spliced) RNA transcript detected by the INT probe also contains exonic sequences.
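      The temporal logic of this catFISH design can be summarized in a short Python sketch. It is illustrative only and not the scoring code used in the study: the function names are hypothetical, the schedule is taken from the Figure 4A timeline (3h adaptation, 20 min, 30 min rest, 20 min, 10 min rest, collect; the adaptation period is omitted), and the calls assume the rule that INT dots count only when co-localized with EXN signal.

```python
# Hypothetical sketch of the HI-catFISH temporal logic; not the authors' code.

# Figure 4A schedule after adaptation: (segment, duration in minutes).
SCHEDULE = [("episode 1", 20), ("rest", 30), ("episode 2", 20), ("rest", 10)]


def episode_onsets(schedule):
    """Return, for each behavioral episode, how many minutes before
    collection (fixation) it began."""
    total = sum(minutes for _, minutes in schedule)
    onsets = {}
    elapsed = 0
    for name, minutes in schedule:
        if name.startswith("episode"):
            onsets[name] = total - elapsed
        elapsed += minutes
    return onsets


def catfish_call(exn_pos, int_pos):
    """Classify a cell from presence calls for the exonic (EXN) and
    intronic (INT) Hr38 probes."""
    if not exn_pos:
        # INT dots without co-localized EXN signal are treated as background.
        return "negative"
    if int_pos:
        # Primary transcript still present: active during episode 2.
        # Activity during episode 1 cannot be resolved from presence alone.
        return "episode 2"
    # Only EXN signal persists (INT undetectable by 60 min): episode 1 only.
    return "episode 1 only"


if __name__ == "__main__":
    print(episode_onsets(SCHEDULE))  # {'episode 1': 80, 'episode 2': 30}
    for exn, intr in [(True, False), (True, True), (False, True)]:
        print(exn, intr, catfish_call(exn, intr))
```

      Under this schedule, the first episode begins 80 minutes before collection, beyond the 60-minute point at which INT signal is no longer detected, whereas the second begins 30 minutes before collection, when INT signal is expected to be robust.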

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and, as the reviewer requested, have modified the description originally in the 'Note' section and moved it to the Discussion (text lines: 392-404). We have emphasized the difference between the two approaches for comparing neuronal activities during two different behaviors within the same animal. Takayanagi-Kiya et al. used GAL4/UAS and stripe protein expression with immunohistochemistry to analyze neuronal activities during two different behaviors, whereas we exclusively analyzed Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution, and it also allows our approach to be extended to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like those in Figure 2-supplement), the underlying distributions would be either skewed (only negative cells) or bimodal (one mode around the negative population and the other around the positive population), and the shapes of these distributions would likely be hidden in the box-and-whisker format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We have therefore changed the format of the data points from blurry dots to open circles with clear solid lines.

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different; bars that share no letter are significantly different. For example, Aggression: bc vs. Dead: bc means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c, also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no letter in common, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without cluttering the figure with many asterisks and horizontal bars, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2B-D, as it may have been confusing.
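      The letter convention above reduces to a simple rule: two conditions differ significantly only if their letter strings share no character. The minimal Python sketch below decodes the example letters quoted above; the function names are hypothetical, and the sketch only reads letters already assigned by a post hoc test; it does not perform the test itself.

```python
from itertools import combinations


def differs(letters_a, letters_b):
    """True if two conditions share no letter, i.e. differ significantly."""
    return not (set(letters_a) & set(letters_b))


def significant_pairs(groups):
    """List every pair of conditions whose letter strings are disjoint."""
    return [
        (a, b)
        for a, b in combinations(groups, 2)
        if differs(groups[a], groups[b])
    ]


GROUPS = {"Aggression": "bc", "Dead": "bc", "No Food": "b", "Courtship": "c"}

if __name__ == "__main__":
    print(significant_pairs(GROUPS))  # [('No Food', 'Courtship')]
```

      Only the No Food vs. Courtship pair comes out as different, matching the worked example in the text.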

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added illustrated schemas next to the title of each experiment (P1->PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis given that cell bodies are very small and highly subjected to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. Author response:

      Reviewer #1 (Public Review): 

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al., showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important and the conclusions are largely supported by the data. I have a few concerns as follows:

      We thank the reviewer for finding our study “interesting”, “important”, and “conclusions are largely supported by data”.

      (1)  The present data show that activation of dLS neurons produces SIA, however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools. 

      We agree with the reviewer that activating the stress-“trapped” neurons would be a more specific way to induce SIA through septal activation than the activation of the entire dLS that we pursued. In all likelihood, we expect to see robust SIA if stress-responsive dLS neurons are specifically activated. We are in the process of acquiring the genetic tools required for “trapping” stress-responsive neurons and expect to be able to perform the experiments suggested by the reviewers in the coming months.

      (2)  Depending on its duration, and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?

      As mentioned by the reviewer, it would be reasonable to suspect that dLS neurons are involved in SIH. However, we believe that experiments testing this hypothesis are outside the scope of this paper, since here we have focused on the circuit mechanisms of SIA. Nevertheless, in the revised Discussion section, we have included the possibility that dLS neurons drive SIH.

      (3)  It is well-accepted that opioid and cannabinoid receptors participate in the SIA, and the evidence is especially strong for the RVM endocannabinoid system. Given this, why did the authors focus their study on the opioid system?

      We agree with the reviewer that dLS-mediated SIA may work through neural circuits centered on the RVM expressing receptors for opioids, endocannabinoids, or both. We primarily focused on the opioidergic system in the RVM because decades of mechanistic work have revealed how ON, OFF, and neutral neurons modulate pain through endogenous opioids and even mediate SIA. In the revised Discussion, we have included the possibility that both pain-modulatory systems are involved.

      (4)  Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Alternatively, what is the relationship between SIA and the level of stress-induced anxiety?

      We did not test whether silencing the dLS would affect stress-induced anxiety, as our focus was on the pain-modulatory effects of dLS activation. The relationship between levels of SIA and stress-induced anxiety will be interesting to explore in the future. We believe we would need better behavioral assays than the existing ones to quantitatively measure levels of stress-induced anxiety and SIA.

      (5)  Direct electrophysiological evidence should be provided to confirm the efficacy of the MP-CNO.

      We agree with the reviewer that ex vivo electrophysiology experiments would substantiate the effectiveness of the MP-CNO. However, we do not have the expertise or the instrumentation required to perform these experiments in our laboratory.

      (6)  Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?

      Several lines of evidence point to the involvement of LHA neurons in stress-induced anxiety. We have also shown, using fiber photometry recordings, that the dLS-downstream neurons in the LHA are activated by acute restraint. Thus, we expect that activation of these LHA neurons would cause stress-induced anxiety. However, we chose to focus on the pain-modulatory aspect of the dLS-LHA-RVM circuitry.

      (7)  Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

      Our anatomical studies using transsynaptic anterograde and retrograde viral strategies (Figure 6) show that LHA neurons project directly to the RVM, and that these neurons are sufficient to drive hyperalgesia as well as necessary for SIA.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Shah et al. explore the function of an understudied neural circuitry from the dLS -> LHA -> RVM in mediating stress-induced analgesia. They initially establish this neural circuitry through a series of intersectional tracings. Subsequently, they conduct behavioral tests, coupled with optogenetic or chemogenetic manipulations, to confirm the involvement of this pathway in promoting analgesia. Additionally, fiber photometry experiments are employed to investigate the activity of each brain region in response to stress and pain. 

      Strengths: 

      Overall, the study is comprehensive, and the findings are compelling. 

      We appreciate the reviewer for finding our manuscript “comprehensive” and “compelling”.

      Weaknesses: 

      One noteworthy concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more direct interpretation suggests that intense struggling, rather than stress per se, activates the dLS -> LHA -> RVM pathway that may drive analgesic responses. 

      We agree with the reviewer that our data could be interpreted as showing that “intense struggling”, rather than “acute stress”, altered the pain thresholds in mice. However, we would like to point out that the restraint-induced stress model we used has long been regarded as a standard for inducing stress. Moreover, we have demonstrated that dLS activation results in acute stress by measuring blood corticosterone levels, and we showed that dLS activation caused stress-induced anxiety using light-dark box tests.

      Reviewer #2 (Recommendations For The Authors): 

      Please find below my other comments for improvements. 

      Introduction: The authors claimed that "dLS neurons receive nociceptive inputs from the thalamus and somatosensory cortices." However, citations are missing.

      We have added the citations.

      Figure 1 B&C: Although this paper focuses on the dLS, it would be informative to also include vLS c-Fos images (maybe in a supplementary figure), given that these data appear to be already acquired. The inclusion of vLS data will provide critical information regarding potential specificity (or lack of) across LS subregions in stress responses.

      In the revised manuscript we have added the vLS c-Fos images as suggested by the reviewer. 

      Figure 1D: Quantification of Vgat vs. Vglut neurons is missing. It is unclear if the Vgat neurons are restricted to small clusters.

      We did not add the Vglut vs. Vgat quantification because both our experiments and publicly available data from the Allen Brain Atlas show that almost all neurons in the LS are GABAergic. We found very rare Vglut2-expressing neurons (0-2 per section) in the LS of the mouse brain.

      Figure 1G: The Y-axis label is missing. 

      We have added the Y-axis label in the revised manuscript.

      Figure 2: The authors claimed that dLS neurons are preferentially tuned to stress caused by physical restraint. However, it appears that these neurons are specifically tuned to intense struggle behavior (transient) rather than stress (prolonged).

      We agree with the reviewer that the SIA observed in mice with dLS activation can be interpreted as the effect of transient struggle behavior rather than prolonged stress. However, we would like to point out that acute restraint for one hour is known to produce prolonged stress, as supported by increased blood corticosterone levels and stress-induced anxiety (Fig1-Fig Supplementary 1).

      Figure 4: The authors provided compelling evidence that dLS neurons synapse on LHA Vglut2 neurons. However, it is unclear if they exclusively target the Vglut2 neurons or also synapse on LHA Vgat neurons.

      We agree with the reviewer: even though the majority of the dLS-downstream neurons in the LHA are glutamatergic, as now shown in Fig. 4D, a few neurons do not express Vglut and must therefore be GABAergic.

      Figure 5D: It is unclear if the trace represents dLS or LHA calcium signal (in the main text, the authors claimed both).

      We now indicate the LHA neurons we recorded from at the top of Figure 5C, D.

      Figure 6 G&H: Presumably, ΔG-Rabies does not transmit across neurons due to the deletion of the glycoprotein (G) gene. Thus, it is unclear why dLS and LHA neurons express mCherry after injecting rabies into RVM.

      The aim of the rabies experiment was to test whether the cells in the LHA that receive inputs from the dLS are the same ones that send projections downstream to the RVM. To this end, we used a monosynaptic rabies virus that has retrograde properties. Hence, when injected into the RVM, it was taken up by the terminals of LHA neurons in the RVM and traveled to their cell bodies in the LHA. We injected AAV1-Transsyn-Cre into the dLS, so that only the LHA cells downstream of the dLS could express the Cre-dependent glycoprotein (G) gene. Because the G-deleted rabies virus can spread transsynaptically only from cells that supply G in trans, the rabies-mCherry virus infected the dLS-downstream LHA neurons specifically and jumped a synapse to label the upstream dLS neurons.

      The authors claim that "RVMpost-LHA neurons may modulate nociceptive thresholds through their local synaptic connections within the RVM, recurrent connections with the PAG, or direct interactions with spinal cord neurons." It is unclear what the "local synaptic connections within the RVM" means. It is also unclear whether there is evidence of recurrent connections between the RVM and PAG.

      By local synaptic connections, we meant intrinsic connections within the RVM: some RVMpost-LHA neurons might be interneurons that mediate SIA by modulating ON or OFF cells. There is some anatomical evidence for ascending inputs from the RVM to the PAG, and we have now included the citation in the relevant section of the manuscript.